I am not very familiar with list (array) columns in Hive, but one might
let you define a table with a schema of "page-type, page-name,
items-displayed" and then query for a count of the individual items (see
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF). A Map type might
work better, but I am not sure.
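
To make the counting step concrete, here is a rough sketch in plain Python
(not Hive). The function name count_impressions and the assumption that every
line has exactly three whitespace-separated fields (so no spaces inside a
search keyword) are mine, based only on your sample lines:

```python
from collections import Counter


def count_impressions(lines):
    """Count how many times each item id appears across all logged pages."""
    counts = Counter()
    for line in lines:
        # Assumed layout: page-type, page-name, comma-separated item ids.
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _page_type, _page_name, items = parts
        # "Explode" the list column into one impression per item id.
        counts.update(item for item in items.split(",") if item)
    return counts


# The three sample log lines from the original message.
log = [
    "CATPAGE CAT1 1,2,3,4,5",
    "CATPAGE CAT2 6,7,8,9,10",
    "SEARCH keyword 1,6",
]

# Print one "sid, impressions" line per item, ordered by numeric item id.
for sid, n in sorted(count_impressions(log).items(), key=lambda kv: int(kv[0])):
    print(f"{sid}, {n}")
```

In Hive itself I believe the same shape would be split() on the item column
followed by explode() in a LATERAL VIEW and a GROUP BY on the item id, but I
have not tried that against your data.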
On Tue, Mar 1, 2011 at 4:33 AM, Cam Bazz
<[email protected]> wrote:
> Now I would like to count impressions per item. To achieve this, I
> made a logger: for instance, when the user opens a category or search
> page and some items are listed, I log:
> CATPAGE CAT1 1,2,3,4,5
> CATPAGE CAT2 6,7,8,9,10
> SEARCH keyword 1,6
> Basically, I am logging all the displayed items in a comma-separated list.
> I need to calculate and store daily impressions from this such as:
> 1, 2
> 6, 2
> (the first number is the item sid, the second number is the impressions
> in total across the different impression types)
> Now I have a couple of questions:
> Considering that the system will produce at least 1 line per item per
> day, what kind of table should I store this in? Previously, I have been
> using text files for everything; I never had any requirement to query
> Hive, only to export results from it. Now I will probably need to
> make queries like "select * from myimpression table where sid = xx",
> giving me a timeline of impressions per item.
> Second question:
> What kind of query do I need in order to count impressions as above?
> Thank you very much,