flattening/rolling up/aggregating a large sorted text file
Steve Holden
steve at holdenweb.com
Wed Mar 21 08:49:39 EDT 2007
adtvff at yahoo.com wrote:
> Hi,
>
> Given a large ASCII file (delimited or fixed width) with one ID field
> and dimensions/measures fields, sorted by dimensions, I'd like to
> "flatten" or "rollup" the file by creating new columns: one for each
> combination of dimension level, and summing up measures over all
> records for a given ID.
>
> If the wheel has already been invented, great, please point me in the
> right direction. If not, please share some pointers on how to think
> about this problem in order to write efficient code.
>
> Is a hash with dimension level combinations a good approach, with
> values reset at each new ID level?
>
> I know MySQL, Oracle etc. will do this, but they all have a cap on #
> of columns allowed. SAS will allow unlimited columns, but I don't own
> SAS.
>
> Thanks.
>
>
> ID,color,shape,msr1
> ------------------------------
> 001, blue, square, 4
> 001, red , circle, 5
> 001, red, circle, 6
>
>
> ID, blue_circle, blue_square, red_circle, red_square
> --------------------------------------------------------------------------
> 001,0,4,11,0
> 002 ...
>
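To answer the direct question first: yes, a dict keyed by the dimension
combination, with the totals reset at each new ID, works fine for a file
where all rows for one ID are adjacent. A minimal sketch (field names and
the helper name `rollup` are mine, taken from your sample data; it assumes
the file really is sorted so each ID's rows are contiguous):

```python
import csv
from collections import defaultdict

def rollup(lines):
    """Sum msr1 per (color, shape), resetting the totals at each new ID.

    `lines` is any iterable of CSV records shaped like the sample data
    (ID, color, shape, msr1), sorted so that all rows for one ID are
    adjacent. Yields (id, {(color, shape): total}) pairs.
    """
    current_id = None
    sums = defaultdict(int)  # (color, shape) -> running total
    for rec in csv.reader(lines):
        rec_id, color, shape, msr = (field.strip() for field in rec)
        if rec_id != current_id:
            if current_id is not None:
                yield current_id, dict(sums)  # flush the finished ID
            current_id = rec_id
            sums = defaultdict(int)  # reset at the new ID
        sums[(color, shape)] += int(msr)
    if current_id is not None:
        yield current_id, dict(sums)  # flush the last ID
```

Because it streams one ID at a time, memory use is bounded by the number
of dimension combinations per ID, not by the file size.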
It seems a bit wrong-headed to force this problem to fit a solution
where you define relations with a variable number of columns when the
natural way to solve it would seem to be to sum the msr1 values for each
unique combination of ID, color and shape. That's a pretty
straightforward relational problem.
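For instance, with sqlite3 in memory (table and column names are my own,
following your sample data), the long-format aggregate is a single GROUP
BY, and there is no column-count cap because each dimension combination
becomes a row, not a column:

```python
import sqlite3

# Load the sample data into an in-memory table and aggregate it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measures (id TEXT, color TEXT, shape TEXT, msr1 INTEGER)"
)
conn.executemany(
    "INSERT INTO measures VALUES (?, ?, ?, ?)",
    [("001", "blue", "square", 4),
     ("001", "red", "circle", 5),
     ("001", "red", "circle", 6)],
)
# One row per unique (id, color, shape): the long-format rollup.
rows = conn.execute(
    "SELECT id, color, shape, SUM(msr1) FROM measures "
    "GROUP BY id, color, shape ORDER BY id, color, shape"
).fetchall()
# rows == [('001', 'blue', 'square', 4), ('001', 'red', 'circle', 11)]
```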
So, is there some reason the result *has* to have that variable number
of columns?
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com
More information about the Python-list mailing list