Data aggregation

vedranp vedran.pregelj at gmail.com
Thu Mar 6 11:28:42 EST 2008


Hi,

I have a case where I should aggregate data from the CSV file, which
contains data in this way:

DATE	TIME	COUNTRY	ZIP	CITY	VALUE1	VALUE2	VALUE3
21.2.2008	00:00	A	1000	CITY1	1	2	3
21.2.2008	00:00	A	1000	CITY2	4	5	6
21.2.2008	00:00	A	1000	CITY3	7	8	9
21.2.2008	00:00	A	1000	CITY4	1	2	3
21.2.2008	00:15	A	1000	CITY1	4	5	6
21.2.2008	00:15	A	1000	CITY2	7	8	9
21.2.2008	00:15	A	1000	CITY3	1	2	3
21.2.2008	00:15	A	1000	CITY4	4	5	6
21.2.2008	00:00	A	2000	CITY10	7	8	9
21.2.2008	00:00	A	2000	CITY20	1	2	3
21.2.2008	00:00	A	2000	CITY30	4	5	6
21.2.2008	00:00	A	2000	CITY40	1	2	3
21.2.2008	00:15	A	2000	CITY10	7	8	9
21.2.2008	00:15	A	2000	CITY20	1	2	3
21.2.2008	00:15	A	2000	CITY30	4	5	6
21.2.2008	00:15	A	2000	CITY40	1	2	3

I need to aggregate data from file1, so the result would be a CSV file
(file2) in this format:

DATE	COUNTRY	ZIP	CITY	SumOfVALUE1	SumOfVALUE2	SumOfVALUE3	formula1
21.2.2008	A	1000	CITY1	5	7	9	12
21.2.2008	A	1000	CITY2	11	13	15	24
21.2.2008	A	1000	CITY3	8	10	12	18
21.2.2008	A	1000	CITY4	5	7	9	12
21.2.2008	A	2000	CITY10	14	16	18	30
21.2.2008	A	2000	CITY20	2	4	6	6
21.2.2008	A	2000	CITY30	8	10	12	18
21.2.2008	A	2000	CITY40	2	4	6	6

So, group by DATE, COUNTRY, ZIP and CITY and sum (or do some
calculation) the values and do some calculation from summed fields
(e.g.: formula1 = SumOfVALUE1+SumOfVALUE2). I am able to do this by
first loading file1 in SQL, perform a query there, which returns the
file2 results and then load it back in the SQL in the different table.

I would like to avoid the step of taking data out from database in
order to process it. I would like to process the file1  in Python and
load the result (file2) in SQL.

>From some little experience with Perl, I think this is managable with
double hash tables (1: basic hash with key/value = CITY/pointer-to-
other-hash, 2: hash table with values for CITY1), so I assume that
there would be also a way in Python, maybe with dictionaries? Any
ideas?

Regards,
Vedran.



More information about the Python-list mailing list