[Tutor] table to dictionary and then analysis

Tue May 15 11:36:45 CEST 2012

On Mon, 2012-05-14 at 23:38 -0400, bob gailer wrote:
[...]
> I would set up a SQLite database with a table of 4 numeric columns: 
> year, month, rainfall, firearea
> Use SQL to select the desired date range and do the max and avg 
> calculations:
> select year, avg(firearea), max(rainfall) from table where year = 1973 
> and month between 6 and 8
> 
> you can use dictionaries but that will be harder. Here is a start 
> (untested). Assumes data are correct.

Clearly if the data is to be stored for a long time and have various
(currently unknown) queries passed over it then yes, a database is the
right thing -- though I would probably choose a non-SQL database.
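
For reference, a minimal sketch of the SQL approach quoted above, using
Python's standard sqlite3 module with an in-memory database. The table
name "climate", the helper name summer_summary and the sample rows are
made up for illustration, and the month column is assumed to be numeric
as in the quoted query; none of this is the real data:

```python
import sqlite3

# Hypothetical sample rows: (year, month, rainfall, firearea).
rows = [
    (1973, 5, 10.0, 1.0),
    (1973, 6, 20.0, 2.0),
    (1973, 7, 40.0, 4.0),
    (1973, 8, 30.0, 6.0),
    (1974, 7, 50.0, 8.0),
]

connection = sqlite3.connect(':memory:')
connection.execute(
    'create table climate (year integer, month integer, rainfall real, firearea real)'
)
connection.executemany('insert into climate values (?, ?, ?, ?)', rows)


def summer_summary(connection, year):
    """Return (year, average fire area, maximum rainfall) for June to August."""
    query = (
        'select year, avg(firearea), max(rainfall) from climate '
        'where year = ? and month between 6 and 8'
    )
    return connection.execute(query, (year,)).fetchone()


result = summer_summary(connection, 1973)
print(result)
```

The year is passed as a query parameter rather than pasted into the SQL
string, so the same function works for any year in the table.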

If the issue is just to do quick calculations over the data in the file
format then there is nothing wrong with using dictionaries or parallel
arrays à la:

        with open ( 'yearmonthrainfire.txt' ) as infile :
            climateindexname = infile.readline ( ).split ( )
            data = [ line.split ( ) for line in infile.readlines ( ) ]

        years = sorted ( { item[0] for item in data } )
        months = [ 'Jan' , 'Feb' , 'Mar' , 'Apr' , 'May' , 'Jun' , 'Jul' , 'Aug' , 'Sep' , 'Oct' , 'Nov' , 'Dec' ]

        dataByYear = { year : [ ( float ( item[2] ) , float ( item[3] ) ) for item in data if item[0] == year ] for year in years } 
        dataByMonth = { month : [ ( float ( item[2] ) , float ( item[3] ) ) for item in data if item[1] == month ] for month in months }

        # Transpose the ( rainfall , firearea ) pairs into two columns and average each column.
        # Assumes every year and every month listed above has at least one record.
        averagesByYear = { year : tuple ( sum ( column ) / len ( column ) for column in zip ( * dataByYear[year] ) ) for year in years }
        averagesByMonth = { month : tuple ( sum ( column ) / len ( column ) for column in zip ( * dataByMonth[month] ) ) for month in months }

        for year in years :
            print ( year , averagesByYear[year][0] , averagesByYear[year][1] )

        for month in months :
            print ( month , averagesByMonth[month][0] , averagesByMonth[month][1] )

The cost of the repetition in the code here is probably minimal compared
to the disc access costs. On the other hand this is a small data set so
time is probably not a big issue.
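
If the repeated scans over the data ever did matter, one possible
alternative is to accumulate running totals in a single pass. This is
only a sketch: the function name averages_by_key and the sample records
are hypothetical, and it assumes each record is a list of strings in the
order year, month, rainfall, firearea, as in the code above:

```python
from collections import defaultdict


def averages_by_key(records, key_index):
    """Average rainfall and fire area, grouped by the column at key_index."""
    # Each entry holds [rainfall total, firearea total, record count].
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for record in records:
        entry = totals[record[key_index]]
        entry[0] += float(record[2])
        entry[1] += float(record[3])
        entry[2] += 1
    return {
        key: (rain / count, fire / count)
        for key, (rain, fire, count) in totals.items()
    }


# Hypothetical records, not the real data.
sample = [
    ['1973', 'Jun', '20.0', '2.0'],
    ['1973', 'Jul', '40.0', '4.0'],
    ['1974', 'Jun', '10.0', '6.0'],
]

by_year = averages_by_key(sample, 0)
by_month = averages_by_key(sample, 1)
print(by_year)
print(by_month)
```

Only keys that actually occur in the data end up in the result, so a
month with no records is simply absent rather than causing a division
by zero.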

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder at ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel at winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder