[Tutor] data analysis with python

David Martins awesome.me.dm at outlook.com
Thu Nov 15 03:04:48 CET 2012


Thanks again for all the useful tips.
I settled on R for now. As Oscar said, the dataset is not massive, so I could have done it with a dictionary. However, some of the more frequent requests will involve finding data for certain times on certain days, for specific months, or for weekdays vs. weekends, etc. I believe that would have meant some indexing (which is what made me think of using databases in the first place). All of this seems to be quite easy in R as well:
# The weather[3] column stores the string for the weekday
wkdays <- which(weather[3] != "Sat" & weather[3] != "Sun")
I guess that would be easy enough with a list comprehension in Python too (there is a rough sketch a bit further down). Binning looks like this:
heatcut = cut(as.matrix(lib[15]),
              breaks = max(lib[15]) * seq(0, 1, by = 0.1),
              labels = c('10%','20%','30%','40%','50%','60%','70%','80%','90%','100%'))
This can be wrapped in a function, so the call would look something like
bin_me(lib[15], breaks = default, labels = default).
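For comparison, here is the rough Python sketch I mentioned. It assumes the csv has already been read into a list of dicts called data (as in Oscar's example quoted below) and assumes a made-up 'Weekday' column alongside the Heating_plant_sensible_load column; untested, but it shows the shape of it:

# keep only the weekday rows (a list comprehension instead of which())
wkdays = [row for row in data if row['Weekday'] not in ('Sat', 'Sun')]

# bin a column into 10% slices of its maximum, similar to cut() above
def bin_me(values, nbins=10):
    top = max(values)
    labels = ['%d%%' % (100 * (i + 1) // nbins) for i in range(nbins)]
    counts = dict((label, 0) for label in labels)
    for v in values:
        i = min(int(v / top * nbins), nbins - 1)  # clamp v == top into the last bin
        counts[labels[i]] += 1
    return counts

loads = [float(row['Heating_plant_sensible_load']) for row in wkdays]
print(bin_me(loads))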
To get one bin out of the sqlite db I wrote this (not sure if there is an easier way):
select count(Heating_plant_sensible_load) from LibraryMain
  where Heating_plant_sensible_load > (select max(Heating_plant_sensible_load)*0.3 from LibraryMain)
  and Heating_plant_sensible_load < (select max(Heating_plant_sensible_load)*0.4 from LibraryMain);
Filtering by certain times would, with my approach, add even more lines; on top of this, I believe you would have to put the result either in a view or a new table...
So R seems clearer and more concise than SQL/SQLite (at least to me). On top of that, it leaves the door open for additional analysis later on for specific cases. That it can connect with Python is another plus.
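(One way to do that connecting is rpy2, for example; roughly like this, untested, with a made-up file name:)

import rpy2.robjects as robjects

# run the same R filtering from Python; weather.csv and column 3 are placeholders
robjects.r('weather <- read.csv("weather.csv")')
wkdays = robjects.r('which(weather[3] != "Sat" & weather[3] != "Sun")')
print(list(wkdays))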
Thanks again for everyone's ideas.

dm




> Date: Wed, 14 Nov 2012 13:59:25 +0000
> Subject: Re: [Tutor] data analysis with python
> From: oscar.j.benjamin at gmail.com
> To: awesome.me.dm at outlook.com
> CC: tutor at python.org
> 
> On 14 November 2012 03:17, David Martins <awesome.me.dm at outlook.com> wrote:
> > Hi All
> >
> > I'm trying to use python for analysing data from building energy simulations
> > and was wondering whether there is way to do this without using anything sql
> > like.
> 
> There are many ways to do this.
> 
> >
> > The simulations are typically run for a full year, every hour, i.e. there
> > are 8760 rows and about 100+ variables such as external air temperature,
> > internal air temperature, humidity, heating load, ... making roughly a
> > million data points. I've got the data in a csv file and also managed to
> > write it in a sqlite db.
> 
> This dataset is not so big that you can't just load it all into memory.
> 
> >
> > I would like to make requests like the following:
> >
> > Show the number of hours the aircon is running at 10%, 20%, ..., 100%
> > Show me the average, min, max air temperature, humidity, solar gains,....
> > when the aircon is running at 10%, 20%,...,100%
> >
> > Eventually I'd also like to generate an automated html or pdf report with
> > graphs. Creating graphs is actually somewhat essential.
> 
> Do you mean graphs or plots? I would use matplotlib for plotting. It
> can automatically generate image files of plots. There are also ways
> to generate output for visualising graphs but I guess that's not what
> you mean. Probably I would create a pdf report using latex and
> matplotlib but that's not the only way.
> http://en.wikipedia.org/wiki/Graph_(mathematics)
> http://en.wikipedia.org/wiki/Plot_(graphics)
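[A minimal matplotlib sketch of that, just as an illustration; it reuses the data list built in the csv example further down and saves the plot straight to a png:]

import matplotlib
matplotlib.use('Agg')  # render to image files without needing a display
import matplotlib.pyplot as plt

temps = [row['Temp'] for row in data]  # 'data' as built in the csv example below
plt.plot(temps)
plt.xlabel('hour')
plt.ylabel('temperature')
plt.savefig('temps.png')  # matplotlib writes the image file directly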
> 
> I tried sql and find it horrible, error prone, too much to write, the logic
> somehow seems to work different than my brain and I couldn't find
> particularly good documentation (particularly the documentation of the api is
> > terrible, in my humble opinion). I heard about zope db which might be an
> > alternative. Would you mind pointing me towards an appropriate way to solve
> > my problem? Is there a way for me to avoid having to learn sql or am I
> > doomed?
> 
> There are many ways to avoid learning SQL. I'll suggest the simplest
> one: Can you not just read all the data into memory and then perform
> the computations you want?
> 
> For example:
> 
> $ cat tmp.csv
> Temp,Humidity
> 23,85
> 25,87
> 26,89
> 23,90
> 24,81
> 24,80
> 
> $ cat tmp.py
> #!/usr/bin/env python
> 
> import csv
> 
> with open('tmp.csv', 'rb') as f:
>     reader = csv.DictReader(f)
>     data = []
>     for row in reader:
>         row = dict((k, float(v)) for k, v in row.items())
>         data.append(row)
> 
> maxtemp = max(row['Temp'] for row in data)
> mintemp = min(row['Temp'] for row in data)
> meanhumidity = sum(row['Humidity'] for row in data) / len(data)
> 
> print('max temp is: %d' % maxtemp)
> print('min temp is: %d' % mintemp)
> print('mean humidity is: %f' % meanhumidity)
> 
> $ ./tmp.py
> max temp is: 26
> min temp is: 23
> mean humidity is: 85.333333
> 
> This approach can also be extended to the case where you don't read
> all the data into memory.
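[For instance, a rough single-pass version of the same computation that only keeps one row in memory at a time:]

import csv

count = 0
humidity_total = 0.0
mintemp = maxtemp = None

with open('tmp.csv', 'rb') as f:
    for row in csv.DictReader(f):
        temp = float(row['Temp'])
        humidity_total += float(row['Humidity'])
        count += 1
        if mintemp is None or temp < mintemp:
            mintemp = temp
        if maxtemp is None or temp > maxtemp:
            maxtemp = temp

print('max temp is: %d' % maxtemp)
print('min temp is: %d' % mintemp)
print('mean humidity is: %f' % (humidity_total / count))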
> 
> 
> Oscar