[IPython-dev] Using Ipython cache as file proxy

Wed Oct 20 13:45:30 EDT 2010

Hello,

As in much of my code, I do a lot of data reading in this
analysis/plotting script (which does not yet include model
computations):
http://code.google.com/p/ccnworks/source/browse/trunk/thesis/part2_saudi/airplot.py

When I first start IPython, the system monitor shows about 30 MB of
memory usage. After the first run of this script, memory use jumps to
around 200 MB, most of which comes from data reading: 27 different
files, each in the 2 to 10 MB range, read as masked arrays. The issue
is that executing this script now takes about 30 secs, between this
heavy reading and the plotting of 8 pages of figures. I am making
slight modifications and annotations to my figures to make them more
readable, and each run of airplot.py takes a while to bring the plots
to the screen and produce the final multipage PDF file. This is a 4 GB
dual-core 2.5 GHz laptop. I understand that I am mostly bound by the
data I/O speed of my not-sure-how-many-RPM HDD.

In the meantime, I wonder if I could get some help from IPython to
shorten these file-read waits. Once I execute the script, the data are
readily available in the IPython shell, responding nicely to my whos
and variable access queries. About 99% of the time I leave my dataset
as is and only change the processing/analysis code. As far as I know,
there is no feature in IPython that looks up the local namespace and
avoids re-reading data already bound to the same name (it sounds a bit
like fantasy, I know :). Anyway, could this be easily implemented?
There might be many exceptions to such a mechanism, but at least in my
case, let's say I would list the names (or the types of objects) that
I don't want to be re-read, and IPython would use its cache [I really
mean the local namespace dictionary] to eliminate the repeated reads.
That would cut the execution time of my script considerably; instead
of 30 secs it would most likely finish in less than 10 secs.
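
To make the idea concrete, here is a minimal sketch of the lookup
described above. It is only an illustration, not an existing IPython
feature: read_once, keep_names, slow_reader and the variable name
"aerosol" are all hypothetical, and a plain dictionary stands in for
the interactive namespace.

```python
def read_once(name, reader, namespace, keep_names):
    """Return namespace[name] if it is already loaded and listed in
    keep_names; otherwise call reader() and store the result.

    All names here are hypothetical; this is not an IPython API.
    """
    if name in keep_names and name in namespace:
        return namespace[name]      # reuse the object from the earlier run
    value = reader()                # the expensive file read happens here
    namespace[name] = value
    return value


# Stand-in for an expensive file read; it counts how often it is called.
calls = {"count": 0}

def slow_reader():
    calls["count"] += 1
    return [1.0, 2.0, 3.0]


namespace = {}               # stand-in for the interactive namespace
keep_names = {"aerosol"}     # names that should not be re-read

first = read_once("aerosol", slow_reader, namespace, keep_names)
second = read_once("aerosol", slow_reader, namespace, keep_names)
```

In this sketch the second call returns the stored object without
calling slow_reader again. Inside a real script, globals() could be
passed as the namespace, provided the script is run so that it shares
the interactive namespace (I believe %run -i does this, but treat that
as an assumption to verify). A name that is not in keep_names is
always re-read.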

What do you think?

-- 
Gökhan