[Tutor] would pickle or cpickle help?

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Fri Jul 1 03:30:46 CEST 2005



> Could I use something like cpickle to store the dictionary once it is
> made so I would not have to make it each time?  I have never tried to
> use pickle so I am bit fuzzy on what it can store and what it can't.
> Also would it really buy me anything...it only takes a second or two to
> make the dictionary?


Hi John,

For a dictionary of about ten thousand elements, I'm not sure if it will
buy you much.  There's a complexity cost in adding a cache, in the sense
that the cache and the real data might get out of sync if we're not
careful.

(As a concrete example, Python itself caches bytecode-compiled modules,
and that mechanism carries quite a bit of complexity that causes grief
every so often through unexpected things like file permission problems.)
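
To make the "out of sync" risk concrete, here is a minimal sketch of what a
pickle-backed cache might look like.  The function names, the file paths, and
the one-"key value"-pair-per-line input format are all assumptions made up for
illustration, not anything from John's program.  The sketch only trusts the
cache when it is at least as new as the source file:

```python
import os
import pickle


def build_dictionary(source_path):
    """Hypothetical builder: expects one 'key value' pair per line."""
    table = {}
    with open(source_path) as source:
        for line in source:
            parts = line.split(None, 1)
            if len(parts) == 2:
                table[parts[0]] = parts[1].strip()
    return table


def load_dictionary(source_path, cache_path):
    """Return the dictionary, reusing the pickle cache only when it is fresh."""
    # The cache counts as stale if the source was modified after it was written.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as cache:
            return pickle.load(cache)

    # No usable cache: rebuild from the source and save the result.
    table = build_dictionary(source_path)
    with open(cache_path, "wb") as cache:
        pickle.dump(table, cache, pickle.HIGHEST_PROTOCOL)
    return table
```

A dictionary of strings, numbers, lists, and tuples pickles without trouble;
things like open file objects do not.  Notice that most of the code above is
bookkeeping for the cache rather than real work, which is the complexity cost
in question.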


Building dictionaries from scratch might be fast enough, depending on the
application.  For example, given something like /usr/share/dict/words,

######
[dyoo@shoebox dyoo]$ ls -l /usr/share/dict/words
-rw-r--r--  2 root root 2486824 Jul 11  2004 /usr/share/dict/words
[dyoo@shoebox dyoo]$ wc /usr/share/dict/words
 234937  234937 2486824 /usr/share/dict/words
######


building the dictionary that holds all those words on a Pentium 4 2.8GHz
system takes about:

######
>>> import timeit
>>> timeit.Timer("import sets; "
...              "s = sets.Set(open('/usr/share/dict/words'))"
...             ).timeit(number=5)
1.6103701591491699
######

That figure is the total for all five runs, so a single build costs roughly
a third of a second, which is not too long if it's a one-time cost per
program run.
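
On a current Python 3, where the sets module is no longer available, a
similar measurement could be sketched with the built-in set type.  The helper
name time_set_build and the generated stand-in word list are assumptions for
illustration; point the function at a real file to measure it.  Since timeit
reports the total for all runs, the sketch divides by the run count:

```python
import os
import tempfile
import timeit


def time_set_build(path, runs=5):
    """Return the average seconds needed to read `path` into a set of lines."""
    def build():
        with open(path) as words:
            return set(words)

    total = timeit.timeit(build, number=runs)  # total time for all runs
    return total / runs


if __name__ == "__main__":
    # Stand-in word list so the sketch runs anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.writelines("word%d\n" % i for i in range(10000))
    try:
        print("%.4f seconds per build" % time_set_build(tmp.name))
    finally:
        os.remove(tmp.name)
```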



> There is a chance the file that I use to make the dictionary will
> eventually grow to be 10,000 lines or more.

I wouldn't optimize this portion of the code until it's known that it
really is a significant part of the runtime.
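
One rough way to find out is to time the build step next to the rest of the
program and compare.  In the sketch below, build_table and main_work are
hypothetical stand-ins for the real code, and timed is just a small helper
invented for this example:

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def build_table(n):
    # Hypothetical stand-in for the real dictionary-building step.
    return {str(i): i for i in range(n)}


def main_work(table):
    # Hypothetical stand-in for whatever the rest of the program does.
    return sum(table.values())


table, build_seconds = timed(build_table, 10000)
total, work_seconds = timed(main_work, table)
share = build_seconds / (build_seconds + work_seconds)
print("build: %.4fs, rest: %.4fs, build share: %.0f%%"
      % (build_seconds, work_seconds, share * 100))
```

If the build share stays small for realistic input sizes, a cache is unlikely
to pay for its added complexity.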

I've seen lots of applications (BLAST, for example) that have made things
more complex by using indices and caching, and in industrial applications,
it's worth it.  But we can't discount the added complexity.  For simple
applications, it's probably not worth it.


Best of wishes!


