[Tutor] dictionaries and memory handling
bill at celestial.net
Fri Feb 23 18:50:37 CET 2007
On Fri, Feb 23, 2007, Arild B. Næss wrote:
>I'm working on a python script for a task in statistical language
>processing. Briefly put, it all boils down to counting different
>things in very large text files, doing simple computations on these
>counts and storing the results. I have been using python's dictionary
>type as my basic data structure for storing the counts. This has been
>a nice and simple solution, but turns out to be a bad idea in the
>long run, since the dictionaries become _very_ large, and create
>MemoryErrors when I try to run my script on texts of a certain size.
>It seems that an SQL database would probably be the way to go, but I
>am a bit concerned about speed issues (even though running time is
>not all that crucial here). In any case it would probably take me a
>while to get a database up and running and I need to hand in some
>preliminary results pretty soon, so for now I think I'll postpone the
>SQL and try to tweak my current script to be able to run it on
>slightly longer texts than it can handle now.
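You would probably be better off using one of the hash databases,
such as Berkeley DB or gdbm (see the anydbm documentation). These can
be used much like dictionaries in Python, except that keys and values
must be strings and the data lives on disk rather than in memory. For
simple key lookups like yours they are probably orders of magnitude
faster than an SQL database.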
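
As a rough illustration of the idea, here is a minimal sketch of
counting words into an on-disk hash database instead of a dictionary.
It assumes Python 3, where the anydbm module is named dbm; the helper
names count_words and read_count and the file path are made up for
this example and are not part of any library. Counts are stored as
encoded strings because these databases only hold strings/bytes.

```python
import dbm  # this module was called "anydbm" in Python 2


def count_words(lines, db_path):
    """Count whitespace-separated tokens, keeping the counts on disk.

    Hypothetical helper for illustration only.
    """
    db = dbm.open(db_path, "c")  # "c": create the database if missing
    try:
        for line in lines:
            for word in line.split():
                key = word.encode("utf-8")
                # Values come back as bytes, so convert on every update.
                current = int(db[key]) if key in db else 0
                db[key] = str(current + 1).encode("utf-8")
    finally:
        db.close()


def read_count(db_path, word):
    """Return the stored count for word, or 0 if it was never seen."""
    db = dbm.open(db_path, "r")  # "r": open read-only
    try:
        key = word.encode("utf-8")
        return int(db[key]) if key in db else 0
    finally:
        db.close()
```

For a real corpus you would pass an open file object as lines, e.g.
count_words(open("corpus.txt"), "counts"). Every update goes to disk,
so this will be slower than a plain dictionary, but memory use stays
roughly constant no matter how many distinct keys you collect.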
INTERNET: bill at Celestial.COM Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676
``Rightful liberty is unobstructed action according to our will within
limits drawn around us by the equal rights of others. I do not add 'within
the limits of the law' because law is often but the tyrant's will, and
always so when it violates the rights of the individual.''