[Tutor] dictionaries and memory handling
Kent Johnson
kent37 at tds.net
Sat Feb 24 17:40:56 CET 2007
Arild B. Næss wrote:
> Hi,
>
> I'm working on a python script for a task in statistical language
> processing. Briefly put it all boils down to counting different
> things in very large text files, doing simple computations on these
> counts and storing the results. I have been using Python's dictionary
> type as my basic data structure for storing the counts. This has been
> a nice and simple solution, but it turns out to be a bad idea in the
> long run, since the dictionaries become _very_ large and raise
> MemoryErrors when I try to run my script on texts of a certain size.
>
> It seems that an SQL database would probably be the way to go, but I
> am a bit concerned about speed issues (even though running time is
> not all that crucial here). In any case it would probably take me a
> while to get a database up and running and I need to hand in some
> preliminary results pretty soon, so for now I think I'll postpone the
> SQL and try to tweak my current script to be able to run it on
> slightly longer texts than it can handle now.
>
> So, enough beating around the bush, my questions are:
>
> - Will the dictionaries take up less memory if I use numbers rather
> than words as keys (i.e. will {3:45, 6:77, 9:33} consume less memory
> than {"eloquent":45, "helpless":77, "samaritan":33} )? And if so:
> Slightly less, or substantially less memory?
I'm going to guess here. I think an integer key takes up 4 bytes plus
the overhead of an object, and a string key takes about one byte per
character plus the same overhead. But I am guessing, and there are
optimizations in the Python interpreter for both strings and ints that
may affect this.
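On a newer Python you can check this guess directly instead of reasoning about it. The sketch below uses sys.getsizeof (added in Python 2.6, so later than this thread) to compare the shallow size of the example keys from the question. The helper name total_key_bytes is just for illustration. Exact byte counts vary by Python version and platform, and CPython shares small ints (-5 to 256) as singletons, so the int figures overstate what each dict actually owns.

```python
import sys

# The two example dicts from the question.
int_dict = {3: 45, 6: 77, 9: 33}
str_dict = {"eloquent": 45, "helpless": 77, "samaritan": 33}

def total_key_bytes(d):
    """Sum the shallow size, in bytes, of every key object in d."""
    return sum(sys.getsizeof(key) for key in d)

print("int keys:", total_key_bytes(int_dict), "bytes")
print("str keys:", total_key_bytes(str_dict), "bytes")
# Shallow size of each dict's own hash table (excludes the key and value objects).
print("dict tables:", sys.getsizeof(int_dict), sys.getsizeof(str_dict))
```

Each distinct word is stored as a key only once, however often it appears in the text, so the saving from numeric keys is roughly this per-key difference times the number of distinct words.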
>
> - What are common methods to monitor the memory usage of a script?
> Can I add a snippet to the code that prints out how many MBs of
> memory a certain dictionary takes up at that particular time?
See various discussions on comp.lang.python:
http://tinyurl.com/ysrocc
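For a rough answer on a modern Python 3 (tracemalloc arrived in 3.4, and the 100_000 literal needs 3.6 or later), something like the sketch below works. The hypothetical helper approx_dict_mb adds the dict's own table to the shallow size of each key and value. It ignores allocator overhead and still counts shared objects such as cached small ints, so treat it as a ballpark figure. tracemalloc reports how much memory Python has allocated overall, which is often the more useful number when chasing a MemoryError. The generated "word%d" strings are only a stand-in for tokens read from a real text file.

```python
import sys
import tracemalloc

def approx_dict_mb(d):
    """Ballpark size of a flat dict in MB: its table plus shallow key/value sizes."""
    total = sys.getsizeof(d)
    for key, value in d.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total / (1024 * 1024)

tracemalloc.start()  # begin tracing allocations made by Python

counts = {}
for i in range(100_000):
    word = "word%d" % i  # stand-in for a token from a text file
    counts[word] = counts.get(word, 0) + 1

print("estimated dict size: %.1f MB" % approx_dict_mb(counts))
current, peak = tracemalloc.get_traced_memory()  # bytes, as (current, peak)
print("traced by Python: %.1f MB now, %.1f MB peak" % (current / 2**20, peak / 2**20))
tracemalloc.stop()
```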
Kent
>
> regards,
> Arild Næss