[Tutor] dictionaries and memory handling

Kent Johnson kent37 at tds.net
Sat Feb 24 17:40:56 CET 2007

  Arild B. Næss wrote:
> Hi,
> I'm working on a Python script for a task in statistical language
> processing. Briefly put, it all boils down to counting different
> things in very large text files, doing simple computations on these
> counts and storing the results. I have been using Python's dictionary
> type as my basic data structure for storing the counts. This has been
> a nice and simple solution, but turns out to be a bad idea in the
> long run, since the dictionaries become _very_ large, and raise
> MemoryErrors when I try to run my script on texts of a certain size.
> It seems that an SQL database would probably be the way to go, but I  
> am a bit concerned about speed issues (even though running time is  
> not all that crucial here). In any case it would probably take me a  
> while to get a database up and running and I need to hand in some  
> preliminary results pretty soon, so for now I think I'll postpone the  
> SQL and try to tweak my current script to be able to run it on  
> slightly longer texts than it can handle now.
> So, enough beating around the bush, my questions are:
> - Will the dictionaries take up less memory if I use numbers rather  
> than words as keys (i.e. will {3:45, 6:77, 9:33} consume less memory  
> than {"eloquent":45, "helpless":77, "samaritan":33} )? And if so:  
> Slightly less, or substantially less memory?

I'm going to guess here. I think an integer key will take up 4 bytes plus
the overhead of an object, and a string key will take about one byte per
character plus the same overhead. But this is only a guess, and the Python
interpreter has optimizations for both strings and ints that may affect
the actual numbers.
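
One way to check a guess like this is to ask the interpreter directly. The
following is a minimal sketch, assuming a Python version that provides
sys.getsizeof (2.6 or later) and written for Python 3. The helper name
key_cost is made up for illustration, and the byte counts it prints depend
on the interpreter version and platform, so treat them as indicative only.

```python
import sys

def key_cost(keys):
    """Sum of the per-object sizes (in bytes) that the interpreter reports for each key."""
    return sum(sys.getsizeof(k) for k in keys)

by_number = {3: 45, 6: 77, 9: 33}
by_word = {"eloquent": 45, "helpless": 77, "samaritan": 33}

# Size of the key objects themselves.
print("int keys:", key_cost(by_number), "bytes")
print("str keys:", key_cost(by_word), "bytes")

# sys.getsizeof on a dict reports only the dict's own table,
# not the keys and values it refers to.
print("dict of ints:", sys.getsizeof(by_number), "bytes")
print("dict of strs:", sys.getsizeof(by_word), "bytes")
```

Note that the interpreter may share small integers and some strings between
uses, so summing per-object sizes can overstate the real cost.
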
> - What are common methods to monitor the memory usage of a script?  
> Can I add a snippet to the code that prints out how many MBs of  
> memory a certain dictionary takes up at that particular time?

See various discussions on comp.lang.python:
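
Separately from those discussions, here is one possible approach as a rough
sketch. It assumes Python 3.4 or later, where the standard library has the
tracemalloc module, and it measures how much memory is allocated while a
dictionary of counts is built. The function build_counts and the generated
words are placeholders for illustration, not part of the original script.

```python
import tracemalloc

def build_counts(words):
    """Count occurrences of each word in a plain dictionary."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

# Placeholder input: 200000 tokens drawn from 50000 distinct words.
counts = build_counts("word%d" % (i % 50000) for i in range(200000))

# get_traced_memory returns (current, peak) in bytes since tracing started.
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("dict with %d keys: about %.2f MB" % (len(counts), (after - before) / 1e6))
print("peak traced memory: about %.2f MB" % (peak / 1e6))
```

The difference covers everything allocated by Python between the two
measurements, so it includes the keys and values as well as the dictionary
itself. Tracing slows the script down, so it is probably best used on a
sample of the data.
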

> regards,
> Arild Næss
