[Tutor] dictionaries and memory handling

Andrei project5 at redrival.net
Sat Mar 3 14:18:24 CET 2007


> But most of my dictionaries are nested, and since both keys and values 
> in the dbm 'dictionaries' have to be strings, I can't immediately see 
> how I could get it to work.
> 
> A bit more detail: I deal with conditional probabilities, with up to 4 
> parameters. These parameters are numbers or words and determine the 
> value (which is always a number). E.g. I have a dictionary 
> {p1:{p2:{p3:{p4:value}}}}, where the p's are different parameters. I 
> sometimes need to sum over one or more of the parameters – for now I 
> have managed to structure the dictionaries so that I only need to sum 
> over the innermost parameter, although this has been a bit cumbersome.

Depends a bit on how many keys each of the dictionaries is going to have 
and in what order they're filled. You can pickle/cPickle an arbitrary 
amount of data as one value, so the whole {p2:{p3:{p4:value}}} structure 
could be stored as the value of the p1 key in the bsddb. However, you may 
not like this if you need to retrieve p1 and add new data to it all the 
time, because the repeated pickling and unpickling cycles may hurt 
performance.
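
A minimal sketch of that idea in modern Python, using the stdlib dbm 
module as a stand-in for bsddb (which was removed from recent Pythons); 
the file name and parameter names are made up:

```python
import dbm
import pickle

# Store the whole nested {p2:{p3:{p4:value}}} dict as a single
# pickled value under the p1 key.
db = dbm.open("probs.db", "c")

nested = {"p2a": {"p3a": {"p4a": 20, "p4b": 45}}}
db["p1a"] = pickle.dumps(nested)       # pickle on every write

restored = pickle.loads(db["p1a"])     # unpickle on every read
print(restored["p2a"]["p3a"]["p4b"])   # -> 45
db.close()
```

Note that every update to p1a means unpickling the whole nested dict, 
modifying it, and pickling it back, which is exactly the cost mentioned 
above.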

If that turns out to be a problem, you could instead keep the first 
layer of the dictionary in memory but map each of its values to a 
separate bsddb, so that each pickle/unpickle cycle handles a much 
smaller object.
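
A sketch of that hybrid layout, again with dbm standing in for bsddb 
(all names invented): the p1 layer is an ordinary in-memory dict whose 
values are separate database files, so only the smaller {p3:{p4:value}} 
dicts are ever pickled.

```python
import dbm
import pickle

# in-memory first layer: p1 key -> open database handle
first_layer = {}

def get_db(p1):
    # one database file per outermost parameter
    if p1 not in first_layer:
        first_layer[p1] = dbm.open("layer_%s.db" % p1, "c")
    return first_layer[p1]

def store(p1, p2, subdict):
    # only the inner {p3:{p4:value}} dict gets pickled
    get_db(p1)[p2] = pickle.dumps(subdict)

def load(p1, p2):
    return pickle.loads(get_db(p1)[p2])

store("a", "b", {"p3": {"p4": 20}})
print(load("a", "b"))   # -> {'p3': {'p4': 20}}
```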

Alternatively you could choose to store them as a sort of path in the 
bsddb, like this (obviously wasteful in terms of space):
'p1/p2/p3/p4': '20'
'p1/p2/p3/p5': '45'
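
One nice property of the path layout is that summing over the innermost 
parameter becomes a simple prefix scan over the keys. A sketch, using 
dbm in place of bsddb (file name invented):

```python
import dbm

# Keys encode the full parameter path; values are numbers as strings.
db = dbm.open("paths.db", "c")
db["p1/p2/p3/p4"] = "20"
db["p1/p2/p3/p5"] = "45"

# Sum over the innermost parameter: scan all keys sharing the prefix.
prefix = b"p1/p2/p3/"
total = sum(int(db[k]) for k in db.keys() if k.startswith(prefix))
print(total)   # -> 65
```

Scanning every key is linear in the size of the database, which is part 
of why this layout is wasteful; bsddb's btree variant would let you scan 
only the matching key range.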

Combinations of the approaches above are also possible and I have no 
idea which would be best.

>> - Will the dictionaries take up less memory if I use numbers rather  
>> than words as keys (i.e. will {3:45, 6:77, 9:33} consume less memory  
>> than {"eloquent":45, "helpless":77, "samaritan":33} )? And if so:  
>> Slightly less, or substantially less memory?

Here's a simple test:

 >>> x = [str(i) for i in range(100000, 1000000)]
vs.
 >>> x = [i for i in range(100000, 1000000)]

The interpreter takes about 50 MB for the str() version and about 20 MB 
for the non-str() version - eyeballed in the Task Manager. So it does 
make a significant difference, though not an order-of-magnitude one. 
That may be enough for now, but if this script stays in use, you're 
bound to need even more at some point.
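
In newer Pythons (2.6 and later) you can get a rough per-object figure 
without eyeballing the Task Manager, via sys.getsizeof. It ignores 
shared and interned storage, and a dict's own size excludes its keys 
and values, so treat the numbers as a guide only:

```python
import sys

# Compare one int key against one string key (rough, platform-dependent).
print(sys.getsizeof(123456), sys.getsizeof("eloquent"))

# A dict's reported size excludes its contents, so estimate a total
# by adding the keys and values explicitly.
d = {"eloquent": 45, "helpless": 77, "samaritan": 33}
total = sys.getsizeof(d) + sum(sys.getsizeof(k) + sys.getsizeof(v)
                               for k, v in d.items())
print(total)
```

On a typical CPython build the int key comes out noticeably smaller 
than the string key, consistent with the list experiment above.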

>> - What are common methods to monitor the memory usage of a script?  
>> Can I add a snippet to the code that prints out how many MBs of  
>> memory a certain dictionary takes up at that particular time?

Not as such. In your case, I think the Task Manager would be enough. 
Since you only have this one demanding data structure, as a rough 
approximation you can pretend that whatever the Task Manager reports 
(have a look at the VM size and peak memory usage columns, not just 
memory usage) is caused by the contents of the dictionary.
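
If you do want a number from inside the script, the closest stdlib 
option is the resource module (Unix only; on Windows you'd need a 
third-party tool). It reports whole-process usage, not per-dictionary 
usage, which again is close enough if the dictionary dominates:

```python
import resource

# Peak resident set size of this process so far.  Note the unit is
# platform-dependent: kilobytes on Linux, bytes on macOS.
usage = resource.getrusage(resource.RUSAGE_SELF)
print("peak RSS:", usage.ru_maxrss)
```

Sprinkling a call like this before and after building the dictionary 
gives a rough measure of what the structure costs.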

-- 
Yours,

Andrei

=====
Mail address in header catches spam. Real contact info:
''.join([''.join(s) for s in zip(
"poet at aao.l pmfe!Pes ontuei ulcpss  edtels,s hr' one oC.",
"rjc5wndon.Sa-re laed o s npbi ot.Ira h it oteesn edt C")])
