How to use a 5 or 6 bit integer in Python?

Bengt Richter bokr at oz.net
Fri Dec 19 01:02:23 EST 2003


On Fri, 19 Dec 2003 14:30:23 +1100, Glen Wheeler <adsl5lcq at tpg.com.au> wrote:

>On Fri, 19 Dec 2003 03:17:36 GMT, "Rainer Deyke" <rainerd at eldwood.com>
>wrote:
>
>>You're trying to solve the wrong problem.  Python caches small integers, so
>>you've only got a very small number of distinct integer objects.  This is a
>>good thing, because each Python object has a eight byte overhead.
>>Introducing a new type wouldn't result in any memory savings.
>>
>
>  Thanks.  I had thought as much, but was not totally sure.
>
>>The real overhead comes from the references to those objects.  Each
>>reference is four bytes (on 32 bit computers), and there is no way to
>>decrease this.  Your best option is to use high level objects that contain
>>many integers internally but don't store them as individual Python
>>references.  The array module is one choice, Numarray is another.  If
>>immutability isn't an issue, you could even use strings (with each character
>>treated as an eight bit integer).
>
>  So the references to each integer is what causes the massive
>overhead.
>  I've read up on Numarray and the array module in the docs, but
>didn't see how I could adapt my current program to use these modules
>because of how my data is currently organised.
>  I have one constantly changing dict with many millions of keys
>(tuples of ints) which in turn associates itself with a tuple of ints
>and two more dicts.
>  Every single container type contains only integers.  Am I correct in
>assuming that an array cannot be used as a key for a dictionary?  As
>they are mutable?
Yes, but you might be able to do better than tuples, depending on what
the ordering means and whether the same number can repeat in a tuple, etc.

If we can tease out a little more about your problem, maybe we can help better ;-)
E.g., how do your tuple-keys come into being and how many times are they re-used?
If there is no nesting, you could create a string key just by ''.join([chr(el) for el in tup])
which wouldn't be as compact as a compressed bit string, but would be smaller than the tuple.
If you were lucky, a change could gain you both time and space.

I'm curious what your dicts and their int tuples represent in the real world ;-)

Regards,
Bengt Richter




More information about the Python-list mailing list