some comments for Python 3000

Bernhard Herzog herzog at online.de
Mon Aug 14 14:29:39 EDT 2000


"Rainer Deyke" <root at rainerdeyke.com> writes:

> Much worse, in fact.  In some cases, C outperforms Python by over 100:1.
> And memory usage is even worse.  In C, I can have an integer variable with
> range 0 to 255 in a single byte.  In Python, I need at least two objects
> (the integer and the string that holds the integer's name), both allocated
> on the stack (with whatever overhead this entails),

I think you meant heap :-), but for ints the allocation overhead is
pretty much 0 because of the suballocation scheme used, unless you
switched that off.

> both with four bytes reference count and four bytes pointer to the
> type object, plus the contents which are again at least four bytes
> each, plus one byte for each character in the variable name

Ok, that's 12 bytes for the int and at least 16 for the string, because
in addition to the refcount and type it also has a cached 32-bit hash
value and a pointer to the interned version of the string object.
Caching the hash value and the interned string can be switched off, but
let's assume that the defaults are used.
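
As a rough way to check such figures on whatever interpreter you have,
here is a minimal sketch. It assumes sys.getsizeof is available, and the
variable names are made up for illustration. The numbers it prints depend
on the Python version and on whether the build is 32-bit or 64-bit, so
they need not match the byte counts above.

```python
import sys

# Hypothetical example values: a small integer and the name it is bound to.
small_int = 200
name = "counter"

# Size of each object as reported by the running interpreter.
int_size = sys.getsizeof(small_int)
name_size = sys.getsizeof(name)

# A longer identifier costs more, since the characters are stored in the object.
longer_name_size = sys.getsizeof(name + "_with_a_longer_name")

print(int_size, name_size, longer_name_size)
```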

> - and that isn't counting the extra storage needed for the entry in
> the dictionary (another eight bytes on average at least). That's worse
> than 64:1. Even if the C version uses four bytes for the integer, it's
> 16:1.

Now, where does the 64 come from? Assuming that the string fits into 32
bytes (which means it has at most 16 characters including the trailing
0), and adding the 12 bytes for the int and the 8 bytes for the dict
entry, I get 52 bytes. Ok, counting in a bit of malloc overhead and
allowing for even longer variable names, we get about 64 bytes. This
estimate assumes that we're talking about global variables or
instance/class variables; local variables aren't usually stored in a
dict.
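
Spelled out as a small sketch, the tally looks like this. The constant
names are made up for illustration; the byte counts are the ones used in
this thread and assume a 32-bit build.

```python
# Byte counts taken from the estimate in this thread (32-bit build assumed).
INT_OBJECT = 12   # refcount + type pointer + value, 4 bytes each
NAME_STRING = 32  # string object with up to 16 characters incl. trailing 0
DICT_ENTRY = 8    # average cost of the dictionary entry, as quoted

total = INT_OBJECT + NAME_STRING + DICT_ENTRY
print(total)  # 52

# Ratio against a single byte in C, before any malloc overhead is added.
ratio_vs_one_byte = total // 1
print(ratio_vs_one_byte)  # 52
```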

However, the strings used for variable names and other identifiers are
interned, so if you use the same variable names in several places the
same string objects are used. Plus, for the ints -1 to 99 you always get
the same objects. This kind of object sharing can reduce the memory
requirements drastically, and it makes it very hard to estimate just how
much memory will be needed to hold a certain data structure.
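
The sharing can be observed with a small sketch like the one below. It
assumes a Python that provides sys.intern, and the identifier
"frame_count" is made up for illustration. Sharing of small ints is an
implementation detail, and the cached range differs between versions, so
the last result is printed without any promise about its value.

```python
import sys

# Build the same name twice at run time, so the compiler cannot merge the
# two strings for us, then intern both.
first = sys.intern("".join(["frame", "_", "count"]))
second = sys.intern("".join(["frame", "_", "count"]))

# Interning guarantees that equal strings end up as one shared object.
same_name_object = first is second
print(same_name_object)  # True

# Small ints may be shared as well; this depends on the implementation.
a = int("42")
b = int("42")
print(a is b)
```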

Of course, Python will always need at least 4 bytes to hold a variable's
value.


-- 
Bernhard Herzog   | Sketch, a drawing program for Unix
herzog at online.de  | http://sketch.sourceforge.net/


