[Python-Dev] [Python-checkins] r45925 - in python/trunk: Lib/tempfile.py Lib/test/test_os.py Misc/NEWS Modules/posixmodule.c

Mon May 15 13:43:13 CEST 2006

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I was leaving those out already - only the codes named 'ERROR_*'
>> get included (see attached parser and generator).
> 
> Right. One might debate whether DNS_INFO_AXFR_COMPLETE (9751L)
> or WSAEACCES (10013L) should be included as well.

The WSA codes are already included in the errno module.
Not sure about the DNS codes, but it would be easy
to add them as well.

> I got a smaller source file as I included only forward mappings,
> and used a loop to create the backwards mappings.
> 
>> Using a lookup object is not really clumsy - you can still access
>> all the values by attribute access. The only difference is that
>> they don't live in the module namespace, but get accessed via
>> an object.
> 
> So how much space would that save?

I'll have to write the lookup code first.
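
To illustrate the idea, here is a rough pure-Python sketch of such a
lookup object. It is not the planned implementation (that would keep
the table in C static data); the class name ErrorLookup and the
three-entry table are assumptions made up for this example.

```python
class ErrorLookup(object):
    """Hypothetical lookup object: codes are reached via attribute access."""

    def __init__(self, table):
        self._table = table

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._table[name]
        except KeyError:
            raise AttributeError(name)


# Tiny illustrative subset of the ERROR_* codes; the real table
# would hold all of the roughly 1500 entries.
_TABLE = {
    "ERROR_SUCCESS": 0,
    "ERROR_FILE_NOT_FOUND": 2,
    "ERROR_ACCESS_DENIED": 5,
}

winerror = ErrorLookup(_TABLE)
print(winerror.ERROR_ACCESS_DENIED)
```

The names no longer live in a module namespace, so only the table
itself has to exist in memory.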

I expect savings by a factor of 2-3, since the data will
be stored in C static data. That data is only swapped in as
needed and can be shared between processes, so quantifying
the savings is difficult.

>> I'm not worried about the disk space being used. The heap
>> memory usage is what's worrying: the import of the module lets
>> the non-shared memory size of the process grow by 700kB
>> on my AMD64 box.
> 
> That number must be misleading somehow. 

The number does look a bit high - it is possible that the
process has to swap in some of the shared memory. But then
again, the shared memory also increases:

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND

Before import:

lemburg  26515  0.0  0.4 19936 4620 pts/3    S+   13:37   0:00 python

After:

lemburg  26515  0.0  0.5 20780 5320 pts/3    S+   13:37   0:00 python

> There are 1510 strings,
> with a total length of 39972. There are 1510 integers also,
> and they all get added into three dictionaries.

Well, the strings and integers count twice: once in the module
namespace and once in the errorcode dictionary.

> On a 32-bit machine, these should consume 76968 bytes for the
> strings (*), 18120 bytes for the integers, and 100000 bytes
> for the dict entries (**), for a total of 200000 bytes
> at run-time.
> 
> On a 64-bit machine, the strings should consume 101128 bytes (***),
> the integers 24160, and the dict entries 200000 bytes,
> for a total of 325000 bytes.

Given that the code strings and integers are created
twice in my version of the module, the numbers sound about
right.
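
For reference, the 32-bit estimate can be redone from the footnoted
assumptions with a few lines of Python. This is only a sketch: the
individual string lengths are not given, so the rounding can merely
be bounded, and the 12 bytes per integer is inferred from
18120 / 1510 rather than stated in the message.

```python
N = 1510            # number of names and of integers (from the message)
TOTAL_LEN = 39972   # total length of all names (from the message)


def string_size(length, header):
    """Size of one string: header + data + NUL, rounded up to 8 bytes."""
    raw = header + length + 1
    return (raw + 7) // 8 * 8


def string_bytes_bounds(header):
    """Bounds for all strings; rounding adds at most 7 bytes per string."""
    unrounded = N * (header + 1) + TOTAL_LEN
    return unrounded, unrounded + 7 * N


def dict_bytes(entry_size, dicts=3, fill=0.5):
    """Dict entry storage for `dicts` dictionaries at the given fill ratio."""
    return int(N * dicts * entry_size / fill)


lo, hi = string_bytes_bounds(header=20)
print("strings: between %d and %d bytes" % (lo, hi))
print("integers: %d bytes" % (N * 12))   # 12 bytes/int is an inferred value
print("dict entries: about %d bytes" % dict_bytes(12))
```

The quoted 76968 bytes for the strings falls inside the computed
bounds; the dict figure comes out slightly above the 100000 quoted.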

I agree that creating only one dictionary statically
and the other mapping dynamically will already be a
saving of 50% simply by sharing the string and integer
objects.

> From that, I would conclude that one should avoid 64-bit machines
> if one is worried about memory usage :-)
> 
> Regards,
> Martin
> 
> (*) assuming 20 bytes string header, 1 byte null-termination,
> and a rounding-up to the next multiple of 8
> (**) assuming 12 bytes per dict entry in three dictionaries
> (winerror.__dict__, winerror.errorcode, interning dict),
> and assuming an average fill ratio of the dicts of 50%
> (***) assuming 40 bytes string header, provided long is
> a 64-bit type on that platform

-- 
Marc-Andre Lemburg
eGenix.com
