[Python-Dev] PEP 393 memory savings update

"Martin v. Löwis" martin at v.loewis.de
Wed Sep 28 00:56:58 CEST 2011


I have redone my memory benchmark, and added a few new
counters.

The application is a very small Django application. The same
source code of the app and Django itself is used on all Python
versions. The full list of results is at

http://www.dcl.hpi.uni-potsdam.de/home/loewis/djmemprof/

Here are some excerpts:

A. 32-bit builds, storage for Unicode objects
3.x, 32-bit wchar_t: 6378540
3.x, 16-bit wchar_t: 3694694
PEP 393:             2216807

Compared to the previous results, there are now some
significant savings even compared to a narrow unicode build.
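
The gain over a narrow build comes from PEP 393 choosing the character width per string rather than per build. A minimal sketch for observing this, assuming a CPython 3.3 or later interpreter (where PEP 393 is implemented); the helper name bytes_per_char is made up for illustration and is not part of the benchmark:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by growing a string by one character.

    Relies on sys.getsizeof reporting the full allocation of a str,
    which is a CPython implementation detail.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u20ac"),
    ("non-BMP", "\U0001F600"),
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such an interpreter, ASCII and Latin-1 strings should cost one byte per character, BMP strings two, and everything else four. A narrow build spends two bytes on every character regardless of content.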

B. 3.x, number of strings by maxchar:
ASCII:   35713 (1,300,000 chars)
Latin-1: 235   (11,000 chars)
BMP:     260   (700 chars)
other:   0
total:   36,000 (1,310,000 chars)

This explains why the savings for shortening ASCII objects
are significant in this application. I have no good intuition
for how this effect would show up in "real" applications. It may
be that the number of ASCII strings (in count and chars) grows
proportionally with the total number of strings; it may also
be that most of these strings are a fixed overhead
(resulting from Python identifiers and other interned strings).
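
For anyone who wants to repeat the count in table B on their own application, here is a rough sketch of the classification. It is not the code used for the benchmark; the function names maxchar_class and tally are hypothetical, and the list of strings has to be collected by some other means (e.g. from gc.get_objects()).

```python
from collections import Counter

def maxchar_class(s):
    """Classify a str by its largest code point, as in table B."""
    m = max(map(ord, s), default=0)  # the empty string counts as ASCII
    if m < 0x80:
        return "ASCII"
    if m < 0x100:
        return "Latin-1"
    if m < 0x10000:
        return "BMP"
    return "other"

def tally(strings):
    """Return (number of strings, number of chars) per class."""
    counts = Counter()
    chars = Counter()
    for s in strings:
        kind = maxchar_class(s)
        counts[kind] += 1
        chars[kind] += len(s)
    return counts, chars

counts, chars = tally(["spam", "caf\xe9", "\u20ac10", "\U0001F600"])
print(counts, chars)
```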

C. String-ish objects in 2.7 and 3.3-trunk:
                   2.x         3.x
#unicode           370      36,000
#bytes          43,000      14,000
#total          43,400      50,000

len(unicode)     5,300   1,306,000
len(bytes)   2,040,000     860,000
len(total)   2,046,000   2,200,000

(Note: the computations in the results are slightly messed up:
the number of bytes for bytes objects is actually the sum
of the lengths, not the sum of the sizeofs; this gets added
in the "total" lines to the sum of sizeofs of unicode strings,
which is nonsensical. The table above corrects this.)

As you can see, Python 3 creates more string objects in total.

D. Memory consumption for 2.x, 3.x, PEP 393, accounting for both
   unicode and bytes objects, using 32-bit builds and 32-bit
   wchar_t:
2.x:     3,620,000 bytes
3.x:     7,750,000 bytes
PEP 393: 3,340,000 bytes

This suggests that PEP 393 actually reduces memory consumption
below what 2.7 uses. This is offset though by "other" (non-string)
objects, which take 300KB more in 3.x.
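
The arithmetic behind this can be spelled out as a small sketch. It makes two assumptions that are not in the figures above: "300KB" is read as 300,000 bytes, and the extra cost of non-string objects is taken to apply unchanged to the PEP 393 build. The variable names are for illustration only.

```python
# String memory (unicode + bytes) from section D, in bytes.
totals = {"2.x": 3620000, "3.x": 7750000, "PEP 393": 3340000}

# Assumption: "300KB" read as 300,000 bytes, applied to the PEP 393 build.
OTHER_OBJECTS_EXTRA = 300 * 1000

def relative_change(new, base):
    """Change of new relative to base, as a fraction of base."""
    return (new - base) / base

string_saving = totals["2.x"] - totals["PEP 393"]
net_saving = string_saving - OTHER_OBJECTS_EXTRA

print("PEP 393 vs 3.x: %+.1f%%" % (100 * relative_change(totals["PEP 393"], totals["3.x"])))
print("PEP 393 vs 2.x: %+.1f%%" % (100 * relative_change(totals["PEP 393"], totals["2.x"])))
print("string saving vs 2.x:", string_saving, "bytes")
print("net of other objects:", net_saving, "bytes")
```

Under these assumptions the string saving against 2.x is 280,000 bytes, which the extra non-string objects turn into a net increase of about 20,000 bytes.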

Regards,
Martin

