[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

M.-A. Lemburg mal@lemburg.com
Fri, 26 Apr 2002 10:26:22 +0200


Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > I don't know why it is, but Unicode always seems to unnecessarily
> > heat up any discussion involving it.
> 
> Huh -- I thought I was the only one who noticed this <wink>.

Naa, it's occurred to me several times in the past. Unicode
seems to trigger some memory corruption in Brain 2.2 which
results in spilling out huge amounts of adrenalin and causes
the blood pressure to reach record highs ;-)
 
> > I would really like to know what is causing this: is it a religious
> > issue, does it have to do with the people involved or is Unicode
> > inherently controversial ?
> 
> Unicode had nothing to do with my yelling in this thread.  I've got very low
> tolerance for memory corruption, regardless of source.  When it happens once
> I'm on high alert, when it happens twice in the same place I go postal.  Had
> this been in dictobject.c or boolobject.c, I would have been just as
> unhappy.
> 
> Now that the memory corruption is thought to be solved, and verified in the
> debug build regardless, *now* I'll get cranky about foreigners and their
> lameass character sets <wink>.

Good to know.
 
> On the technical issues remaining, I don't know how to judge the tradeoff
> between memory use and speed here.  If you do, and pymalloc can help in some
> way, I'll be happy to help.

First of all, UTF-8 is probably the most common Unicode
encoding used today and will certainly become *the*
standard encoding within the next few years, so speed matters
a lot in this particular corner of the Unicode implementation.

The standard reasoning behind using overallocation for memory
management is that typical modern malloc()s don't really commit
the memory until it is first touched (you know this anyway...),
so the unused tail of an overallocated block doesn't actually cause
bundles of memory chips to heat up. This makes overallocation ideal
for the case where you don't know the exact size in advance but
can estimate a reasonable upper bound.
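
To make the two strategies concrete, here is a minimal sketch in
Python (not the actual unicodeobject.c code). The function names are
made up for illustration, and the worst case of 4 bytes per code point
is an assumption about the UTF-8 encoder. It only shows the size
bookkeeping; a bytearray is zero-filled, so it does not reproduce the
lazy behaviour of a system malloc().

```python
def utf8_size_upper_bound(text: str) -> int:
    """Overallocation: worst-case size, no scan of the data needed."""
    # Assumption: UTF-8 needs at most 4 bytes per code point.
    return 4 * len(text)


def utf8_size_exact(text: str) -> int:
    """Counting: one extra pass over the data to get the exact size."""
    size = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            size += 1
        elif cp < 0x800:
            size += 2
        elif cp < 0x10000:
            size += 3
        else:
            size += 4
    return size


def encode_overallocating(text: str) -> bytes:
    """Encode into a worst-case buffer, then shrink it to the used size."""
    buf = bytearray(utf8_size_upper_bound(text))  # allocate the upper bound
    used = 0
    for ch in text:
        chunk = ch.encode("utf-8")
        buf[used:used + len(chunk)] = chunk
        used += len(chunk)
    del buf[used:]  # give back the unused tail
    return bytes(buf)
```

The counting variant pays for a second pass over the input, while the
overallocating variant pays for a larger initial request plus a final
shrink. Which one wins depends on how the allocator treats that
oversized request.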

Now with pymalloc the situation is a bit different for
smaller sized memory areas (larger chunks are handed off
to the system malloc() which uses the above strategy).

As Martin's benchmark showed, the counting strategy is
faster for small chunks and this is probably due to the
fact that pymalloc manages these.

Since pymalloc cannot know that an algorithm wants to
use overallocation as its memory allocation strategy, it
would probably help to find a way to tell pymalloc
about this. It could then either redirect the
request to the system malloc() or use a different
allocation strategy for these chunks.
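
As a rough sketch of what such a hint could look like (purely
hypothetical: pymalloc has no such flag, and both the function name
and the 256-byte cutoff are assumptions chosen for illustration):

```python
SMALL_REQUEST_THRESHOLD = 256  # assumed cutoff in bytes, illustration only


def choose_allocator(nbytes: int, overallocating: bool = False) -> str:
    """Return which allocator a request would be routed to (hypothetical)."""
    if overallocating:
        # Caller's hint: the size is a worst-case estimate and the block
        # will probably be shrunk later, so bypass the small-block pools.
        return "system malloc"
    if nbytes <= SMALL_REQUEST_THRESHOLD:
        return "pymalloc"
    return "system malloc"
```

With this, a small request such as choose_allocator(40) stays with
pymalloc, while choose_allocator(40, overallocating=True) is handed
to the system malloc().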

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/