[python-win32] Re: [Zope] Windows Low Fragmentation Heap yields speedup of ~15%

Tim Peters tim.peters at gmail.com
Mon Feb 14 20:13:57 CET 2005


[Gfeller Martin]
> I'm running a large Zope application on a single 1 GHz CPU, 1 GB RAM
> Windows XP Professional machine using Zope 2.7.3 and Python 2.3.4.
> The application typically builds large lists by appending
> and extending them.

That's historically been an especially bad case for Windows systems,
although the behavior varied across specific Windows flavors.  Python
has changed lots of things over time to improve it, including yet
another twist on the list-reallocation strategy, new in Python 2.4.
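
You can watch the over-allocation at work, although sys.getsizeof
postdates the interpreters discussed here, so this is a sketch on a
modern CPython rather than a 2.3/2.4 measurement.  Appends grow the
allocated block in occasional jumps instead of on every call:

    import sys

    lst = []
    prev = None
    for i in range(64):
        lst.append(i)
        size = sys.getsizeof(lst)    # object size incl. over-allocation
        if size != prev:             # a realloc happened on this append
            print("len=%2d  allocated bytes=%d" % (len(lst), size))
            prev = size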

> We regularly observed that exercising a given piece of functionality
> a second time in the same process was much slower (by about 50%)
> than its first run after startup.

Heh.  On Win98SE, the _first_ time you ran pystone after rebooting the
machine, it ran twice as fast as the second (or third, fourth, ...)
time you tried it.  The only way I ever found to get back the original
speed without a reboot was to run a different process in-between that
allocated almost all physical memory in one giant chunk.  Presumably
that convinced Win98SE to throw away its fragmented heap and start
over again.

> This behavior greatly improved with Python 2.3 (thanks
> to the improved Python object allocator, I presume).

The page you reference later describes a scheme that's (at least
superficially) a lot like the one pymalloc uses for "small objects".
In effect, pymalloc takes over buckets 1-32 in the table.
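
For concreteness, here's a rough Python model of those size classes.
The constants mirror what CPython's Objects/obmalloc.c documents; the
function itself is just an illustration, not pymalloc's actual code:

    ALIGNMENT = 8                  # all classes are multiples of 8 bytes
    SMALL_REQUEST_THRESHOLD = 256  # bigger requests go to the C malloc

    def pymalloc_size_class(nbytes):
        """Return (class_index, rounded_size), or None if pymalloc punts."""
        if not 1 <= nbytes <= SMALL_REQUEST_THRESHOLD:
            return None
        index = (nbytes - 1) // ALIGNMENT        # 0 .. 31
        return index, (index + 1) * ALIGNMENT    # 8, 16, ..., 256

    print(pymalloc_size_class(1))      # (0, 8)
    print(pymalloc_size_class(256))    # (31, 256)
    print(pymalloc_size_class(257))    # None -- platform malloc's problem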

> Nevertheless, I tried to convert the heap used by Python
> to a Windows Low Fragmentation Heap (available on XP
> and 2003 Server). This improved the overall run time
> of a typical CPU-intensive report by about 15%
> (overall run time is in the 5 minutes range), with the
> same memory consumption.
>
> I consider 15% significant enough to let you know about it.

Yes, and thank you.  FYI, Python doesn't call any of the Win32 heap
functions directly; the behavior it sees is inherited from whatever
Microsoft's C implementation uses to support C's
malloc()/realloc()/free().  pymalloc requests 256KB at a time from the
platform malloc, and carves it up itself, so pymalloc isn't affected
by LFH (LFH punts on requests over 16KB, much as pymalloc punts on
requests over 256 bytes).
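
To put the thresholds in one place, here's a toy routing function based
only on the numbers in this thread (not a model of either allocator's
real code):

    def who_serves(request_bytes, lfh_enabled=True):
        """Which tier handles an allocation of the given size?"""
        if 1 <= request_bytes <= 256:
            # pymalloc carves these out of the 256KB chunks it gets
            # from the platform malloc
            return "pymalloc"
        if lfh_enabled and request_bytes <= 16 * 1024:
            return "Low Fragmentation Heap bucket"
        return "general Windows heap"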

But "large objects" (including list guts) don't go thru pymalloc to
begin with, so as long as your list guts fit in 16KB, LFH could make a
real difference to how they behave.  Well, actually, it's probably
more the case that LFH gives a boost by keeping small objects _out_ of
the general heap.  Then growing a giant list doesn't bump into
gazillions of small objects.

> For information about the Low Fragmentation Heap, see
> <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/low_fragmentation_heap.asp>
> 
> Best regards,
> Martin
>
> PS: Since I don't speak C, I used ctypes to convert all
>    heaps in the process to LFH (I don't know how to determine
>    which one is the C heap).

It's the one consuming all the time <wink>.
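
For the archives, here's roughly what that ctypes approach looks like.
A minimal sketch: GetProcessHeaps and HeapSetInformation are documented
Win32 APIs on XP/2003, and the value 2 for HeapCompatibilityInformation
is what the MSDN page above specifies for enabling the LFH, but the
wrapper itself is untested:

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.windll.kernel32

    def enable_lfh_on_all_heaps():
        # Ask how many heaps the process has, then fetch their handles.
        n = kernel32.GetProcessHeaps(0, None)
        heaps = (wintypes.HANDLE * n)()
        kernel32.GetProcessHeaps(n, heaps)

        HeapCompatibilityInformation = 0  # HEAP_INFORMATION_CLASS value
        lfh = ctypes.c_ulong(2)           # 2 == switch this heap to LFH
        for h in heaps:
            # Returns 0 for heaps the LFH can't take over (e.g. when
            # running under a debugger); harmless to ignore here.
            kernel32.HeapSetInformation(
                h, HeapCompatibilityInformation,
                ctypes.byref(lfh), ctypes.sizeof(lfh))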

