[Python-Dev] binary trees. Review obmalloc.c
Vladimir 'Yu' Stepanov
vys at renet.ru
Thu May 4 18:29:58 CEST 2006
Tim Peters wrote:
> [Vladimir 'Yu' Stepanov]
>>> * To adapt allocation of blocks of memory with other alignment. Now
>>> alignment is rigidly set on 8 bytes. As a variant, it is possible to
>>> use alignment on 4 bytes. And this value can be set at start of the
>>> interpreter through arguments/variable environments/etc. At testing
>>> with alignment on 4 or 8 bytes difference in speed of work was not
>>> appreciable.
>
> [Martin v. Löwis]
>> That depends on the hardware you use, of course. Some architectures
>> absolutely cannot stand mis-aligned accesses, and programs will just
>> crash if they try to perform such accesses.
>
> Note that we _had_ to add a goofy "long double" to the PyGC_Head union
> because some Python platform required 8-byte alignment for some types
> ... see rev 25454. Any spelling of malloc() also needs to return
> memory aligned for any legitimate purpose, and 8-byte alignment is the
> strictest requirement we know of across current Python platforms.
Is it possible to get a reference to those tests? I have run some tests
of my own, which show a very small difference in speed between 4-byte
and 8-byte alignment (at most 8%, and in most cases less than one
percent). The tests used sequential traversal of the elements in a
single piece of memory. I realize I still need to run a test with
randomly chosen addresses, to minimize the benefit of the processor
cache. And certainly not all platforms will show the same result (I
will try not to forget to attach the source code of the test and its
output :) ).
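For reference, the attached test is roughly along the lines of the
following sketch (an illustration only, not the attached aligntest.c
itself): it times sequential 32-bit reads through a pointer that is
either 8-byte aligned or offset by 4 bytes.

    /* Rough sketch of an alignment timing test (an illustration only,
     * not the attached aligntest.c).  It sums 32-bit values through a
     * pointer that is either 8-byte aligned or offset by 4 bytes and
     * prints the elapsed time for each case. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (1u << 20)
    #define LOOPS 200

    static double walk(unsigned char *base, size_t offset)
    {
        volatile unsigned int sum = 0;
        unsigned int *p = (unsigned int *)(base + offset);
        size_t i;
        int loop;
        clock_t t0 = clock();

        for (loop = 0; loop < LOOPS; loop++)
            for (i = 0; i < N; i++)
                sum += p[i];
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        /* calloc zero-fills, so the timed loops read initialized memory;
         * a little extra is allocated so the offset walk stays in bounds.
         * The base pointer is assumed to be at least 8-byte aligned, as
         * malloc/calloc guarantee on the platforms discussed here. */
        unsigned char *base = calloc((size_t)N * sizeof(unsigned int) + 16, 1);

        if (base == NULL)
            return 1;
        printf("8-byte aligned base: %.3f s\n", walk(base, 0));
        printf("4-byte offset base : %.3f s\n", walk(base, 4));
        free(base);
        return 0;
    }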
On the other hand, reducing the amount of memory consumed by each
individual object instance could gain back roughly the same percentage
of speed that is lost by moving the alignment from 8 bytes down to 4 :)
Besides, the padding overhead in memory allocation would probably
disappear for some objects, starting with int.
>> So Python should err on the safe side, and only use a smaller alignment
>> when it is known not to hurt.
>>
>> OTOH, I don't see the *advantage* in reducing the alignment.
>
> It could cut wasted bytes. There is no per-object memory overhead in
> a release-build obmalloc: the allocatable part of a pool is entirely
> used for user-visible object memory, except when the alignment
> requirement ends up forcing unused (by both the user and by obmalloc)
> pad bytes. For example, Python ints consume 12 bytes on 32-bit boxes,
> but if space were allocated for them by obmalloc (it's not), obmalloc
> would have to use 16 bytes per int to preserve 8-byte alignment.
Good point.
> OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment,
> because that's what worked best for Vladimir Marangozov during his
> extensive timing tests. It's not an isolated decision -- e.g., it's
> also influenced by, and influences, "the best" pool size, and (of
> course) cutting alignment to 4 would double the number of "size
> classes" obmalloc has to juggle.
Yes, the maximum number of size classes would indeed be doubled. But
that should not affect either the performance of the system or the
total amount of memory consumed. The change in memory fragmentation
cannot be calculated in advance, since it depends on the concrete
application.
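To show where the doubling comes from, here is a small sketch of
pymalloc-style size-class indexing with a 256-byte limit (the helper
name is mine, not a quote from obmalloc.c):

    /* Sketch of size-class indexing with a 256-byte small-request limit.
     * With 8-byte alignment there are 256/8 = 32 size classes; cutting
     * the alignment to 4 doubles that to 256/4 = 64. */
    #include <stdio.h>
    #include <stddef.h>

    #define SMALL_REQUEST_THRESHOLD 256

    /* Map a request of 1..256 bytes to its size-class index. */
    static unsigned int size_class_index(size_t nbytes,
                                         unsigned int alignment_shift)
    {
        return (unsigned int)((nbytes - 1) >> alignment_shift);
    }

    int main(void)
    {
        /* alignment 8 -> shift 3, alignment 4 -> shift 2 */
        printf("size classes with 8-byte alignment: %u\n",
               size_class_index(SMALL_REQUEST_THRESHOLD, 3) + 1);
        printf("size classes with 4-byte alignment: %u\n",
               size_class_index(SMALL_REQUEST_THRESHOLD, 2) + 1);
        return 0;
    }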
>>> * To expand the maximal size which can be allocated by means of the
>>> given module. Now the maximal size is 256 bytes.
>
>> Right. This is somewhat deliberate, though; the expectation is that
>> fragmentation will increase dramatically if even larger size classes
>> are supported.
>
> It's entirely deliberate ;-) obmalloc is no way trying to be a
> general-purpose allocator. It was designed specifically for the
> common memory patterns in Python, aiming at large numbers of small
> objects of the same size. That's extremely common in Python, and
> allocations larger than 256 bytes are not. The maximum size was
> actually 64 bytes at first. After that, dicts grew an embedded
> 8-element table, which vastly increased the size of the dict struct.
> obmalloc's limit was boosted to 256 then, although it could have
> stopped at the size of a dict (in the rough ballpark of 150 bytes).
> There was no reason (then or now) to go beyond 256.
See below.
>>> * At the size of the allocated memory close to maximal, the
>>> allocation of blocks becomes inefficient. For example, for the
>>> sizesof blocks 248 and 256 (blocksize), values of quantity of
>>> elements (PAGESIZE/blocksize) it is equal 16. I.e. it would be
>>> possible to use for the sizes of the block 248 same page, as for the
>>> size of the block 256.
>
>> Good idea; that shouldn't be too difficult to implement: for sizes above
>> 215, pools need to be spaced apart only at 16 bytes.
>
> I'd rather drop the limit to 215 then <0.3 wink>. Allocations that
> large probably still aren't important to obmalloc, but it is important
> that finding a requested allocation's "size index" be as cheap as
> possible. Uniformity helps that.
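To put numbers on the waste mentioned above, a small sketch (assuming
the 4096-byte pool implied by the PAGESIZE/blocksize figures, and
ignoring the pool header):

    /* Sketch: blocks per 4096-byte pool and bytes left unused for the
     * largest size classes.  It shows that 248- and 256-byte blocks
     * both give 16 blocks per pool; pool header overhead is ignored. */
    #include <stdio.h>

    #define POOL_SIZE 4096

    int main(void)
    {
        int blocksize;

        for (blocksize = 232; blocksize <= 256; blocksize += 8) {
            int nblocks = POOL_SIZE / blocksize;
            int unused  = POOL_SIZE - nblocks * blocksize;
            printf("blocksize %3d: %2d blocks, %3d bytes unused per pool\n",
                   blocksize, nblocks, unused);
        }
        return 0;
    }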
The existing memory allocation module is really not suitable as a
general-purpose allocator. And performance can "blink" because of the
existing non-uniform implementation of memory allocation: in some cases
allocation in the `obmalloc.c' module works merely as a wrapper around
the standard malloc/free, which does not improve speed :) .
For example, suppose two pieces of memory are allocated: one by
`obmalloc.c', the other by the standard malloc. Both pieces are small
enough to be handled by similar methods. On a number of operating
systems malloc will lose speed in multi-threaded programs because of
contention for the global memory-allocation lock. Of course, this does
not apply to all operating systems.
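Roughly, the wrapper behaviour I mean looks like the following sketch
(made-up names, not the actual obmalloc.c code): requests above the
small-request limit are simply handed to the system malloc.

    /* Simplified sketch of a pymalloc-style dispatch (made-up names,
     * not the actual obmalloc.c code): small requests go to the pool
     * allocator, larger ones fall through to the system malloc. */
    #include <stdio.h>
    #include <stdlib.h>

    #define SMALL_REQUEST_THRESHOLD 256

    /* Stand-in for the pool allocator; the real one carves fixed-size
     * blocks out of pools grouped by size class. */
    static void *allocate_from_pool(size_t nbytes)
    {
        return malloc(nbytes);  /* placeholder so the sketch runs */
    }

    static void *sketch_malloc(size_t nbytes)
    {
        if (nbytes == 0 || nbytes > SMALL_REQUEST_THRESHOLD)
            return malloc(nbytes);          /* wrapper around the system allocator */
        return allocate_from_pool(nbytes);  /* fast path for small objects */
    }

    int main(void)
    {
        void *small = sketch_malloc(32);    /* would come from a pool */
        void *large = sketch_malloc(4096);  /* falls back to malloc */

        printf("small=%p large=%p\n", small, large);
        free(small);  /* fine here only because the stand-in used malloc */
        free(large);
        return 0;
    }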
> On modern boxes, it may be desirable to boost POOL_SIZE -- nobody has
> run timing experiments as comprehensive as Vladimir ran way back when,
> and there's no particular reason to believe that the same set of
> tradeoffs would look best on current architectures.
As for speed, it would be no worse than in the existing memory
allocation systems. Naturally, there are no pluses without minuses, but
in my opinion the minuses can be made insignificant. Python has clearly
pronounced memory-allocation patterns that can be exploited by
designing an allocation system directly for them.
Properties inherent to the Python language:
* When running in multi-threaded mode the GIL is already held, and the
same lock can also cover memory allocation.
* Frequently used objects are better served by pools tailored to their
actual size rather than to log2(size(ob)), since realloc is called
extremely rarely, or even less often than that :) (see the small
example after this list).
* Objects whose allocated size is an exact multiple of 512, 1024 or
2048 bytes are extremely rare, since the Python object header and the
fields describing the object's properties are placed in front of the
data.
The first two items also apply to the present `obmalloc.c', but only
for a narrow range of allocated block sizes.
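As an illustration of the second point (a hypothetical comparison,
assuming a general-purpose allocator that rounds requests up to the
next power of two):

    /* Hypothetical comparison: a 40-byte object under power-of-two
     * rounding versus a pool sized exactly for 40-byte objects. */
    #include <stdio.h>
    #include <stddef.h>

    static size_t next_pow2(size_t n)
    {
        size_t p = 1;

        while (p < n)
            p <<= 1;
        return p;
    }

    int main(void)
    {
        size_t obj = 40;  /* made-up object size for illustration */

        printf("power-of-two class: %zu bytes (%zu wasted per object)\n",
               next_pow2(obj), next_pow2(obj) - obj);
        printf("exact-size pool   : %zu bytes (0 wasted per object)\n", obj);
        return 0;
    }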
Some interpreters use their own memory allocation systems. Tuning the
memory management system is not the least important step toward
speeding up Python.
Thanks to everyone who has answered. The topic is as interesting as
ever.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: aligntest.c
Url: http://mail.python.org/pipermail/python-dev/attachments/20060504/8a7a928e/attachment.asc
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: out-ia32.txt
Url: http://mail.python.org/pipermail/python-dev/attachments/20060504/8a7a928e/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: out-amd64.txt
Url: http://mail.python.org/pipermail/python-dev/attachments/20060504/8a7a928e/attachment-0001.txt