Tim Peters wrote:
[Vladimir 'Yu' Stepanov]
* To adapt block allocation to other alignments. Alignment is currently hard-wired to 8 bytes; as a variant, 4-byte alignment could be used, and the value could be set at interpreter startup through command-line arguments/environment variables/etc. In my tests, the difference in speed between 4- and 8-byte alignment was not noticeable.
[Martin v. Löwis]
That depends on the hardware you use, of course. Some architectures absolutely cannot stand mis-aligned accesses, and programs will just crash if they try to perform such accesses.
Note that we _had_ to add a goofy "long double" to the PyGC_Head union because some Python platform required 8-byte alignment for some types ... see rev 25454. Any spelling of malloc() also needs to return memory aligned for any legitimate purpose, and 8-byte alignment is the strictest requirement we know of across current Python platforms.

Could you give a reference to those tests? I have run some tests of my own, which show a very small difference in speed between 4- and 8-byte alignment (8% at most, and mostly less than one percent).
My tests used sequential traversal of the elements in one chunk of memory. I understand that a further test with randomly chosen addresses is still needed, to minimize the effect of the processor cache. And certainly not all platforms will show the same result (I shall try not to forget to attach the test's source code and its output :) ). On the other hand, the reduction in the memory consumed by each object could win back the same percentage of speed that is lost by moving the alignment from 8 bytes down to 4 :) . Besides, some objects, starting with int, could then probably drop their own extra layers on top of the allocator.
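A minimal sketch of the sequential-vs-random traversal idea described above (my own illustration, not the attached test; the names `traverse` and `time_traversal` are mine):

```python
import array
import random
import time

def traverse(data, order):
    """Sum the elements of `data`, visiting them in the given index order."""
    total = 0
    for i in order:
        total += data[i]
    return total

def time_traversal(n=1 << 16):
    """Time sequential vs shuffled traversal of one chunk of memory.

    A shuffled visiting order defeats the hardware prefetcher, so the
    measured cost is closer to raw access latency than to the best case
    the cache can arrange for a linear scan.
    """
    data = array.array("l", range(n))
    seq = list(range(n))
    rnd = seq[:]
    random.shuffle(rnd)

    t0 = time.perf_counter()
    traverse(data, seq)
    t_seq = time.perf_counter() - t0

    t0 = time.perf_counter()
    traverse(data, rnd)
    t_rnd = time.perf_counter() - t0
    return t_seq, t_rnd
```

Absolute timings vary widely per machine, so no particular ratio should be read into a single run.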
So Python should err on the safe side, and only use a smaller alignment when it is known not to hurt.
OTOH, I don't see the *advantage* in reducing the alignment.
It could cut wasted bytes. There is no per-object memory overhead in a release-build obmalloc: the allocatable part of a pool is entirely used for user-visible object memory, except when the alignment requirement ends up forcing unused (by both the user and by obmalloc) pad bytes. For example, Python ints consume 12 bytes on 32-bit boxes, but if space were allocated for them by obmalloc (it's not), obmalloc would have to use 16 bytes per int to preserve 8-byte alignment.

A good point.
OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment, because that's what worked best for Vladimir Marangozov during his extensive timing tests. It's not an isolated decision -- e.g., it's also influenced by, and influences, "the best" pool size, and (of course) cutting alignment to 4 would double the number of "size classes" obmalloc has to juggle.

Yes, the number of size classes would really be doubled. But that by itself should not affect either the performance of the system or the total amount of memory consumed. The change in memory fragmentation cannot be predicted, since it depends on the concrete application.
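To make the numbers concrete, here is a small model (my own sketch, not obmalloc code) of how the alignment choice drives both per-object padding and the number of size classes up to the 256-byte limit:

```python
def round_up(n, alignment):
    """Round a request of n bytes up to a multiple of `alignment`."""
    return (n + alignment - 1) & ~(alignment - 1)

def num_size_classes(limit, alignment):
    """Distinct block sizes an obmalloc-style allocator must manage."""
    return limit // alignment

# Tim's example: a 12-byte int on a 32-bit box.
assert round_up(12, 8) == 16   # 4 pad bytes per object at 8-byte alignment
assert round_up(12, 4) == 12   # no pad at 4-byte alignment

# Halving the alignment doubles the number of size classes to juggle.
assert num_size_classes(256, 8) == 32
assert num_size_classes(256, 4) == 64
```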
* To expand the maximal size that can be allocated by means of this module. Now the maximal size is 256 bytes.
Right. This is somewhat deliberate, though; the expectation is that fragmentation will increase dramatically if even larger size classes are supported.
It's entirely deliberate ;-) obmalloc is in no way trying to be a general-purpose allocator. It was designed specifically for the common memory patterns in Python, aiming at large numbers of small objects of the same size. That's extremely common in Python, and allocations larger than 256 bytes are not. The maximum size was actually 64 bytes at first. After that, dicts grew an embedded 8-element table, which vastly increased the size of the dict struct. obmalloc's limit was boosted to 256 then, although it could have stopped at the size of a dict (in the rough ballpark of 150 bytes). There was no reason (then or now) to go beyond 256. See below.
* At sizes close to the maximum, the allocation of blocks becomes inefficient. For example, for block sizes of 248 and 256 bytes (blocksize), the number of elements per page (PAGESIZE/blocksize) is 16 in both cases. I.e. the same page layout could be used for 248-byte blocks as for 256-byte blocks.
Good idea; that shouldn't be too difficult to implement: for sizes above 215, pools need to be spaced apart only at 16 bytes.
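The sharing opportunity is easy to see by computing blocks-per-page for each size class. This is a sketch under the assumption of a 4096-byte page with no header (the real pool layout also reserves space for a pool header):

```python
PAGESIZE = 4096  # assumed page size; real pools also lose space to a header

def blocks_per_page(blocksize):
    """How many blocks of the given size fit in one page."""
    return PAGESIZE // blocksize

# Sizes 248 and 256 both fit exactly 16 blocks per page, so a 248-byte
# size class gains no capacity over the 256-byte one.
assert blocks_per_page(248) == 16
assert blocks_per_page(256) == 16

# Group the 8-byte-aligned size classes up to 256 by their block count,
# exposing every pair that could share the same page layout.
groups = {}
for size in range(8, 257, 8):
    groups.setdefault(blocks_per_page(size), []).append(size)
```

Printing `groups` shows that the sharing only kicks in near the top of the size range; small size classes all have distinct block counts.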
I'd rather drop the limit to 215 then <0.3 wink>. Allocations that large probably still aren't important to obmalloc, but it is important that finding a requested allocation's "size index" be as cheap as possible. Uniformity helps that.

The available general-purpose allocator really does not suit these overall aims. And the speed can "blink" under the existing non-uniform allocation paths: in some cases the allocation routines in `obmalloc.c' act as mere wrapper functions around the standard malloc/free, which does not raise speed :) .
On modern boxes, it may be desirable to boost POOL_SIZE -- nobody has run timing experiments as comprehensive as Vladimir ran way back when, and there's no particular reason to believe that the same set of tradeoffs would look best on current architectures.

As to speed, it would be no worse than in the existing systems of memory allocation. Naturally, there are no pluses without some minuses; in my opinion, the available minuses can be made insignificant.

For an example, let us assume that two pieces of memory are allocated: one by `obmalloc.c', the other by the standard malloc. Both pieces are small enough to be handled by similar methods. On a number of OSes, malloc will lose in speed for multi-threaded programs because of the global lock taken on allocation. Certainly this does not apply to all operating systems.

The Python language has a clearly expressed specificity in its memory allocation, which can be used to advantage by designing a memory-allocation system directly for it.
The properties inherent in the Python language:

* When working in multi-threaded mode there is already the GIL lock, which can also be used for memory allocation.
* Objects that are used often are better fitted to their real size rather than to log2(size(ob)), since realloc is carried out extremely seldom, or even less often :) .
* Objects whose allocated size is a multiple of 512, 1024, or 2048 are extremely rare, since the Python object header, together with the parameters describing the object's properties, is placed in front of the data.

The first two items also apply to the present `obmalloc.c', but only for a narrow range of allocated block sizes.
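The third point can be illustrated with a toy model; the 16-byte header here is a hypothetical figure of my own, since real object header sizes vary by type and build:

```python
HEADER = 16  # hypothetical per-object header size, not a CPython constant

def allocated_size(payload):
    """Total allocation request: header plus the object's own data."""
    return HEADER + payload

# Because the header is added to every payload, the total request lands
# on a large power of two only for a few specific payload sizes.
big_powers = {512, 1024, 2048}
hits = [p for p in range(1, 2048) if allocated_size(p) in big_powers]
assert hits == [496, 1008, 2032]   # only 3 payload sizes out of 2047
```

Under this model, size classes tuned to real object sizes waste far less space than classes rounded up to powers of two.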
Some interpreters use a memory-allocation system of their own. Tuning the memory-management system is not the least important step toward speeding up Python.
Thanks to everyone who has answered. The topic is as interesting as ever.