[Python-Dev] Advice sought on memory allocation latency reduction C1X standard proposal
Niall Douglas
s_sourceforge at nedprod.com
Wed Sep 22 22:12:59 CEST 2010
Dear Python Devs,
I would like feedback on an ISO C1X/C++ standard library proposal
that I hope to submit. It is accompanied by a rationale
(http://mallocv2.wordpress.com/) which shows that RAM capacity is
growing exponentially faster than RAM access speed. The consequences
are profound: software, which has always been written on the
assumption that RAM capacity is scarce, will need to be retargeted to
assume that RAM access speed is the scarce resource instead.
The C1X proposal (http://mallocv2.wordpress.com/the-c-proposal-text/)
enables four things of interest to Python: (i) aligned block resizing,
(ii) speculative in-place block resizing, (iii) batch block
allocation, and (iv) the ability to reserve address space, thus
avoiding the need to overallocate array storage.
Aligned block resizing is especially useful to numpy. Given an array
of aligned SSE vector quantities, one cannot currently resize that
block with a guarantee that the alignment will be preserved. With the
new non-relocating realloc(), and the ability to specify an alignment
to realloc(), one may avoid memory copying. That reduces memory
bandwidth utilisation and hence overall memory access latencies.
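
As a minimal sketch of the status quo described above (not taken from
the proposal text): with today's standard library, the only portable
way to grow an aligned block while keeping its alignment is to
allocate a fresh aligned block, copy, and free the old one. The helper
name resize_aligned_by_copy is hypothetical, and C11 aligned_alloc()
is assumed to be available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resize an aligned block by allocate + copy + free.
 * `alignment` must be a power of two supported by aligned_alloc().
 * Returns the new block, or NULL on failure (the old block stays valid). */
void *resize_aligned_by_copy(void *old, size_t old_size,
                             size_t new_size, size_t alignment)
{
    /* C11 aligned_alloc() expects the size to be a multiple of the
     * alignment, so round the request up. */
    size_t rounded = (new_size + alignment - 1) / alignment * alignment;
    void *fresh = aligned_alloc(alignment, rounded);
    if (fresh == NULL)
        return NULL;

    if (old != NULL) {
        /* This copy is the memory traffic that an aligned,
         * non-relocating realloc() would let callers avoid. */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

Every call copies the whole payload, even when free space happens to
follow the old block, because plain realloc() gives no alignment
guarantee beyond that of malloc().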
The ability to reserve address space can be combined with speculative
in-place block resizing to let Python reserve an arbitrary amount of
address space after the storage for an array object. Should the array
later be extended, the speculative in-place resize can attempt to
expand the storage into that reserved space without relocating its
contents. This again means much less memory copying and lower memory
consumption, and once again reduces overall memory access latencies.
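
The reserve-then-extend idea can be approximated today at the
operating system level. The sketch below is an illustration only: it
uses POSIX mmap() and mprotect() as a stand-in and does not show the
proposal's own interface. The type and function names (reserved_array,
reserved_array_init, reserved_array_grow, reserved_array_free) are
hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

typedef struct {
    unsigned char *base; /* start of the reserved region; never moves */
    size_t committed;    /* bytes currently readable and writable */
    size_t reserved;     /* total address space set aside */
} reserved_array;

/* Reserve address space without making it accessible yet. */
int reserved_array_init(reserved_array *a, size_t reserve_bytes)
{
    void *p = mmap(NULL, reserve_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->committed = 0;
    a->reserved = reserve_bytes;
    return 0;
}

/* Extend the usable storage in place, inside the reservation.
 * Returns -1 if the request does not fit; nothing is ever relocated. */
int reserved_array_grow(reserved_array *a, size_t new_bytes)
{
    if (new_bytes > a->reserved)
        return -1;
    if (new_bytes <= a->committed)
        return 0;
    /* mprotect() works on whole pages, so the accessible range is
     * rounded up to the next page boundary by the kernel. */
    if (mprotect(a->base, new_bytes, PROT_READ | PROT_WRITE) != 0)
        return -1;
    a->committed = new_bytes;
    return 0;
}

void reserved_array_free(reserved_array *a)
{
    munmap(a->base, a->reserved);
    a->base = NULL;
    a->committed = a->reserved = 0;
}
```

Because the base address never changes, growing the array copies
nothing, and pages in the reservation that are never touched cost
address space rather than physical memory.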
Lastly, the batch allocation mechanism allows a sequence of
allocations to be performed at once. I don't know of any attempts to
have Python make use of similar functionality in Linux's system
allocator, but Perl saw an 18% reduction in startup time
(http://groups.google.com/group/perl-compiler/msg/31bca5297764002b).
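
To make the batch idea concrete, here is a rough sketch under stated
assumptions. It is not the proposal's interface: the hypothetical
helper batch_alloc serves several requests from one underlying
malloc() call, so the allocator is entered once rather than once per
block. Unlike a true batch allocator, the blocks here cannot be freed
individually; only the backing block may be passed to free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: carve n blocks of sizes[i] bytes out of a single
 * malloc() call and store their addresses in out[0..n-1].
 * Returns the backing block (free() it to release everything at once),
 * or NULL on failure. Size overflow is not checked in this sketch. */
void *batch_alloc(size_t n, const size_t *sizes, void **out)
{
    const size_t align = _Alignof(max_align_t);
    size_t total = 0;

    /* First pass: total size, with each block rounded up so that the
     * block after it stays suitably aligned for any object type. */
    for (size_t i = 0; i < n; i++)
        total += (sizes[i] + align - 1) / align * align;
    if (total == 0)
        total = align; /* malloc(0) may legitimately return NULL */

    unsigned char *backing = malloc(total);
    if (backing == NULL)
        return NULL;

    /* Second pass: hand out consecutive slices of the backing block. */
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = backing + offset;
        offset += (sizes[i] + align - 1) / align * align;
    }
    return backing;
}
```

A caller that knows it needs many small objects up front, as an
interpreter does at startup, pays the allocator's bookkeeping cost
once for the whole group.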
I am not familiar with Python's implementation beyond having worked
extensively with Boost.Python, so I was hoping that this list could
advise me on what I might be forgetting, what problems this design
could pose for Python, and any other general concerns and thoughts.
I thank the list in advance for your time and consideration.
Niall Douglas
--
Technology & Consulting Services - ned Productions Limited.
http://www.nedproductions.biz/. VAT reg: IE 9708311Q. Company no:
472909.