[Numpy-discussion] Memory allocation cleanup
jtaylor.debian at googlemail.com
Fri Jan 10 14:15:26 EST 2014
On 10.01.2014 17:03, Nathaniel Smith wrote:
> On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> For this reason and missing calloc I don't think we should use the Python
>> API for data buffers just yet. Any benefits are relatively small anyway.
> It really would be nice if our data allocations would all be visible
> to the tracemalloc library though, somehow. And I doubt we want to
> patch *all* Python allocations to go through posix_memalign, both
> because this is rather intrusive and because it would break python -X
> tracemalloc.
We can most likely plug aligned allocators into the Python allocator to
still be able to use tracemalloc, but it would be Python 3.4 only;
older versions would continue to use our aligned allocators directly
with our own tracing.
I think that's fine; I doubt the tracemalloc module will be backported to
older Python versions.
An issue is that we can't fit calloc in there without abusing one of the
domains, but I think it is also not so critical to keep it. The
sparseness is neat, but you can lose it very quickly again (basically
on any full copy), and it's not portable.
> How certain are we that we want to switch to aligned allocators in the
> future? If we don't, then maybe it makes to ask python-dev for a
> calloc interface; but if we do, then I doubt we can convince them to
> add aligned allocation interfaces, and we'll need to ask for something
> else (maybe a "null" allocator, which just notifies the python memory
> tracking machinery that we allocated something ourselves?).
> It's not obvious to me why aligning data buffers is useful - can you
> elaborate? There's no code simplification, because we always have to
> handle the unaligned case anyway with the standard unaligned
> startup/cleanup loops. And intuitively, given the existence of such
> loops, alignment shouldn't matter much in practice, since the most
> that shifting alignment can do is change the number of elements that
> need to be handled by such loops by (SIMD alignment value / element
> size). For doubles, in a buffer that has 16 byte alignment but not 32
> byte alignment, this means that worst case, we end up doing 4
> unnecessary non-SIMD operations.
It's relevant when you have multiple buffer inputs. If they do not have
the same alignment, they can't all be peeled to a correct alignment; some
of the inputs will always have to be loaded unaligned.
On modern x86 hardware unaligned loads may be cheaper. On the Nehalem
architecture and later, using unaligned load instructions carries
almost no penalty if the underlying memory is in fact aligned correctly,
but there is still a penalty if it is not aligned.
I'm not sure how relevant that still is on even newer architectures;
the Intel optimization docs still recommend aligning memory, though.