[Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array

Frédéric Bastien nouiz at nouiz.org
Wed Jan 8 15:44:41 EST 2014


On Wed, Jan 8, 2014 at 3:40 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Wed, Jan 8, 2014 at 12:13 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>> On 18.07.2013 15:36, Nathaniel Smith wrote:
>>> On Wed, Jul 17, 2013 at 5:57 PM, Frédéric Bastien <nouiz at nouiz.org> wrote:
>>>> On the usefulness of doing only 1 memory allocation: on our old GPU ndarray,
>>>> we were doing 2 allocations on the GPU, one for metadata and one for data. I
>>>> removed this, as it was a bottleneck. Allocations on the CPU are faster than
>>>> on the GPU, but this is still something that is slow unless you reuse
>>>> memory. Does PyMem_Malloc reuse previous small allocations?
>>>
>>> Yes, at least in theory PyMem_Malloc is highly-optimized for small
>>> buffer re-use. (For requests >256 bytes it just calls malloc().) And
>>> it's possible to define type-specific freelists; not sure if there's
>>> any value in doing that for PyArrayObjects. See Objects/obmalloc.c in
>>> the Python source tree.
>>
>> PyMem_Malloc is just a wrapper around malloc, so it's only as optimized
>> as the C library is (glibc is not good for small allocations).
>> PyObject_Malloc uses a small object allocator for requests smaller than 512
>> bytes (256 in Python 2).
>
> Right, I meant PyObject_Malloc of course.
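
To illustrate why a small object allocator helps here, below is a toy
size-class free list in plain C. It is only a sketch of the reuse idea, not
CPython's obmalloc: the names pool_alloc, pool_free and malloc_calls are made
up, the 8-byte size-class step is an assumption, and the caller has to pass
the size back on free (the real allocator does not need that).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch only: NOT CPython's obmalloc, just the block-reuse idea. */
#define SMALL_LIMIT 512                 /* threshold quoted above for Python 3 */
#define CLASS_STEP  8                   /* assumed size-class granularity */
#define NUM_CLASSES (SMALL_LIMIT / CLASS_STEP)

static void *free_lists[NUM_CLASSES];   /* one singly linked list per size class */
static unsigned long malloc_calls;      /* counts trips to the C library */

/* Map a request of 1..SMALL_LIMIT bytes to its size-class index. */
size_t class_of(size_t n)
{
    assert(n >= 1 && n <= SMALL_LIMIT);
    return (n + CLASS_STEP - 1) / CLASS_STEP - 1;
}

void *pool_alloc(size_t n)
{
    if (n == 0)
        n = 1;
    if (n > SMALL_LIMIT) {              /* large request: straight to malloc */
        malloc_calls++;
        return malloc(n);
    }
    size_t c = class_of(n);
    void *p = free_lists[c];
    if (p != NULL) {                    /* reuse a previously freed block */
        free_lists[c] = *(void **)p;
        return p;
    }
    malloc_calls++;                     /* nothing to reuse: ask the C library */
    return malloc((c + 1) * CLASS_STEP);
}

/* Simplification: the caller passes the original request size back. */
void pool_free(void *p, size_t n)
{
    if (p == NULL)
        return;
    if (n == 0)
        n = 1;
    if (n > SMALL_LIMIT) {
        free(p);
        return;
    }
    size_t c = class_of(n);
    *(void **)p = free_lists[c];        /* link the block into its free list */
    free_lists[c] = p;
}
```

With this, a free of a 24-byte block followed by a 20-byte request gets the
same block back without calling malloc again, which is the pattern that
short-lived scalar temporaries produce.
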
>
>> I filed a pull request [0] replacing a few functions which I think are
>> safe to convert to this API: the nditer allocation, which is completely
>> encapsulated, and the construction of the scalar and array Python objects,
>> which are deleted via the tp_free slot (we really should not support
>> third party libraries using PyMem_Free on Python objects without checks).
>>
>> This already gives up to 15% improvements for scalar operations compared
>> to glibc 2.17 malloc.
>> Do I understand the discussions here correctly that we could replace
>> PyDimMem_NEW, which is used for strides in PyArray, with the small object
>> allocation too?
>> It would still allow swapping the stride buffer, but every application
>> must then delete it with PyDimMem_FREE which should be a reasonable
>> requirement.
>
> That sounds reasonable to me.
>
> If we wanted to get even more elaborate, we could by default stick the
> shape/strides into the same allocation as the PyArrayObject, and then
> defer allocating a separate buffer until someone actually calls
> PyArray_Resize. (With a new flag, similar to OWNDATA, that tells us
> whether we need to free the shape/stride buffer when deallocating the
> array.) It's got to be a vanishingly small proportion of arrays where
> PyArray_Resize is actually called, so for most arrays, this would let
> us skip the allocation entirely, and the only cost would be that for
> arrays where PyArray_Resize *is* called to add new dimensions, we'd
> leave the original buffers sitting around until the array was freed,
> wasting a tiny amount of memory. Given that no-one has noticed that
> currently *every* array wastes 50% of this much memory (see upthread),
> I doubt anyone will care...

Seems like a good plan. When is it planned to remove the old interface? I
think we can't do this before then.
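
To make the inline shape/strides idea quoted above concrete, here is a rough
sketch in plain C. It is not NumPy code: sketch_array, SKETCH_OWNDIMS and the
three functions are hypothetical names, and the struct is only a stand-in for
PyArrayObject. It assumes the shape and strides live in one buffer, shape
first, and that a separate buffer is allocated only when a resize needs more
dimensions than the inline storage can hold.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag, in the spirit of OWNDATA: dims buffer is separate. */
#define SKETCH_OWNDIMS 0x1

/* Hypothetical stand-in for PyArrayObject; not NumPy's real layout. */
typedef struct {
    int ndim;
    int cap;                 /* dimensions the current dims buffer can hold */
    int flags;
    intptr_t *dims;          /* shape[ndim] followed by strides[ndim] */
    intptr_t inline_dims[];  /* same allocation as the struct itself */
} sketch_array;

/* Copy shape and strides into the current dims buffer. */
static void set_dims(sketch_array *a, const intptr_t *shape,
                     const intptr_t *strides)
{
    if (a->ndim > 0) {
        size_t bytes = (size_t)a->ndim * sizeof(intptr_t);
        memcpy(a->dims, shape, bytes);
        memcpy(a->dims + a->ndim, strides, bytes);
    }
}

/* One malloc covers the object and its shape/strides. */
sketch_array *sketch_array_new(int ndim, const intptr_t *shape,
                               const intptr_t *strides)
{
    assert(ndim >= 0);
    sketch_array *a = malloc(sizeof *a + 2 * (size_t)ndim * sizeof(intptr_t));
    if (a == NULL)
        return NULL;
    a->ndim = ndim;
    a->cap = ndim;
    a->flags = 0;
    a->dims = a->inline_dims;
    set_dims(a, shape, strides);
    return a;
}

/* Only a resize that adds dimensions pays for a second allocation. */
int sketch_array_resize(sketch_array *a, int ndim, const intptr_t *shape,
                        const intptr_t *strides)
{
    assert(ndim >= 0);
    if (ndim > a->cap) {
        intptr_t *buf = malloc(2 * (size_t)ndim * sizeof *buf);
        if (buf == NULL)
            return -1;
        if (a->flags & SKETCH_OWNDIMS)
            free(a->dims);   /* drop an earlier separate buffer */
        a->dims = buf;       /* inline storage stays unused until free */
        a->cap = ndim;
        a->flags |= SKETCH_OWNDIMS;
    }
    a->ndim = ndim;
    set_dims(a, shape, strides);
    return 0;
}

void sketch_array_free(sketch_array *a)
{
    if (a == NULL)
        return;
    if (a->flags & SKETCH_OWNDIMS)
        free(a->dims);       /* the flag tells us whether dims is ours to free */
    free(a);
}
```

In this sketch an array that is never resized to more dimensions costs one
allocation and one free. The trade-off is the one described in the quote:
after a growing resize the inline slots are wasted until the array is freed.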

Fred


