[Numpy-discussion] Question about unaligned access

Todd toddrjen at gmail.com
Tue Jul 7 03:53:50 EDT 2015


On Jul 6, 2015 6:21 PM, "Francesc Alted" <faltet at gmail.com> wrote:
>
> 2015-07-06 18:04 GMT+02:00 Jaime Fernández del Río <jaime.frio at gmail.com>:
>>
>> On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted <faltet at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I have stumbled into this:
>>>
>>> In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0', np.int64), ('f1', np.int32)])
>>>
>>> In [63]: %timeit sa['f0'].sum()
>>> 100 loops, best of 3: 4.52 ms per loop
>>>
>>> In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0', np.int64), ('f1', np.int64)])
>>>
>>> In [65]: %timeit sa['f0'].sum()
>>> 1000 loops, best of 3: 896 µs per loop
>>>
>>> The first structured array is made of 12-byte records, while the second
>>> is made of 16-byte records, yet the latter performs 5x faster.  Also, a
>>> structured array made of 8-byte records is the fastest (as expected):
>>>
>>> In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0', np.int64)])
>>>
>>> In [67]: %timeit sa['f0'].sum()
>>> 1000 loops, best of 3: 567 µs per loop
>>>
>>> Now, my laptop has an Ivy Bridge processor (i5-3380M) that should
>>> perform quite well on unaligned data:
>>>
>>> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
>>>
>>> So, if 4-year-old Intel architectures do not have a penalty for
>>> unaligned access, why am I seeing one in NumPy?  That strikes me as
>>> quite strange.
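
For illustration (this sketch is not part of the original message), the three
record layouts can be inspected directly; the align=True variant is an extra
assumption of mine, showing how NumPy would pad the mixed record to natural
alignment:

    import numpy as np

    # Mixed int64/int32 record: 12-byte itemsize, so the 'f0' view has a
    # 12-byte stride and every other element starts off an 8-byte boundary.
    dt_mixed = np.dtype([('f0', np.int64), ('f1', np.int32)])
    print(dt_mixed.itemsize)        # 12

    # Homogeneous records: 16 and 8 bytes, both multiples of 8.
    print(np.dtype([('f0', np.int64), ('f1', np.int64)]).itemsize)   # 16
    print(np.dtype([('f0', np.int64)]).itemsize)                     # 8

    # Padding the record to natural alignment also gives 16 bytes.
    dt_padded = np.dtype([('f0', np.int64), ('f1', np.int32)], align=True)
    print(dt_padded.itemsize)       # 16

    sa = np.zeros(10, dtype=dt_mixed)
    print(sa['f0'].strides)             # (12,)
    print(sa['f0'].flags['ALIGNED'])    # expected False: 12 is not a multiple of 8
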
>>
>>
>> I believe that, the way NumPy is set up, it never does unaligned access,
>> regardless of the platform, in case it gets run on one that would go up
>> in flames if you tried to.  So my guess would be that you are seeing
>> chunked copies into a buffer, as opposed to bulk copying or no copying
>> at all, and that would explain your timing differences.  But Julian or
>> Sebastian can probably give you a more informed answer.
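
As a rough illustration of that buffering cost (again a sketch, not something
proposed in the thread): copying the misaligned field into a fresh contiguous
array once removes the stride/alignment issue, so the reduction itself can
take the fast path:

    import numpy as np

    sa = np.fromiter(((i, i) for i in range(1000 * 1000)),
                     dtype=[('f0', np.int64), ('f1', np.int32)])

    # sa['f0'] is a strided, misaligned view into the 12-byte records;
    # summing it goes through NumPy's buffered (chunked-copy) machinery.
    view = sa['f0']

    # One explicit copy up front gives an aligned, contiguous array, so a
    # subsequent sum() should behave like the homogeneous 8-byte case above.
    contiguous = np.ascontiguousarray(view)
    print(contiguous.flags['ALIGNED'])   # True
    contiguous.sum()
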
>
>
> Yes, my guess is that you are right.  I suppose it is possible to improve
> the numpy codebase to accelerate this particular access pattern on Intel
> platforms, but given that structured arrays are not used that much
> (pandas is probably the leading use case by far, and as far as I know it
> does not use structured arrays internally in DataFrames), maybe it is not
> worth worrying about this too much.
>

That may be more of a chicken-and-egg problem. Structured arrays are pretty
complicated to set up, which means they don't get used much, which means
they don't get much attention, which means they remain complicated.