[Numpy-discussion] Question about unaligned access

Julian Taylor jtaylor.debian at googlemail.com
Mon Jul 6 14:32:34 EDT 2015


Sorry for the 3 empty mails, my client bugged out...

As a workaround, you can align structured dtypes to avoid this issue:

import numpy as np
# align=True pads each record like a C struct, so f0 stays naturally aligned
sa = np.fromiter(((i, i) for i in range(1000*1000)),
                 dtype=np.dtype([('f0', np.int64), ('f1', np.int32)], align=True))
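
To see what align=True changes, here is a minimal sketch that compares the
packed and the aligned layout of the same two fields. The names `packed`,
`aligned`, `sa_packed` and `sa_aligned` are only illustrative, and the sizes
mentioned in the comments assume a platform where int64 has 8-byte alignment
(typical for x86-64); other platforms may pad differently.

```python
import numpy as np

fields = [('f0', np.int64), ('f1', np.int32)]
packed = np.dtype(fields)               # default layout: fields back to back, no padding
aligned = np.dtype(fields, align=True)  # padded the way a C compiler would lay out a struct

# Record size and field offsets for both layouts.
print("packed :", packed.itemsize, {k: v[1] for k, v in packed.fields.items()})
print("aligned:", aligned.itemsize, {k: v[1] for k, v in aligned.fields.items()})

# The stride of the 'f0' view equals the record size. With 12-byte records
# every other int64 lands on an address that is not a multiple of 8.
sa_packed = np.zeros(1000, dtype=packed)
sa_aligned = np.zeros(1000, dtype=aligned)
print("packed  f0:", sa_packed['f0'].strides, sa_packed['f0'].flags.aligned)
print("aligned f0:", sa_aligned['f0'].strides, sa_aligned['f0'].flags.aligned)
```

The `flags.aligned` value is what NumPy itself reports for each field view, so
it is a quick way to check whether a given dtype is likely to hit the slower
path discussed in this thread.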


On 06.07.2015 18:21, Francesc Alted wrote:
> 2015-07-06 18:04 GMT+02:00 Jaime Fernández del Río <jaime.frio at gmail.com>:
> 
>     On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted <faltet at gmail.com> wrote:
> 
>         Hi,
> 
>         I have stumbled into this:
> 
>         In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
>         dtype=[('f0', np.int64), ('f1', np.int32)])
> 
>         In [63]: %timeit sa['f0'].sum()
>         100 loops, best of 3: 4.52 ms per loop
> 
>         In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
>         dtype=[('f0', np.int64), ('f1', np.int64)])
> 
>         In [65]: %timeit sa['f0'].sum()
>         1000 loops, best of 3: 896 µs per loop
> 
>         The first structured array is made of 12-byte records, while the
>         second is made of 16-byte records, yet the latter is about 5x
>         faster.  Also, a structured array made of 8-byte records is
>         the fastest (as expected):
> 
>         In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)),
>         dtype=[('f0', np.int64)])
> 
>         In [67]: %timeit sa['f0'].sum()
>         1000 loops, best of 3: 567 µs per loop
> 
>         Now, my laptop has an Ivy Bridge processor (i5-3380M) that should
>         perform quite well on unaligned data:
> 
>         http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
> 
>         So, if 4-year-old Intel architectures do not have a penalty for
>         unaligned access, why am I seeing one in NumPy?  That strikes
>         me as quite strange.
> 
> 
>     I believe that the way numpy is set up, it never does unaligned
>     access, regardless of the platform, in case it gets run on one that
>     would go up in flames if you tried to. So my guess would be that you
>     are seeing chunked copies into a buffer, as opposed to bulk copying
>     or no copying at all, and that would explain your timing
>     differences. But Julian or Sebastian can probably give you a more
>     informed answer.
> 
> 
> Yes, my guess is that you are right.  I suppose that it is possible to
> improve the numpy codebase to accelerate this particular access pattern
> on Intel platforms, but given that structured arrays are not that widely
> used (pandas is probably leading this use case by far, and as far as I
> know, they are not using structured arrays internally in DataFrames),
> maybe it is not worth worrying about this too much.
> 
> Thanks anyway,
> Francesc
>  
> 
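
For anyone who wants to repeat the comparison quoted above outside IPython,
here is a rough sketch using the standard timeit module. The helper `make`,
the array size `N` and the `results` dict are illustrative choices, not part
of the original session, and the numbers will depend on the NumPy version and
the hardware, so no particular ratio should be expected.

```python
import timeit
import numpy as np

N = 100_000  # smaller than the 1000*1000 used in the thread, to keep the run short
FIELDS = [('f0', np.int64), ('f1', np.int32)]

def make(dtype, n=N):
    """Build a structured array whose two fields both count 0..n-1."""
    return np.fromiter(((i, i) for i in range(n)), dtype=dtype)

packed = make(np.dtype(FIELDS))               # 12-byte records
padded = make(np.dtype(FIELDS, align=True))   # records padded like a C struct

results = {}
for name, arr in [('packed', packed), ('aligned', padded)]:
    # Best of 5 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(lambda: arr['f0'].sum(), number=20, repeat=5)) / 20
    results[name] = best
    print(f"{name:8s} itemsize={arr.dtype.itemsize:2d}  {best * 1e6:8.1f} us per sum")
```

Both arrays hold the same values, so the two sums are identical and only the
memory layout differs between the two timings.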



