[Numpy-discussion] Setting custom dtypes and 1.14

Thu Jan 25 19:06:18 EST 2018

On 01/25/2018 06:06 PM, Chris Barker wrote:
> Hi all,
> 
> I'm pretty sure this is the same thing as recently discussed on this
> list about 1.14, but to confirm:
> 
> I had failures in my code with an upgrade for 1.14 -- turns out it was a
> single line in a single test fixture, so no big deal, but a regression
> just the same, with no deprecation warning.
> 
> I was essentially doing this:
> 
> In [48]: dt
> Out[48]: dtype([('time', '<i8'), ('value', [('u', '<f8'), ('v', '<f8')])], align=True)
> 
> In [49]: uv
> Out[49]:
> array([[1., 1.],
>        [1., 1.],
>        [1., 1.],
>        [1., 1.]])
> 
> In [50]: time
> Out[50]: array([1, 1, 1, 1])
> 
> In [51]: full = np.array(zip(time, uv), dtype=dt)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-51-ed726f71dd4a> in <module>()
> ----> 1 full = np.array(zip(time, uv), dtype=dt)
> 
> ValueError: setting an array element with a sequence.
> 
> It took some poking, but the solution was to do:
> 
> full = np.array(zip(time, (tuple(w) for w in uv)), dtype=dt)
> 
> 
> That is, convert the values to nested tuples, rather than an array in a
> tuple, or a list in a tuple.
> 
> As I said, my problem is solved, but to confirm:
> 
> 1) This is a known change with good reason?

This change is a little different from what we discussed before. It was
made because the old assignment behavior was dangerous and was not doing
what you thought. If you modify your dtype above, changing both 'f8'
fields to 'f4', you get very strange results: your array gets filled
with the values (1, (0., 1.875)).

Here's what happened: Previously, numpy was *not* iterating your data as
a sequence. Instead, if numpy did not find a tuple, it would interpret
the data as a raw buffer and copy the value byte-by-byte, ignoring
endianness, casting, stride, etc. You can get even weirder results if
you do `uv = uv.astype('i4')`, for example.
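
To illustrate where (0., 1.875) comes from, here is a minimal sketch that
mimics a raw byte copy by reinterpreting bytes explicitly. It does not
exercise the old numpy code path. The names `row`, `raw` and `as_f4` are
only for illustration, and it assumes the old copy took the first 8 bytes
of the row, which is consistent with the values above:

```python
import numpy as np

# One row of uv, stored as two little-endian float64 values (16 bytes).
row = np.array([1.0, 1.0], dtype='<f8')
raw = row.tobytes()

# A pair of 'f4' fields occupies only 8 bytes. Reinterpret the first
# 8 bytes of the float64 data as two float32 values, with no casting.
as_f4 = np.frombuffer(raw[:8], dtype='<f4')
print(as_f4.tolist())  # [0.0, 1.875]
```

The first float32 is the low half of the float64 1.0 (all zero bits), and
the second is its high half, 0x3FF00000, which reads as 1.875.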

It happened to work for you because ndarrays expose a buffer interface,
and you were assigning using exactly the same type and endianness.

In 1.14 the fix was to disallow this 'buffer' assignment for structured
arrays, because it was causing quite confusing bugs. Unstructured "void"
arrays still do this, though.

> 2) My solution was the best (only) one -- the only way to set a nested
> dtype like that is with tuples?

Right, our solution was to only allow assignment from tuples.
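
For the dtype in your example, a minimal sketch of tuple-based
construction could look like the following. It is written for Python 3,
where zip() is lazy, so it builds an explicit list. `full_by_field` is a
hypothetical alternative, not something from your code, that avoids
tuples by assigning one field at a time:

```python
import numpy as np

dt = np.dtype([('time', '<i8'),
               ('value', [('u', '<f8'), ('v', '<f8')])], align=True)
uv = np.ones((4, 2))
time = np.ones(4, dtype='i8')

# Each record is a tuple, and the nested 'value' record is a tuple too.
full = np.array([(t, tuple(w)) for t, w in zip(time, uv)], dtype=dt)

# Alternative: allocate first, then fill each field from a plain array.
full_by_field = np.zeros(len(time), dtype=dt)
full_by_field['time'] = time
full_by_field['value']['u'] = uv[:, 0]
full_by_field['value']['v'] = uv[:, 1]
```

Both arrays should end up with the same contents. The field-wise version
goes through ordinary casting for each column.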

We might be able to relax that for structured scalars, but for arrays I
remember one consideration was to avoid confusion with array
broadcasting: If you do

    >>> x = np.zeros(2, dtype='i4,i4')
    >>> x[:] = np.array([3, 4])
    >>> x
    array([(3, 3), (4, 4)], dtype=[('f0', '<i4'), ('f1', '<i4')])

it might be the opposite of what you expect. Compare to

    >>> x[:] = (3, 4)
    >>> x
    array([(3, 4), (3, 4)], dtype=[('f0', '<i4'), ('f1', '<i4')])
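
As a self-contained sketch of the same distinction, plus one way to get
per-row pairs out of a 2-D array by converting each row to a tuple (the
names `pairs`, `x_bcast`, `x_tuple` and `x_rows` are only for
illustration):

```python
import numpy as np

pairs = np.array([[3, 4], [5, 6]])

# Unstructured array: each element is broadcast to all fields of a record.
x_bcast = np.zeros(2, dtype='i4,i4')
x_bcast[:] = np.array([3, 4])

# Tuple: treated as one record and broadcast across the array.
x_tuple = np.zeros(2, dtype='i4,i4')
x_tuple[:] = (3, 4)

# Rows of a 2-D array as records: convert each row to a tuple first.
x_rows = np.array([tuple(r) for r in pairs], dtype='i4,i4')

print(x_bcast.tolist())  # [(3, 3), (4, 4)]
print(x_tuple.tolist())  # [(3, 4), (3, 4)]
print(x_rows.tolist())   # [(3, 4), (5, 6)]
```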

> If so, then I think we should:
> 
> A) improve the error message.
> 
> "ValueError: setting an array element with a sequence."
> 
> Is not really clear -- I spent a while trying to figure out how I could
> set a nested dtype like that without a sequence? and I was actually
> using a ndarray, so it wasn't even a generic sequence. And a tuple is a
> sequence, too...
> 
> I had a vague recollection that in some circumstances, numpy treats
> tuples and lists (and arrays) differently (fancy indexing??), so I tried
> the tuple thing and that worked. But I've been around numpy a long time
> -- that could have been very very confusing to many people.
> 
> So could the message be changed to something like:
> 
> "ValueError: setting an array element with a generic sequence. Only the
> tuple type can be used in this context."
> 
> or something like that -- I'm not sure where else this same error
> message might pop up, so that could be totally inappropriate.

Good idea. I'll see if we can do it for 1.14.1.

> 2) maybe add a .totuple() method to ndarray, much like the .tolist()
> method? That would have been handy here.
> 
> -Chris
> 
> 
> -- 
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> 
> Chris.Barker at noaa.gov
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>