[Numpy-discussion] load of custom .npy file fails with numpy 2.0.0

Geoffrey Irving irving at naml.us
Thu Aug 2 18:16:35 EDT 2012


On Thu, Aug 2, 2012 at 3:13 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Thu, Aug 2, 2012 at 11:41 PM, Geoffrey Irving <irving at naml.us> wrote:
>> On Thu, Aug 2, 2012 at 1:26 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>> On Thu, Aug 2, 2012 at 8:46 PM, Geoffrey Irving <irving at naml.us> wrote:
>>>> Hello,
>>>>
>>>> The attached .npy file was written from custom C++ code.  It loads
>>>> fine in Numpy 1.6.2 with Python 2.6 installed through MacPorts, but
>>>> fails on a different machine with Numpy 2.0.0 installed via Superpack:
>>>>
>>>> box:array% which python
>>>> /usr/bin/python
>>>> box:array% python
>>>> Python 2.6.1 (r261:67515, Aug  2 2010, 20:10:18)
>>>> [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> >>> import numpy
>>>> >>> numpy.load('blah.npy')
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>>   File "/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/lib/npyio.py",
>>>> line 351, in load
>>>>     return format.read_array(fid)
>>>>   File "/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/lib/format.py",
>>>> line 440, in read_array
>>>>     shape, fortran_order, dtype = read_array_header_1_0(fp)
>>>>   File "/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/lib/format.py",
>>>> line 361, in read_array_header_1_0
>>>>     raise ValueError(msg % (d['descr'],))
>>>> ValueError: descr is not a valid dtype descriptor: 'd8'
>>>> >>> numpy.__version__
>>>> '2.0.0.dev-b5cdaee'
>>>> >>> numpy.__file__
>>>> '/Library/Python/2.6/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc'
>>>>
>>>> It seems Numpy 2.0.0 no longer accepts dtype('d8'):
>>>>
>>>> >>> dtype('d8')
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> TypeError: data type "d8" not understood
>>>>
>>>> Was that intentional?  An API change isn't too much of a problem, but
>>>> it's unfortunate if old data files are no longer easily readable.
>>>
>>> As far as I can tell, numpy has never described an array using 'd8'.
>>> That would be a really old compatibility typecode from Numeric, if I
>>> remember correctly. The intention of the NPY format standard was that
>>> it would accept what numpy spits out for the descr, not that it would
>>> accept absolutely anything that numpy.dtype() can consume, even
>>> deprecated aliases (though I will admit that that is almost what the
>>> NEP says). In particular, endianness really should be included or else
>>> your files will be misread on big-endian machines.
>>>
>>> My suspicion is that only your code has ever made .npy files with this
>>> descr. I feel your pain, Geoff, and I apologize that my lax
>>> specification led you down this path, but I think you need to fix your
>>> code anyways.
>>
>> Sounds good.  Both 1.6.2 and 2.0.0 write out '<f8' for the dtype.
>> I'll certainly add the '<' bit to signify endianness, but how should I
>> go about determining the letter?  My current code looks like
>>
>>   // Get dtype info
>>   int bits;char letter;
>>   switch(type_num){
>>       #define CASE(T) case NPY_##T:bits=NPY_BITSOF_##T;letter=NPY_##T##LTR;break;
>>       #define NPY_BITSOF_BYTE 8
>>       #define NPY_BITSOF_UBYTE 8
>>       #define NPY_BITSOF_USHORT NPY_BITSOF_SHORT
>>       #define NPY_BITSOF_UINT NPY_BITSOF_INT
>>       #define NPY_BITSOF_ULONG NPY_BITSOF_LONG
>>       #define NPY_BITSOF_ULONGLONG NPY_BITSOF_LONGLONG
>>       CASE(BOOL)
>>       CASE(BYTE)
>>       CASE(UBYTE)
>>       CASE(SHORT)
>>       CASE(USHORT)
>>       CASE(INT)
>>       CASE(UINT)
>>       CASE(LONG)
>>       CASE(ULONG)
>>       CASE(LONGLONG)
>>       CASE(ULONGLONG)
>>       CASE(FLOAT)
>>       CASE(DOUBLE)
>>       CASE(LONGDOUBLE)
>>       #undef CASE
>>       default: throw ValueError("Unknown dtype");}
>>   int bytes = bits/8;
>>   ...
>>   len += sprintf(base+len,"{'descr': '%c%d', 'fortran_order': False, 'shape': (",letter,bytes);
>>
>> The code incorrectly assumes that the ...LTR constants are safe ways
>> to describe dtypes.  Is there a clean, correct way to do this that
>> doesn't require special casing for each type?  I can use numpy headers
>> but can't call any numpy functions, since Python might not be
>> initialized (e.g., if I'm writing out files through MPI IO collectives
>> on a Cray).
>
> These characters plus the byte-size and endian-ness:
>
>         /*
>          * These are for dtype 'kinds', not dtype 'typecodes'
>          * as the above are for.
>          */
>         NPY_GENBOOLLTR ='b',
>         NPY_SIGNEDLTR = 'i',
>         NPY_UNSIGNEDLTR = 'u',
>         NPY_FLOATINGLTR = 'f',
>         NPY_COMPLEXLTR = 'c'
>
> Less amenable to macro magic, certainly, but workable. To
> double-check, see what numpy outputs for each of these cases.

Easy enough to add the two-argument macro.  Thanks, I should be all set.
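
For illustration, an untested sketch of how that macro approach with the
kind letters might look in the C++ writer (here CASE takes the NPY_* type
suffix, the kind letter, and the matching C element type, so sizeof can be
used instead of hand-defined NPY_BITSOF_* macros; write_descr and its
signature are made-up names, and only numpy headers are used, no API calls,
assuming numpy/npy_endian.h is available):

#include <cstdio>
#include <stdexcept>
#include <numpy/ndarraytypes.h>   // NPY_* type numbers, kind letters, npy_* typedefs
#include <numpy/npy_endian.h>     // NPY_BYTE_ORDER, NPY_LITTLE_ENDIAN

static int write_descr(char* base, int len, int type_num) {
    char kind;   // dtype 'kind' letter: 'b', 'i', 'u', or 'f'
    int bytes;   // element size in bytes
    switch (type_num) {
        // T is the NPY_* suffix, K the kind letter, C the matching C type.
        #define CASE(T,K,C) case NPY_##T: kind = K; bytes = (int)sizeof(C); break;
        CASE(BOOL,       NPY_GENBOOLLTR,  npy_bool)
        CASE(BYTE,       NPY_SIGNEDLTR,   npy_byte)
        CASE(UBYTE,      NPY_UNSIGNEDLTR, npy_ubyte)
        CASE(SHORT,      NPY_SIGNEDLTR,   npy_short)
        CASE(USHORT,     NPY_UNSIGNEDLTR, npy_ushort)
        CASE(INT,        NPY_SIGNEDLTR,   npy_int)
        CASE(UINT,       NPY_UNSIGNEDLTR, npy_uint)
        CASE(LONG,       NPY_SIGNEDLTR,   npy_long)
        CASE(ULONG,      NPY_UNSIGNEDLTR, npy_ulong)
        CASE(LONGLONG,   NPY_SIGNEDLTR,   npy_longlong)
        CASE(ULONGLONG,  NPY_UNSIGNEDLTR, npy_ulonglong)
        CASE(FLOAT,      NPY_FLOATINGLTR, npy_float)
        CASE(DOUBLE,     NPY_FLOATINGLTR, npy_double)
        CASE(LONGDOUBLE, NPY_FLOATINGLTR, npy_longdouble)
        #undef CASE
        default: throw std::invalid_argument("Unknown dtype");
    }
    // numpy uses '|' for single-byte types and an explicit '<' or '>'
    // otherwise, so the file is read correctly on big-endian machines.
    char order = bytes == 1 ? '|'
               : (NPY_BYTE_ORDER == NPY_LITTLE_ENDIAN ? '<' : '>');
    return len + sprintf(base + len,
        "{'descr': '%c%c%d', 'fortran_order': False, 'shape': (",
        order, kind, bytes);
}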

Geoffrey


