Standard for dtype string representation? - NumPy-Discussion

Feb. 8, 2023

Hi,

We are in the process of using a standard representation of data types for
the forthcoming version of N-dim arrays in C-Blosc2, and we want to use the
NumPy string representation for that (see the end of
https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst).  It
might seem a bit strange to use the specification of a Python package for
that, but given its predominant role in data science, I don't think this
should come as a surprise to anyone.

There are some small gotchas though.  For simple data types, the string
representation is *apparently* fine. E.g.:

In [16]: str(np.dtype("i8"))
Out[16]: 'int64'

However, as soon as we try to represent the endianness of the type, we get:

In [17]: str(np.dtype(">i8"))
Out[17]: '>i8'

So, it uses the short version of the representation.  And the same happens
with the structured types:

In [22]: str(np.dtype("S1,i8"))
Out[22]: "[('f0', 'S1'), ('f1', '<i8')]"

Finally, the endianness seems to be represented arbitrarily.  E.g. in:

In [23]: str(np.dtype("S1"))
Out[23]: '|S1'

one can see that the '|' character is prefixed to indicate that byte order
does not apply, while it does not appear in the structured representation.
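To see the differences side by side, here is a minimal sketch that prints str(dtype) next to dtype.str and dtype.name for the same examples. All three are existing NumPy accessors; the choice of specs simply mirrors the cases above, and the exact output of str(dtype) can depend on the byte order of the machine, so none is listed here.

```python
import numpy as np

# The same dtype specs used in the examples above.
specs = ("i8", ">i8", "S1", "S1,i8")

for spec in specs:
    dt = np.dtype(spec)
    # str(dt): human-oriented form, may drop the byte-order character.
    # dt.str:  array-protocol typestring, always carries a byte-order character.
    # dt.name: long name with bit width, no byte-order information.
    print(f"{spec!r:8} str()={str(dt)!r}  .str={dt.str!r}  .name={dt.name!r}")
```

Note that dtype.str is the only one of the three that keeps the byte-order prefix for every non-structured type, but for structured types it collapses to an opaque void typestring.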

While I know that there are other representations for types in NumPy
(e.g. integer type numbers via dtype.num), I very much appreciate (and I
suppose the same goes for other makers of numerical libraries) the
expressiveness of str(dtype), especially when it comes to structured dtypes,
were it not for the (relatively small) inconsistencies listed above.

BTW, I have had a quick glance at the Python array API standard effort (
https://data-apis.org/array-api/latest/API_specification/data_types.html#dat...),
but this does not seem to be addressed there.

For now (and for the Python-Blosc2 wrapper), we are going in this direction:

if dtype.kind == 'V':
    repr = str(dtype)
else:
    repr = dtype.str
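As a rough illustration, the rule above can be wrapped in a pair of helpers and checked with a round trip. The names dtype_to_string and string_to_dtype are hypothetical (they are not part of NumPy or Python-Blosc2), and the parser only handles the field-list form shown earlier, not subarray or aligned structured dtypes.

```python
import ast

import numpy as np


def dtype_to_string(dtype):
    """Serialize a dtype with the rule above (hypothetical helper name)."""
    dtype = np.dtype(dtype)
    if dtype.kind == "V":
        # Structured (void) dtypes: str() gives the readable field list.
        return str(dtype)
    # Everything else: dtype.str always includes the byte-order character.
    return dtype.str


def string_to_dtype(text):
    """Parse a string made by dtype_to_string (assumed inverse, sketch only)."""
    if text.startswith("["):
        # A field list is a plain Python literal; literal_eval parses it
        # without executing arbitrary code.
        return np.dtype(ast.literal_eval(text))
    return np.dtype(text)


# Round-trip check over the examples from this message.
for spec in ("i8", ">i8", "S1", "S1,i8"):
    text = dtype_to_string(spec)
    assert string_to_dtype(text) == np.dtype(spec)
```

The round trip works here because both forms are accepted back by np.dtype, but the two branches still produce strings in different notations, which is the inconsistency in question.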

Is there a way (or an ongoing effort) to express the variety of data types
in NumPy that beats the above (which seems somewhat inconsistent to me)?

Thanks!
-- 
Francesc Alted
