[Numpy-discussion] proposal: smaller representation of string arrays

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Thu Apr 20 16:01:25 EDT 2017


> I suggest a new data type  'text[encoding]', 'T'.

I like the suggestion very much (it is even in between S and U!). The
UTF-8 manifesto linked to above convinced me that the number that
follows 'T' should be the number of bytes, which is nicely consistent
with its use in all numerical dtypes.
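
To make the byte-count point concrete, here is a small sketch using only the
existing dtypes (the proposed 'T' dtype is not used, since it does not exist).
It shows that the number in numerical and 'S' dtypes is a size in bytes, that
'U' is the odd one out in counting characters, and that with UTF-8 the number
of characters that fit in a given number of bytes depends on the text. The
variable names are just for illustration.

```python
import numpy as np

# Numerical dtypes: the trailing number is the item size in bytes.
float_size = np.dtype('f8').itemsize    # 8-byte float
int_size = np.dtype('i4').itemsize      # 4-byte integer

# 'S' counts bytes as well, but 'U' counts code points, stored as
# 4 bytes each, so 'U5' occupies 20 bytes per element.
bytes_size = np.dtype('S5').itemsize
unicode_size = np.dtype('U5').itemsize

# If a text dtype counted bytes of UTF-8, the number of characters
# that fit would depend on the content: both strings below have
# 5 characters, but the second needs 6 bytes.
ascii_len = len('hello'.encode('utf-8'))
accent_len = len('h\u00e9llo'.encode('utf-8'))

print(float_size, int_size, bytes_size, unicode_size, ascii_len, accent_len)
```

So a byte count matches what 'f8', 'i4' and 'S5' already mean, at the price
that the maximum string length in characters is no longer fixed.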

Anyway, more specifically on Julian's question: it seems to me one
has little choice but to make a new dtype (and it is OK if that makes
unicode obsolete). I think exactly which encodings to support is a
separate question.

-- Marten

