[Numpy-discussion] adding more unicode dtypes

Julian Taylor jtaylor.debian at googlemail.com
Wed Jan 15 13:25:31 EST 2014


On 15.01.2014 18:57, Charles R Harris wrote:
> ...
> 
> There was a discussion of this long ago and UCS-4 was chosen as the
> numpy standard. There are just too many complications that arise in
> supporting both.
> 

My guess is that that discussion took place before python3, when you
could still simply treat bytes == string?

In python3 you need extra code to deal with arrays containing strings, as
the S type is interpreted as bytes, which is not a string type anymore [0].
Someone on irc (I think Freddie Witherden, CC'd) had a use case with huge
ASCII tables in numpy, which now either have to be stored as 4-byte unicode
on disk or be decoded from bytes all the time.
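
To make the trade-off concrete, here is a rough sketch of what I mean, run
under python3. The sample words and variable names are made up for
illustration; only the S and U dtypes and np.char.decode come from numpy
itself.

```python
import numpy as np

# Hypothetical sample data standing in for a large ASCII table.
words = ["spam", "eggs", "ham"]

# 'S4': one byte per character; under python3 the elements are bytes.
as_bytes = np.array(words, dtype="S4")
# 'U4': UCS-4, four bytes per character; the elements are str.
as_unicode = np.array(words, dtype="U4")

# Storage cost per element and for the whole array.
print(as_bytes.itemsize, as_bytes.nbytes)
print(as_unicode.itemsize, as_unicode.nbytes)

# The S elements are bytes, not str, so string code needs a decode step.
print(isinstance(as_bytes[0], bytes), isinstance(as_bytes[0], str))
print(isinstance(as_unicode[0], str))

# The "extra code": decode the whole array before treating it as text.
decoded = np.char.decode(as_bytes, "ascii")
print(decoded.tolist())
```

So the choice today is between paying four times the storage with U and
paying for a decode on every access with S.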

I personally don't use strings in arrays, so I can judge neither the
impact nor the usefulness, but it seems to me that at least having an ASCII
dtype for python2<->python3 compatibility would be useful.

[0] https://github.com/numpy/numpy/issues/4162


