[Numpy-discussion] Bytes vs. Unicode in Python3
pav at iki.fi
Fri Nov 27 07:41:54 EST 2009
Fri, 2009-11-27 at 13:23 +0100, René Dudfield wrote:
> I imagine dtype 'S' and 'U' need more clarification. As it misses the
> concept of encodings it seems? Currently, S appears to mean 8bit
> characters no encoding, and U appears to mean 16bit characters no
> encoding? Or are some sort of default encodings assumed?
Currently, in Numpy on Python 2, 'S' is the same as Python 3 bytes, and 'U'
is the same as Python 3 unicode, probably with the same internal
representation (need to check). Neither carries any encoding information.
We probably need to change the meaning of 'S', as Francesc noted, and
add a separate bytes dtype.
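To illustrate what "no encoding information" means in practice, here is a
minimal sketch in plain Python 3 (standard library only, no Numpy). The
variable names are made up for the example. The same two bytes turn into
different text depending on which encoding the reader assumes:

```python
# Text must be encoded to become bytes; the bytes do not remember how.
raw = "\u00e9".encode("utf-8")      # U+00E9 (e with acute accent) -> 2 bytes

# Decoding with the matching encoding recovers the original character.
as_utf8 = raw.decode("utf-8")

# Decoding with a different encoding silently yields other text:
# each byte becomes its own Latin-1 character.
as_latin1 = raw.decode("latin-1")

print(repr(raw), repr(as_utf8), repr(as_latin1))
```

A bytes-like dtype would store only `raw`; picking the encoding would be left
to whoever reads the data back.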
> 2to3/3to2 fixers will probably have to be written for users code
> here... whatever is decided. At least warnings should be generated
> I'm guessing.
Possibly. Does 2to3 support plugins? If yes, it could be possible to
> btw, in my numpy tree there is a unicode_() alias to str in py3, and
> to unicode in py2 (inside the compat.py file). This helped us in many
> cases with compatible string code in the pygame port. This allows you
> to create unicode strings on both platforms with the same code.
Yes, I saw that. The name unicode_ is however already taken by the Numpy
scalar type, so we need to think of a different name for it. asstring,
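For reference, a compatibility alias of the kind described could look roughly
like the sketch below. This is not the actual compat.py code; the names
`text_type` and `as_text` are hypothetical placeholders, chosen only so that
they do not collide with the Numpy scalar type `unicode_`:

```python
import sys

# Hypothetical alias name; the real name is still undecided.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)


def as_text(value, encoding="ascii"):
    """Return value as a text string on both Python 2 and Python 3.

    Byte strings are decoded with the given encoding; anything else is
    passed to the text type constructor.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


print(repr(as_text(b"abc")), repr(as_text("abc")))
```

The explicit `encoding` argument is an assumption of this sketch: since the
bytes themselves carry no encoding, the caller has to supply one.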
Btw, do you want to rebase your distutils changes on top of my tree? I
tried yours out quickly, but there were some issues there that prevented
distutils from working. (Also, you can use absolute imports both for
Python 2 and 3 -- there's probably no need to use relative imports.)