[Python-Dev] Help with Unicode arrays in NumPy
Travis E. Oliphant
oliphant.travis at ieee.org
Tue Feb 7 20:52:21 CET 2006
This is a design question which is why I'm posting here. Recently the
NumPy developers have become more aware of the difference between UCS2
and UCS4 builds of Python. NumPy arrays can be of Unicode type. In
other words, a NumPy array can be made up of fixed-data-length unicode
strings.
Currently that means that they are "unicode" strings of basic size UCS2
or UCS4 depending on the platform. It is this duality that has some
people concerned. For all other data-types, NumPy allows the user to
explicitly request a bit-width for the data-type.
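
As a rough illustration of the duality described above, the sketch below
uses sys.maxunicode to tell which kind of build is running and what that
implies for the size of a fixed-length unicode item. The helper names
(unicode_code_unit_size, fixed_width_nbytes) are hypothetical and are not
part of NumPy or Python. Note the assumption: on interpreters where
sys.maxunicode is always 0x10FFFF (Python 3.3 and later), the first
helper always reports 4.

```python
import sys


def unicode_code_unit_size():
    """Return the bytes per code unit implied by this build.

    A UCS2 ("narrow") build reports sys.maxunicode == 0xFFFF; any other
    value is treated here as UCS4 (4 bytes per code unit).
    """
    return 2 if sys.maxunicode == 0xFFFF else 4


def fixed_width_nbytes(num_chars, unit_size=None):
    """Bytes needed for one fixed-length item of num_chars code units.

    unit_size defaults to the running build's width, which is exactly
    the platform dependence the post is worried about.
    """
    if unit_size is None:
        unit_size = unicode_code_unit_size()
    return num_chars * unit_size


if __name__ == "__main__":
    print("bytes per code unit on this build:", unicode_code_unit_size())
    print("5-character item, UCS2:", fixed_width_nbytes(5, 2))
    print("5-character item, UCS4:", fixed_width_nbytes(5, 4))
```

The same five-character item occupies a different number of bytes
depending on the build, unless the width is requested explicitly.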
So, we are thinking of introducing another data-type to NumPy to
differentiate between UCS2 and UCS4 unicode strings. (This also means a
unicode scalar object, i.e. a string type, for each of these, exactly
one of which will inherit from the Python unicode type.)
Before embarking on this journey, however, we are seeking advice from
individuals on this list who are wiser in the ways of Unicode.
Perhaps all we need to do is be more careful on input and output of
Unicode data-types so that transfer of unicode can be handled correctly
on each platform.
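
One way to picture "being more careful on input and output" is to always
move the data through an explicitly named encoding, so the bytes written
do not depend on the build that wrote them. The following is only a
sketch under stated assumptions, not how NumPy does or should do it: it
assumes Python 3, picks little-endian UTF-32 as the transfer format, and
the function names pack_fixed_ucs4 and unpack_fixed_ucs4 are made up for
this example.

```python
def pack_fixed_ucs4(text, length):
    """Encode text into a fixed-size buffer of `length` 4-byte code units.

    The byte order is stated explicitly ("utf-32-le"), so no BOM is
    written and the result is the same on every platform. Short strings
    are padded with NUL code units.
    """
    data = text.encode("utf-32-le")
    width = 4 * length
    if len(data) > width:
        raise ValueError("text does not fit in %d characters" % length)
    return data + b"\x00" * (width - len(data))


def unpack_fixed_ucs4(buf):
    """Decode a buffer written by pack_fixed_ucs4, dropping NUL padding.

    Trailing NUL characters that were part of the original text cannot
    be told apart from padding and are dropped as well.
    """
    return buf.decode("utf-32-le").rstrip("\x00")


if __name__ == "__main__":
    item = pack_fixed_ucs4("abc", 5)
    print(len(item), repr(unpack_fixed_ucs4(item)))
```

Because the width and byte order are fixed by the format rather than by
the interpreter, a reader on a different build can decode the buffer
without knowing how the writer stored unicode internally.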
Any thoughts?
-Travis Oliphant