[Python-Dev] Help with Unicode arrays in NumPy

Travis E. Oliphant oliphant.travis at ieee.org
Tue Feb 7 20:52:21 CET 2006


This is a design question which is why I'm posting here.  Recently the 
NumPy developers have become more aware of the difference between UCS2 
and UCS4 builds of Python.  NumPy arrays can be of Unicode type.  In 
other words, a NumPy array can be made up of fixed-length Unicode 
strings.

Currently that means each character in such a string occupies either 
2 bytes (UCS2) or 4 bytes (UCS4), depending on how Python was built on 
the platform.  It is this duality that has some people concerned.  For 
all other data-types, NumPy allows the user to explicitly request a 
bit-width for the data-type.
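
A minimal sketch of how the build-dependent width mentioned above can be 
detected from Python itself.  The helper name unicode_build_width is 
hypothetical and used only for illustration; sys.maxunicode is the 
standard attribute it relies on.

```python
import sys

# sys.maxunicode is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a
# wide (UCS4) build.  Python 3.3 and later always report 0x10FFFF.
def unicode_build_width():
    """Return the bytes per character implied by sys.maxunicode: 2 or 4."""
    # Hypothetical helper, not part of NumPy or the standard library.
    return 2 if sys.maxunicode == 0xFFFF else 4

print("bytes per Unicode character:", unicode_build_width())
```

The point of the sketch is that the answer comes from the interpreter 
build rather than from anything the user requested, which is the 
contrast with NumPy's other data-types.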

So, we are thinking of introducing another data-type to NumPy to 
differentiate between UCS2 and UCS4 unicode strings.  (This would also 
mean a separate unicode scalar object, i.e. a string type, for each of 
these, exactly one of which would inherit from the Python unicode type.)

Before embarking on this journey, however, we are seeking advice from 
individuals on this list who are wiser in the ways of Unicode.

Perhaps all we need to do is be more careful on input and output of 
Unicode data-types so that transfer of unicode can be handled correctly 
on each platform.
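
One way to picture the "careful on input and output" option is to pin 
the transfer format to an explicit encoding, so the bytes do not depend 
on the build.  The sketch below is an assumption about what that could 
look like, not NumPy's implementation: the function names are 
hypothetical, the choice of little-endian UTF-32 is arbitrary, and it is 
written for Python 3.

```python
def to_ucs4_bytes(text, length):
    """Encode text as fixed-width UTF-32-LE, NUL-padded to `length` characters.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-32-le")      # always 4 bytes per code point
    if len(data) > 4 * length:
        raise ValueError("text does not fit in %d characters" % length)
    return data.ljust(4 * length, b"\0")  # pad to the fixed field size


def from_ucs4_bytes(data):
    """Decode a fixed-width UTF-32-LE field, dropping the NUL padding."""
    return data.decode("utf-32-le").rstrip("\0")


field = to_ucs4_bytes("abc", 5)
print(len(field))               # 20 bytes: 5 characters * 4 bytes
print(from_ucs4_bytes(field))   # abc
```

With a fixed external encoding like this, a field written on one 
platform has the same byte layout when read on another; the cost is a 
conversion step on every read and write.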

Any thoughts?

-Travis Oliphant




