OK -- onto proposals:

1) The default behaviour for numpy arrays of strings is compatible with
Python3's string model, i.e. fully unicode-supporting and with a character-oriented interface. That is, if you do::

    arr = np.array(("this", "that",))

you get an array that can store ANY unicode string with 4 or fewer characters,
and arr[1] will return a native Python3 string object.
This is the use-case for "casual" numpy users -- not the folks writing H5py and the like, or the ones writing Cython bindings to C++ libs.
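
For reference, here is a minimal sketch of what the existing 'U' dtype does with that array, assuming a reasonably recent NumPy under Python 3. The variable names `second` and `truncated` are only for illustration. Note that indexing currently hands back a `numpy.str_`, which is a subclass of `str` rather than a plain `str` instance, and that assigning a string longer than the array's width truncates it silently.

```python
import numpy as np

# Default string array: NumPy picks a fixed-width unicode ('U') dtype
# sized to the longest input string (4 characters here).
arr = np.array(("this", "that"))

print(arr.dtype.kind)      # 'U'
print(arr.dtype.itemsize)  # 16 bytes: 4 characters * 4 bytes each (UCS-4)

# Any unicode code point fits in a character slot, not just ASCII.
arr[0] = "日本語!"

# Indexing yields numpy.str_, which subclasses the Python 3 str type.
second = arr[1]
print(isinstance(second, str))

# Strings longer than the fixed width are silently truncated on assignment.
arr[1] = "longer"
truncated = str(arr[1])
print(truncated)
```

The 16-byte item size is the fixed-width UCS-4 layout that option (a) keeps, and the silent truncation is the cost of that fixed width.
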

I see two options here:

a) The current 'U' dtype -- fully meets the specs, and is already there.

b) Having a pointer-to-a-python-string dtype:

   - I take it that's what Pandas does, and people seem happy.
   - That would get us variable-length strings, and potentially other nifty string-processing.
   - It would lose the ability to interact at the binary level with other systems -- but do any other systems use UCS-4 anyway?
   - How would it work with pickle and numpy zip storage?

Personally, I'm fine with (a), but (b) seems like it could be a nice addition. As the 'U' type already exists, the choice to add a python-string type is really orthogonal to the rest of this discussion.

Note that I think using utf-8 internally to fit this need is a mistake -- it does not match well with the Python string model.

That's it for use-case (1)

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov