Re: [Numpy-discussion] String type again.

15 Jul 2014

On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
...
> Just wondering, couldn't we have a type which actually has an
> (arbitrary, python supported) encoding (and "bytes" might even just be a
> special case of no encoding)?

Well, then we're back to the core issue here:

numpy dtypes need to be a pre-specified length

encoded bytes are an arbitrary length.

This leads us to wanting to use only fixed-number-of-bytes-per-character
encodings:
 - ascii
 - latin-1
 - UCS-4 (or UTF-32..I get a bit confused about the names)

Maybe UCS-2 (NOT UTF-16) would be worth considering, as a compromise
between space and the fraction of Unicode supported.
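To make the fixed-width point concrete, here is a small sketch in plain Python (no numpy) that counts how many bytes a single character takes under a few encodings. The helper name `width` and the sample characters are just for illustration.

```python
# Sample characters: ASCII, Latin-1 range, BMP, and outside the BMP.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

def width(ch, encoding):
    """Return the encoded size of one character in bytes, or None if unencodable."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

for enc in ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"):
    print(enc, [width(ch, enc) for ch in samples])
```

ascii and latin-1 are one byte per character but cannot encode everything, utf-32 is always four bytes, and utf-8 and utf-16 vary per character, which is what makes them awkward for a pre-specified item length.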

> Basically storing bytes and on access do
> ...
> element[i].decode(specified_encoding) and on storing element[i] =
> value.encode(specified_encoding).

This really doesn't seem that different from just using Python strings.
Is there a point to having a pointer-to-Python-string type as a less
generalized version of the Python strings that are already possible in
object arrays?
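For reference, the encode-on-store / decode-on-access idea could be sketched roughly as below. This is plain Python with a `bytearray` standing in for a fixed-width buffer; the class `FixedWidthEncoded` and its parameters are hypothetical, not a numpy API, and the sketch assumes a single-byte encoding such as latin-1.

```python
class FixedWidthEncoded:
    """Hypothetical fixed-width string container; not a numpy API."""

    def __init__(self, n_items, itemsize, encoding):
        self.itemsize = itemsize      # bytes reserved per element
        self.encoding = encoding
        self._buf = bytearray(n_items * itemsize)  # zero-filled slots

    def __setitem__(self, i, value):
        raw = value.encode(self.encoding)
        if len(raw) > self.itemsize:
            raise ValueError("encoded value does not fit in the slot")
        start = i * self.itemsize
        # Pad with null bytes up to the fixed slot width.
        self._buf[start:start + self.itemsize] = raw.ljust(self.itemsize, b"\x00")

    def __getitem__(self, i):
        start = i * self.itemsize
        raw = bytes(self._buf[start:start + self.itemsize])
        # Strip the padding before decoding.
        return raw.rstrip(b"\x00").decode(self.encoding)

arr = FixedWidthEncoded(n_items=2, itemsize=4, encoding="latin-1")
arr[0] = "caf\u00e9"
arr[1] = "ab"
print(arr[0], arr[1])
```

With a variable-width encoding such as utf-8, "caf\u00e9" encodes to five bytes and would not fit in the four-byte slot, which is the length problem again.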

> There is always the never ending small issue of trailing null bytes. If
> ...
> we want to be fully compatible, such a type would have to store the
> string length explicitly to support trailing null bytes.

Are null bytes legal (as something other than a terminator) in some
encodings?
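A quick check in plain Python suggests the answer is yes. The sketch below shows null bytes turning up inside ordinary text in the wide encodings, and a trailing U+0000 being lost when null padding is stripped without a stored length. The helper `strip_padding` is hypothetical and only mimics that stripping.

```python
# Null bytes appear inside ordinary text in the wide encodings.
utf16 = "a".encode("utf-16-le")
utf32 = "a".encode("utf-32-le")
print(utf16, utf32)

# U+0000 is itself an encodable character, e.g. in UTF-8.
print("a\x00".encode("utf-8"))

def strip_padding(raw):
    """Drop trailing null bytes, as a slot with no stored length would have to."""
    return raw.rstrip(b"\x00")

# Store "a\x00" in a 4-byte null-padded slot, then read it back.
stored = "a\x00".encode("latin-1").ljust(4, b"\x00")
round_tripped = strip_padding(stored).decode("latin-1")
print(round_tripped == "a\x00")
```

The last comparison comes out False: the real trailing null cannot be told apart from the padding.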

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
