On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Just wondering, couldn't we have a type which actually has an (arbitrary, python supported) encoding (and "bytes" might even just be a special case of no encoding)?
well, then we're back to the core issue here: numpy dtypes need to be a pre-specified length, but encoded bytes are an arbitrary length. This leads us to wanting to use only fixed-number-of-bytes-per-character encodings:

- ascii
- latin-1
- UCS-4 (or UTF-32 -- I get a bit confused about the names)

maybe UCS-2 (NOT UTF-16) would be worth considering, as a compromise between space and the fraction of unicode supported.

Basically storing bytes and on access do
element[i].decode(specified_encoding) and on storing element[i] = value.encode(specified_encoding).
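For concreteness, the encode-on-store / decode-on-access idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing numpy feature: `set_item` and `get_item` are hypothetical helper names, and latin-1 is chosen because it uses exactly one byte per character, so an `S5` field always holds five characters.

```python
import numpy as np

ENCODING = "latin-1"  # assumption: a one-byte-per-character encoding

# Fixed-width byte storage: every element occupies exactly 5 bytes.
data = np.zeros(3, dtype="S5")


def set_item(arr, i, value, encoding=ENCODING):
    """Hypothetical helper: encode a str and store the raw bytes."""
    raw = value.encode(encoding)
    if len(raw) > arr.dtype.itemsize:
        raise ValueError("encoded value does not fit in the fixed-width field")
    arr[i] = raw


def get_item(arr, i, encoding=ENCODING):
    """Hypothetical helper: read the raw bytes and decode them to str."""
    return arr[i].decode(encoding)


set_item(data, 0, "caf\u00e9")      # 4 characters, 4 bytes in latin-1
roundtrip = get_item(data, 0)

# With a variable-width encoding the byte count depends on the content,
# so a character count no longer tells you how large the field must be.
latin1_len = len("caf\u00e9".encode("latin-1"))   # 4 bytes
utf8_len = len("caf\u00e9".encode("utf-8"))       # 5 bytes
```

The last two lines show why a variable-width encoding such as UTF-8 fits poorly into a pre-specified item size: the same four characters need a different number of bytes depending on the encoding and on which characters they are.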
this really doesn't seem that different from just using python strings -- is there a point to having a pointer-to-python-string type as a less general version of the python strings that are already possible in object arrays?

There is always the never ending small issue of trailing null bytes. If
we want to be fully compatible, such a type would have to store the string length explicitly to support trailing null bytes.
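The trailing-null issue can be seen with the existing `S` dtype, which strips trailing null bytes when an element is read back. The sketch below shows that, and then one possible way to keep them by storing the length explicitly. The structured dtype and its field names (`length`, `buf`) are assumptions made up for this illustration, not a proposed design. The last line shows that in a multi-byte fixed-width encoding such as UTF-32, null bytes are part of ordinary characters.

```python
import numpy as np

raw = b"abc\x00\x00"

# The existing S dtype cannot tell padding from real trailing nulls.
a = np.array([raw], dtype="S5")
stripped = a[0]              # trailing nulls are gone on access
stored = a.tobytes()         # but the buffer still holds all 5 bytes

# Hypothetical layout: an explicit length next to a raw byte buffer.
rec_dtype = np.dtype([("length", "u1"), ("buf", "u1", (5,))])
rec = np.zeros(1, dtype=rec_dtype)
rec["length"][0] = len(raw)
rec["buf"][0, : len(raw)] = np.frombuffer(raw, dtype="u1")

# Reading back uses the stored length, so trailing nulls survive.
restored = rec["buf"][0, : rec["length"][0]].tobytes()

# In UTF-32 (what the U dtype stores), plain "a" already contains nulls.
utf32 = np.array(["a"], dtype="<U1").tobytes()
```

Here `stripped` compares equal to `b"abc"`, while `restored` is the full five-byte original, at the cost of one extra byte per element for the length.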
are null bytes legal (as something other than a terminator) in some encodings?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov