[Numpy-discussion] proposal: smaller representation of string arrays
Chris Barker
chris.barker at noaa.gov
Wed Apr 26 18:44:03 EDT 2017
On Wed, Apr 26, 2017 at 11:38 AM, Sebastian Berg <sebastian at sipsolutions.net> wrote:
> I remember talking with a colleague about something like that. And
> basically an annoying thing there was that if you strip the zero bytes
> in a zero padded string, some encodings (UTF16) may need one of the
> zero bytes to work right.
I think it's really clear that you don't want to mess with the bytes in any
way without knowing the encoding -- for UTF-16, the code unit is two bytes,
so a "null" is two zero bytes in a row.
So generic "null padded" or "null terminated" is dangerous -- it would have
to be "Null-padded utf-8" or whatever.
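
To make that concrete, here is a minimal sketch, not anything from NumPy itself. The helper names `naive_strip` and `strip_null_padding` are made up for illustration, and the sketch assumes a BOM-less codec name (e.g. "utf-16-le") and a buffer whose length is a whole number of code units.

```python
def naive_strip(raw: bytes) -> bytes:
    """Strip trailing zero bytes with no knowledge of the encoding (unsafe)."""
    return raw.rstrip(b"\x00")


def strip_null_padding(raw: bytes, encoding: str) -> bytes:
    """Strip trailing null code units, sized for the given encoding.

    Assumption: `encoding` is a BOM-less codec name such as "utf-8" or
    "utf-16-le", and len(raw) is a multiple of the code unit size.
    """
    null = "\x00".encode(encoding)  # b"\x00" for utf-8, b"\x00\x00" for utf-16-le
    unit = len(null)
    end = len(raw)
    # Walk back one whole code unit at a time, never half of one.
    while end >= unit and raw[end - unit:end] == null:
        end -= unit
    return raw[:end]


# "A" in UTF-16-LE is b"A\x00"; pad it out to an 8-byte field.
padded = "A".encode("utf-16-le").ljust(8, b"\x00")

# The naive version also eats the zero byte that belongs to "A".
broken = naive_strip(padded)
try:
    broken.decode("utf-16-le")
except UnicodeDecodeError:
    print("naive strip left an undecodable buffer:", broken)

# The encoding-aware version keeps the code unit intact.
print(strip_null_padding(padded, "utf-16-le").decode("utf-16-le"))
```

The same `strip_null_padding` call with "utf-8" reduces to stripping single zero bytes, which is why the encoding has to be part of the dtype's definition rather than something guessed at after the fact.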
> Though I think it might have been something like "make everything in
> hdf5/something similar work"
That would be nice :-), but I suspect HDF-5 is the same as everything else
-- there are files in the wild where someone jammed the wrong thing into a
text array ....
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov