[Numpy-discussion] proposal: smaller representation of string arrays

Robert Kern robert.kern at gmail.com
Tue Apr 25 13:15:27 EDT 2017


On Tue, Apr 25, 2017 at 10:04 AM, Chris Barker <chris.barker at noaa.gov>
wrote:
>
> On Tue, Apr 25, 2017 at 9:57 AM, Ambrose LI <ambrose.li at gmail.com> wrote:
>>
>> 2017-04-25 12:34 GMT-04:00 Chris Barker <chris.barker at noaa.gov>:
>> > I am totally euro-centric,
>
>> But Shift-JIS is not one-byte; it's two-byte (unless you allow only
>> half-width characters and nothing else). :-)
>
> bad example then -- are there other non-euro-centric one byte per char
> encodings worth worrying about? I have no clue :-)

I've run into Windows-1251 in files (seismic and well log data from Russian
wells). Treating them as latin-1 does not make for a happy time. Windows-1251
and latin-1 both technically match ASCII in the lower half (bytes 0x00-0x7F),
but most of the actual Russian text is written with the high-bit characters.
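
To make the failure mode concrete, here is a minimal sketch in plain Python.
The sample word and the variable names are my own illustration, not taken
from any real file; "cp1251" is Python's codec name for Windows-1251.

```python
# Hypothetical sample text: the Russian word for "well" (borehole).
text = "скважина"

# Encode as Windows-1251: one byte per character.
raw = text.encode("cp1251")
one_byte_per_char = len(raw) == len(text)

# Every byte of this word lands in the high-bit half (>= 0x80).
all_high_bit = all(b >= 0x80 for b in raw)

# latin-1 maps every byte to some character, so decoding never raises.
# It just silently produces the wrong text.
wrong = raw.decode("latin-1")
right = raw.decode("cp1251")

# Pure-ASCII content (a made-up header field here) decodes identically
# under both encodings, which is why the problem is easy to miss.
header = b"DEPTH"
same_ascii = header.decode("latin-1") == header.decode("cp1251")

print(repr(wrong), repr(right))
```

The awkward part is that nothing fails: the latin-1 decode succeeds and
returns a string of the right length, so the damage only shows up when
someone actually reads the text.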

--
Robert Kern