[Numpy-discussion] proposal: smaller representation of string arrays

Chris Barker chris.barker at noaa.gov
Thu Apr 20 13:46:31 EDT 2017

On Thu, Apr 20, 2017 at 10:36 AM, Neal Becker <ndbecker2 at gmail.com> wrote:

> I'm no unicode expert, but can't we truncate unicode strings so that only
> valid characters are included?

Sure, it's just a bit fiddly, and you need to make sure that everything
gets passed through the proper mechanism. NumPy is all about folks using
other code to mess with the bytes in a numpy array, so we can't expect that
all numpy string arrays will have been created with numpy code.
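Here is a minimal sketch of what "other code messing with the bytes" implies for a reader. It assumes a hypothetical layout in which each element is UTF-8 stored in a fixed-width field padded with NUL bytes, similar in spirit to numpy's `S` dtype. Because such a reader can't trust the buffer, it has to validate every field. The function name `decode_fixed_width` is made up for this example.

```python
def decode_fixed_width(field: bytes) -> str:
    """Decode one fixed-width, NUL-padded field as UTF-8.

    Hypothetical storage layout (not an existing numpy dtype): UTF-8 bytes
    followed by b"\\x00" padding. Stripping trailing NULs also drops any
    legitimate trailing NUL characters, as numpy's "S" dtype already does.
    """
    payload = field.rstrip(b"\x00")
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Bytes written by non-numpy code may contain a truncated
        # multi-byte sequence or other invalid UTF-8.
        raise ValueError(f"invalid UTF-8 in field {field!r}") from exc
```

For example, `decode_fixed_width(b"caf\xc3\xa9\x00\x00")` returns `"café"`, while `b"caf\xc3\x00\x00"` (a 2-byte character cut in half) raises `ValueError`.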

Does Python's str have a truncating encode option? That is, you don't want to
encode to UTF-8 and then just chop off the bytes, since that can split a
multi-byte character in half.
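Python's `str.encode` takes only `encoding` and `errors` arguments, with no maximum-length option, so truncation has to be done by hand. A minimal sketch, using a hypothetical helper name `encode_truncated`: encode the whole string, then step the cut point back past any UTF-8 continuation bytes (those matching `0b10xxxxxx`) so the result ends on a character boundary.

```python
def encode_truncated(s: str, max_bytes: int) -> bytes:
    """Encode s as UTF-8, truncated to at most max_bytes without
    splitting a multi-byte character (hypothetical helper)."""
    data = s.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # A continuation byte has the form 0b10xxxxxx; if the first byte we
    # would drop is one, the cut falls inside a character, so move it back
    # until it sits just before a lead byte (or an ASCII byte).
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]
```

For example, `encode_truncated("café", 4)` returns `b"caf"`, whereas `"café".encode("utf-8")[:4]` gives `b"caf\xc3"`, which raises `UnicodeDecodeError` when decoded. Note that this cuts only at code-point boundaries, so it can still separate a base character from a following combining mark.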



