[Numpy-discussion] proposal: smaller representation of string arrays

Thu Apr 20 15:15:33 EDT 2017

Perhaps `np.encoded_str[encoding]` as the name for the new type, if we
decide a new type is necessary?
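For comparison, datetime64 already keeps a parameter (its unit) inside the dtype, using the same bracket style that a name like `np.encoded_str[encoding]` would follow. The sketch below uses only existing NumPy calls. `np.encoded_str` itself is a hypothetical name and does not exist.

```python
import numpy as np

# The unit is part of the dtype itself, written in brackets.
dt = np.dtype("datetime64[ms]")
print(dt)                     # datetime64[ms]
print(np.datetime_data(dt))   # ('ms', 1)

# Different parameters give distinct dtypes, as different encodings would.
print(np.dtype("datetime64[ms]") == np.dtype("datetime64[s]"))  # False
```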

Am I right in thinking that the general problem here is that it's very easy
to discard metadata when working with dtypes, and that by adding metadata
to `unicode_`, we risk existing code carelessly dropping it? Is this a
problem in both C and python, or just C?

If that's the case, can we end up with a compromise where being careless
just causes old code to promote to UCS-4 (the existing `unicode_`)?
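The following is a minimal sketch of both concerns. It assumes an encoded string array is emulated with a fixed-width bytes ('S') dtype that carries the codec in `dtype.metadata`. `make_encoded` and `as_legacy_unicode` are hypothetical helpers, not NumPy API. The sketch shows that rebuilding a dtype from its string form silently drops the encoding, and it shows what a "careless code falls back to UCS-4" path could look like.

```python
import numpy as np


def make_encoded(strings, encoding="utf-8"):
    """Hypothetical helper: emulate an encoded-string array with an 'S' dtype.

    NumPy has no such dtype; the codec is only recorded in dtype.metadata.
    """
    raw = [s.encode(encoding) for s in strings]
    width = max([1] + [len(b) for b in raw])  # width in bytes, not characters
    dt = np.dtype(f"S{width}", metadata={"encoding": encoding})
    return np.array(raw, dtype=dt)


def as_legacy_unicode(arr):
    """Hypothetical fallback for 1-D arrays: decode into the existing UCS-4 'U' dtype."""
    meta = arr.dtype.metadata or {}
    encoding = meta.get("encoding", "utf-8")  # assumed default if the metadata was lost
    return np.array([b.decode(encoding) for b in arr.tolist()], dtype=str)


enc = make_encoded(["abc", "héllo"])
print(enc.dtype.itemsize)                      # 6 (bytes per element, UTF-8)
print(as_legacy_unicode(enc).dtype.itemsize)   # 20 (5 characters x 4 bytes)

# Code that rebuilds the dtype from its string form ('|S6') loses the codec.
print(np.dtype(enc.dtype.str).metadata)        # None
```

The 'S' dtype strips trailing null bytes on item access, so this emulation works for UTF-8 but not for UTF-16, for example.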

On Thu, 20 Apr 2017 at 20:09 Anne Archibald <peridot.faceted at gmail.com>
wrote:

> On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor <
> jtaylor.debian at googlemail.com> wrote:
>
>> I probably should have formulated my goal with the proposal a bit better;
>> I am not very interested in a repeat of the which-encoding-to-use debate.
>> In the end, the planned approach allows any encoding via a dtype with
>> metadata, as datetime does.
>> This lets any codec (including truncated UTF-8) be added easily (if
>> Python supports it) and allows sidestepping the debate.
>>
>> My main concern is whether it should be a new dtype or a modification of
>> the unicode dtype. The backward compatibility argument, though, strongly
>> favours adding a new dtype that makes the np.unicode type redundant.
>>
>
> Creating a new dtype to handle encoded unicode, with the encoding
> specified in the dtype, sounds perfectly reasonable to me. Changing the
> behaviour of the existing unicode dtype seems like it's going to lead to
> massive headaches unless exactly nobody uses it. The only downside to a new
> type is having to find an obvious name that isn't already in use. (And
> having to actively maintain/deprecate the old one.)
>
> Anne
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
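Julian's aside about "truncated utf8" can be read as fixed-width UTF-8 storage in which values longer than the field are cut to fit. That reading is an assumption. The sketch below uses `truncate_utf8`, a hypothetical helper. It shows the one subtlety such a codec has: the cut must land on a character boundary, not inside a multi-byte sequence.

```python
import numpy as np


def truncate_utf8(text, max_bytes):
    """Encode text as UTF-8 and cut it to at most max_bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that gets dropped. UTF-8 continuation bytes
    # match 0b10xxxxxx; if data[cut] is one, the cut would split a character,
    # so move back to the start of that character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# Fixed-width field of 3 bytes per element.
words = ["héllo", "naïve", "ok"]
arr = np.array([truncate_utf8(w, 3) for w in words], dtype="S3")
print([b.decode("utf-8") for b in arr.tolist()])  # ['hé', 'na', 'ok']
```

A naive byte slice would leave a split sequence that Python refuses to decode (`b'h\xc3'.decode('utf-8')` raises `UnicodeDecodeError`). That is why the cut is adjusted before the value is stored.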