Re: [Numpy-discussion] proposal: smaller representation of string arrays

20 Apr 2017


On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor wrote:
> ...
> I probably should have formulated my goal with the proposal a bit better. I am
> not very interested in a repetition of the which-encoding-to-use debate.
> In the end, what will be done allows any encoding via a dtype with
> metadata, like datetime.
> This allows any codec (including truncated utf8) to be added easily (if
> Python supports it) and allows sidestepping the debate.
> My main concern is whether it should be a new dtype or a modification of the
> unicode dtype, though the backward compatibility argument is strongly in
> favour of adding a new dtype that makes the np.unicode type redundant.

Creating a new dtype to handle encoded unicode, with the encoding specified
in the dtype, sounds perfectly reasonable to me. Changing the behaviour of
the existing unicode dtype seems like it's going to lead to massive
headaches unless exactly nobody uses it. The only downside to a new type is
having to find an obvious name that isn't already in use. (And having to
actively maintain/deprecate the old one.)
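
To make the "encoding in the dtype, like datetime" idea concrete, here is a rough sketch using only what NumPy offers today. It is not the proposed dtype: the `encoding` metadata key and the `tagged` name are assumptions made up for illustration, and plain dtype metadata is not interpreted by NumPy in any way. It only shows how datetime64 carries a parameter in its dtype and how much space the per-element encodings would save over the existing unicode dtype.

```python
import numpy as np

# datetime64 already keeps a parameter (the unit) inside the dtype itself.
unit, count = np.datetime_data(np.dtype("datetime64[ms]"))

# The existing unicode dtype stores 4 bytes per character (UCS-4).
words = np.array(["numpy", "dtype", "naïve"])

# Encoding each element gives fixed-width byte strings that are much smaller.
utf8 = np.char.encode(words, "utf-8")
latin1 = np.char.encode(words, "latin-1")

sizes = {
    "unicode": words.dtype.itemsize,
    "utf-8": utf8.dtype.itemsize,
    "latin-1": latin1.dtype.itemsize,
}

# Hypothetical: tag a bytes dtype with its encoding via generic dtype metadata.
# NumPy ignores this key; a real encoded-string dtype would have to act on it.
tagged = np.dtype("S6", metadata={"encoding": "utf-8"})

# Decode using the encoding recorded on the tagged dtype.
decoded = np.char.decode(utf8, tagged.metadata["encoding"])
```

Here the unicode array needs 20 bytes per element, against 6 for UTF-8 and 5 for Latin-1, which is the kind of saving the proposal is after.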

Anne

Anne Archibald