On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor <jtaylor.debian@googlemail.com> wrote:
> I probably should have formulated the goal of my proposal better: I am not very interested in repeating the debate over which encoding to use. In the end, what will be done allows any encoding via a dtype with metadata, like datetime. This allows any codec (including truncated UTF-8) to be added easily (if Python supports it) and lets us sidestep that debate.
> My main concern is whether it should be a new dtype or a modification of the unicode dtype. The backward-compatibility argument strongly favours adding a new dtype, even though that would make the np.unicode type redundant.
Creating a new dtype to handle encoded unicode, with the encoding specified in the dtype, sounds perfectly reasonable to me. Changing the behaviour of the existing unicode dtype seems like it's going to lead to massive headaches unless exactly nobody uses it. The only downside to a new type is having to find an obvious name that isn't already in use (and having to actively maintain/deprecate the old one).

Anne
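To make the "encoding specified in the dtype" idea concrete, here is a rough pure-Python sketch (no NumPy involved). `EncodedStrDType` is a hypothetical name, not NumPy API. It models a fixed-width string type whose encoding is carried as metadata, much as datetime64 carries its unit. For contrast, NumPy's existing unicode ('U') dtype stores UTF-32 code points at 4 bytes per character. The sketch assumes NUL padding (as with NumPy's 'S' dtype), so it only accepts ASCII-compatible codecs. It shows that any codec Python knows can be plugged in, and that "truncated UTF-8" can be cut at a character boundary rather than mid-sequence.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedStrDType:
    """Hypothetical descriptor for a fixed-width, encoded string dtype.

    `itemsize` is the number of bytes per element; `encoding` is metadata
    carried by the dtype (compare datetime64's time unit).
    """
    itemsize: int
    encoding: str = "utf-8"

    def __post_init__(self) -> None:
        if self.itemsize < 0:
            raise ValueError("itemsize must be non-negative")
        # Any codec Python knows about is accepted; unknown names raise LookupError.
        codecs.lookup(self.encoding)
        # Assumption of this sketch: elements are NUL-padded, which is only
        # unambiguous for ASCII-compatible codecs (UTF-8, Latin-1, ...),
        # so UTF-16/UTF-32 are rejected here.
        if "A\0".encode(self.encoding) != b"A\0":
            raise ValueError(f"{self.encoding!r} is not ASCII-compatible")

    def pack(self, text: str) -> bytes:
        """Encode `text` into exactly `itemsize` bytes.

        Too-long values are truncated to the longest prefix of whole
        characters that fits, so a multi-byte sequence is never split;
        shorter values are padded with NUL bytes.
        """
        for n in range(len(text), -1, -1):
            raw = text[:n].encode(self.encoding)
            if len(raw) <= self.itemsize:
                return raw.ljust(self.itemsize, b"\0")
        raise AssertionError("unreachable: the empty prefix always fits")

    def unpack(self, raw: bytes) -> str:
        """Decode one element, dropping the NUL padding."""
        return raw.rstrip(b"\0").decode(self.encoding)
```

For example, `EncodedStrDType(3).pack("naïve")` gives `b'na\x00'`: the two-byte UTF-8 'ï' does not fit in the third byte, so it is dropped whole instead of being split. With `EncodedStrDType(8, "latin-1")` the same string fits in five bytes plus padding. Trailing NUL characters in the text itself are lost on round-trip, the same trade-off NumPy's 'S' dtype makes.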