[Numpy-discussion] String type again.

Fri Jul 18 12:07:45 EDT 2014

18.07.2014 18:10, Julian Taylor wrote:
[clip]
> We break code either way. Either we break applications using S as
> string type, but now it becomes bytes in python3. Or we break
> applications treating S as byte type and we change it to string in
> python3.
> 
> Unfortunately we missed the opportunity when adding python3 support
> to fix the exact same bytes/text boundary issue which is the main
> reason why python3 exists in the first place. We should have made
> porting to numpy3 an intentionally(!) backward incompatible change
> just like python itself did.
> 
> Now we are stuck with deciding, which option breaks less. On the
> one hand, that S is bytes in python3 is somewhat established by now
> and lots of workarounds are already in place. On the other hand, I
> think code that relies on S being bytes is in the minority and
> python3 usage is probably still insignificant in this area.
> Unfortunately getting actual numbers and not wild guesses on this
> is probably not easy.

One way to try this out is to change the meaning of 'S' and see how
badly e.g. pandas or matplotlib break on py3 as a consequence.

Another approach would be to add a new 1-byte unicode dtype with a
type code different from 'S'. The automatic ASCII encoding in
constructor/assignment on Py3 could then be deprecated, which would
make 'S' a strict bytes dtype.

This is not perfect either, since array(['foo']) on Py2 should for
backward compatibility continue returning dtype='S'. Moreover,
existing code would not make use of the new type code.
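
For reference, a minimal sketch of the automatic ASCII encoding on Py3
mentioned above. It only shows the current behaviour of 'S' as I
understand it, not the proposed new type code; the variable names are
just for illustration.

```python
import numpy as np

# On Py3, str input without an explicit dtype gives a unicode ('U') array.
a = np.array(['foo'])
print(a.dtype.kind)

# Requesting 'S' implicitly ASCII-encodes the str input into bytes.
b = np.array(['foo'], dtype='S')
print(b.dtype.kind, b[0] == b'foo')

# Assignment into an 'S' array encodes implicitly as well.
c = np.zeros(1, dtype='S3')
c[0] = 'bar'
print(c[0] == b'bar')

# Text outside ASCII cannot be encoded this way and raises an error.
try:
    np.array(['\u00e9'], dtype='S')
    non_ascii_ok = True
except UnicodeError:
    non_ascii_ok = False
print(non_ascii_ok)
```

If the implicit encoding were deprecated, the second and third cases
would presumably have to be written with explicit bytes (b'foo') or an
explicit .encode('ascii') instead.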

-- 
Pauli Virtanen