[Python-Dev] PEP 460: allowing %d and %f and mojibake

Stephen J. Turnbull stephen at xemacs.org
Mon Jan 13 05:27:27 CET 2014


Ethan Furman writes:
 > On 01/12/2014 02:57 PM, Stephen J. Turnbull wrote:

 > > No, Nick's point is that there's no encoding needed there at all,
 > > just a bunch of methods that handle numbers in the range 0-255.  You
 > > can rationalize the particular choice of numbers by referring to the
 > > ASCII coded character set, and that's very useful to users.  But
 > > knowledge of ASCII isn't necessary to specify these methods; they can
 > > be defined in an encoding/decoding-free way.
 > 
 > How can you say that with a straight face? [1]

Because I showed you code that does it.  Did you see an .encode or a
.decode in there?

 > Do you really think that .title, .isalnum, and .center (to name
 > only a few) would work the same if the assumed encoding was EBCDIC?

Yes, yes, and yes.  The numbers involved would change, and the test
for finding letters would be different (and more complicated IIRC).
The only one to worry about is .title, but neither ASCII nor EBCDIC
has confused or multiple letter titlecase.
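
To make "encoding/decoding-free" concrete, here is a minimal sketch (not
the code referred to above, which isn't reproduced in this message).  The
helper names is_alnum and to_upper are hypothetical; they treat bytes
purely as numbers in 0-255 and never call .encode or .decode.

```python
# Hypothetical helpers (names are illustrative, not from the thread or CPython).
# "Letters" and "digits" are just integer ranges; no .encode/.decode anywhere.
UPPER = range(65, 91)    # the numbers ASCII happens to assign to A-Z
LOWER = range(97, 123)   # ... to a-z
DIGIT = range(48, 58)    # ... to 0-9


def is_alnum(data: bytes) -> bool:
    """True if data is non-empty and every byte is in one of the ranges."""
    return len(data) > 0 and all(
        b in UPPER or b in LOWER or b in DIGIT for b in data
    )


def to_upper(data: bytes) -> bytes:
    """Shift numbers in the 'lowercase' range down by 32; keep the rest."""
    return bytes(b - 32 if b in LOWER else b for b in data)


sample = bytes([115, 112, 97, 109, 52, 50])   # same bytes as b"spam42"
assert is_alnum(sample) == sample.isalnum()
assert to_upper(sample) == sample.upper()
```

An EBCDIC flavor would swap in different numbers, and since EBCDIC
letters are not contiguous, more than one range per case.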

 > Do you think they would do the proper transformations, or return
 > the proper result, if the bytes they were used on were encoded
 > Japanese?

That depends on which Japanese encoding.  It would work correctly on
UTF-8 and on EUC-JP (packed), and not on any of the others.  But you
wouldn't consider that "ASCII-encoded text", would you?
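
A small sketch of why the encoding matters, assuming the stock Python
codecs "utf-8", "euc_jp" and "shift_jis".  In UTF-8 and EUC-JP every byte
of a Japanese character is >= 128, so the ASCII-range methods leave it
alone; Shift JIS allows trail bytes in the ASCII letter range.  The
sample character and the variable names are my own choices.

```python
KATAKANA_A = "\u30a2"   # KATAKANA LETTER A, chosen only as a sample

results = {}
for codec in ("utf-8", "euc_jp", "shift_jis"):
    raw = KATAKANA_A.encode(codec)
    lowered = raw.lower()            # only touches byte values 65-90
    results[codec] = (raw == lowered)
    print(codec, raw, lowered, "unchanged" if raw == lowered else "CHANGED")
```

With Shift JIS the second byte of this character is 65 (the number for
"A"), so .lower() rewrites it and the result no longer decodes to the
same character.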

 > >> But bytes already acknowledges an ASCII bias.
 > >
 > > True, but that bias is implemented without use of encoding or
 > > decoding.   b'%d' % (123,) -> b'123' does require encoding, at the
 > > very least in the sense of type change and serialization.
 > 
 > You mean like changing a number into text does?  Really, this is no
 > different.

Precisely.  "There should be one-- and preferably only one --obvious way
to do it."  The one way uses text, so preferably bytes shouldn't.
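
For reference, a sketch of the text-based spelling meant here.  It uses
only str formatting plus an explicit encode, so it doesn't depend on
whether bytes itself supports %d.  The variable names are illustrative.

```python
n = 123

# Number -> text -> bytes, with the serialization step spelled out.
via_str = str(n).encode("ascii")
via_format = ("%d" % n).encode("ascii")

assert via_str == via_format == b"123"
```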


