[Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman ethan at stoneleaf.us
Mon Jan 13 06:22:04 CET 2014


On 01/12/2014 08:27 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> On 01/12/2014 02:57 PM, Stephen J. Turnbull wrote:
>

I didn't trim enough to make my point clear.  My apologies.

>>> But
>>> knowledge of ASCII isn't necessary to specify these methods; they can
>>> be defined in an encoding/decoding-free way.

Perhaps you meant "use the methods".  I meant "write the methods".

You cannot write .upper for the bytes type without knowing which encoding those bytes represent.  And quite frankly, if 
you use those methods on bytes without knowing (1) which encoding is represented by the bytes and (2) that the function 
you are calling is meant to work with that encoding... well, you deserve what you get.


>> How can you say that with a straight face?
>
> Because I showed you code that does it.  Did you see an .encode or a
> .decode in there?

No, I didn't.  I saw numbers representing bytes representing text that has been encoded with the ASCII codec.  If you 
didn't know it was ASCII, you couldn't write that function.  Even though you don't have to call encode or decode when 
working directly with encoded bytes, you still have to know what the encoding is to do it correctly.
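
To illustrate, here is a minimal sketch of a byte-level upper() with no encode or decode call anywhere.  The name 
ascii_upper is hypothetical and this is not the code from the earlier message; the constants 97, 122, and 32 only mean 
something because ASCII is assumed.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase bytes, assuming they hold ASCII-encoded text."""
    # 97..122 are 'a'..'z' in ASCII, and 'A'..'Z' sit exactly 32 below them.
    # Neither fact holds for an arbitrary encoding.
    return bytes(b - 32 if 97 <= b <= 122 else b for b in data)


print(ascii_upper(b"hello, World"))
```

Nothing in the function mentions a codec, yet every number in it was chosen by someone who knew the ASCII table.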


>> Do you really think that .title, .isalnum, and .center (to name
>> only a few) would work the same if the assumed encoding was EBCDIC?

I phrased that poorly.  If the byte stream were EBCDIC-encoded, and we called the current .method_which_assumes_ASCII on 
it, would we get the proper results?
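
A quick way to check, as a sketch only: it assumes cp037, one of the EBCDIC code pages that ships with Python, as a 
stand-in for "EBCDIC", and the variable names are made up for the example.

```python
text = "hello"
ebcdic = text.encode("cp037")  # cp037 is used here as a representative EBCDIC code page

# bytes.upper() only converts the ASCII range for 'a'..'z' (0x61..0x7a).
# The cp037 lowercase letters live at other byte values, so they are skipped.
shouted = ebcdic.upper()

print(shouted == ebcdic)                        # did upper() change anything?
print(shouted.decode("cp037"))                  # what the "uppercased" bytes say
print(text.upper().encode("cp037") == shouted)  # does it match the right answer?
print(ebcdic.isalpha())                         # does bytes.isalpha() see letters?
```

The bytes come back from .upper() unchanged, and .isalpha() does not recognise them as letters, because both methods 
only know the ASCII layout.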


> The numbers involved would change, and the test
> for finding letters would be different (and more complicated IIRC).

And you have actually just made my point.  If the bytes in question were EBCDIC-encoded, we could write a function for 
it because we know what it looks like as encoded bytes.  Then we could be debating the merits of working directly with 
EBCDIC-encoded text instead of ASCII-encoded text.  ;)
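
For the curious, such a function might look roughly like this.  It is a sketch that assumes the cp037 code page; 
ebcdic_upper and is_lower are hypothetical names, and other EBCDIC variants may need different handling.

```python
def ebcdic_upper(data: bytes) -> bytes:
    """Uppercase bytes, assuming they hold cp037 (EBCDIC) encoded text."""
    def is_lower(b: int) -> bool:
        # 'a'..'i', 'j'..'r', and 's'..'z' occupy three separate ranges.
        return 0x81 <= b <= 0x89 or 0x91 <= b <= 0x99 or 0xA2 <= b <= 0xA9

    # In cp037 each uppercase letter sits 0x40 above its lowercase counterpart.
    return bytes(b + 0x40 if is_lower(b) else b for b in data)


raw = "hello world".encode("cp037")
print(ebcdic_upper(raw).decode("cp037"))
```

The letter test is indeed more complicated than the ASCII one, and it is only writable because the encoding is known.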


> "There should be one- and preferably only one -way to do
> it."  The one way uses text, so preferably bytes shouldn't.

You forgot the word "obvious".

--
~Ethan~

