
On 5/20/2011 1:44 AM, Stephen J. Turnbull wrote:
>> For people using non-Latin (non-ascii) alphabets, the 'convenience' of replacing some bytes with ascii-chars might be less convenient.
> For us, the convenience remains.
I understood the thrust of this thread to be that doing text manipulation with bytes sometimes bites -- because bytes are not text. Someone writing email or html bodies in Japanese or Farsi will not even try that, but will use str (unicode) and encode to bytes only when done, most likely transparently. As far as I noticed, Ethan did not explain why he was extracting single bytes and comparing them to a constant, so it is hard to know whether he was even using them properly.
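For example, a minimal, untested sketch of the kind of surprise I have in mind (the names here are made up -- I have not seen Ethan's actual code):

    >>> data = b'spam'
    >>> data[0] == b's'      # indexing bytes gives an int, not a 1-byte bytes
    False
    >>> data[0] == ord('s')
    True
    >>> data[0:1] == b's'    # slicing keeps the result as bytes
    True
    >>> body = 'こんにちは\n'        # compose the payload as str...
    >>> body.encode('utf-8')        # ...and encode only when done
    b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\n'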
> Japanese mail is transmitted via SMTP, and the control function "hello" is still spelled "EHLO" in Japanese mail.
I am not familiar with that control function, but if it is part of the SMTP protocol, it has nothing to do with the language of the payload. When programming a wire protocol that encodes abstract functions as ascii chars, the ascii-char representation of bytes is convenient. That is why it was chosen as the default.
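To make that concrete, a rough sketch (the host name is invented, and this is not how smtplib does it internally):

    >>> command = b'EHLO client.example.org\r\n'   # protocol token: ASCII bytes
    >>> command                                    # the repr stays readable
    b'EHLO client.example.org\r\n'
    >>> command.startswith(b'EHLO')                # protocol parsing stays in bytes
    True
    >>> subject = 'ターンブルさんへ'                # payload text stays str
    >>> subject.encode('utf-8')[:6]                # encoded only at the wire boundary
    b'\xe3\x82\xbf\xe3\x83\xbc'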
> Farsi web pages are formatted by HTML, and the control function "new line" is spelled "<BR>" in Farsi, of course.
When writing the html *text* body, sure. But I presume browsers decode encoded bytes to unicode *before* parsing the text. If so, it does not really matter that '<br>' gets encoded to b'<br>'.
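A small sketch of what I mean (untested; the parser subclass and the Farsi snippet are just for illustration):

    from html.parser import HTMLParser

    page = '<p>سلام دنیا</p><br>'   # author works in str
    wire = page.encode('utf-8')      # '<br>' becomes b'<br>' on the wire

    class TagDumper(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print('tag:', tag)
        def handle_data(self, data):
            print('text:', data)

    # the browser-ish step: decode first, then parse the text
    TagDumper().feed(wire.decode('utf-8'))
    # prints: tag: p / text: سلام دنیا / tag: br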
> It's the pain that comes from the inevitable mixing of binary protocol that looks like text with real text, turning the whole into an unintelligible garble, that hurts so much harder for people who can't properly write their names in ASCII.
> ターンブル・スティーヴェンです-ly y'rs,
-- Terry Jan Reedy