[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Andrew Barnert abarnert at yahoo.com
Thu Sep 18 08:05:30 CEST 2014


On Sep 17, 2014, at 21:21, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Andrew Barnert writes:
> 
>> The possibility of confusion might be increased if some of the
>> options to bytes look like they should work for str. People will
>> ask, "I can chunk bytes into groups of 4 with /4, why can't I do
>> that with characters when the rest of the format specifier is the
>> same?"
> 
> Isn't the answer to that kind of question "because you haven't written
> the PEP yet?"
> 
> Or "Repeat after me, 'bytes are not str' ... Very good, now do a set
> of 100 before each meal for a week."  

As long as you don't ask for a set of 100 bytearrays, because they're not hashable.
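
For anyone who wants the joke spelled out, here is a minimal sketch in Python 3 (the variable names are only for illustration): bytes objects are immutable and hashable, so they can go in a set, while bytearray objects are mutable and cannot.

```python
# bytes is immutable and hashable, so it can be a set member.
mantra = {b"bytes are not str"}

# bytearray is mutable and therefore unhashable; adding one to a set fails.
try:
    mantra.add(bytearray(b"bytes are not str"))
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
print(len(mantra))
```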

> After all, there are things you
> can do with integer or float formats that you can't do with str and
> vice versa.
> 
> bytes are indeed very similar to str as streams of code units (octets
> vs. characters), but the specific usages for human-oriented text
> (including such unnatural languages as C and Perl) require some
> differences in semantics.  The sooner people get comfortable with
> that, the better, of course, but I don't think the language should be
> prevented from evolving because many people are going to take a while
> to get the difference and its importance.

I think we agree on all of that. 

(By the way, is there a word for that Unicode ignorance and confusion? Something like "illiteracy" and "innumeracy", but probably spelled with a non-BMP character, maybe U+1F4A9?)
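
As a small aside on what "non-BMP" means here, the following sketch (assuming Python 3.3 or later, where str is indexed by code point) shows that U+1F4A9 lies above U+FFFF and so needs a surrogate pair in UTF-16.

```python
ch = "\U0001F4A9"

# Code points above 0xFFFF are outside the Basic Multilingual Plane.
is_non_bmp = ord(ch) > 0xFFFF

# One code point in a Python 3.3+ str...
code_points = len(ch)

# ...but two 16-bit code units (a surrogate pair) in UTF-16,
# and four octets in UTF-8.
utf16_units = len(ch.encode("utf-16-le")) // 2
utf8_octets = len(ch.encode("utf-8"))

print(is_non_bmp, code_points, utf16_units, utf8_octets)
```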

My point is that, given a choice between two APIs, one which reinforces the illusion that bytes are text and one which doesn't, the latter gets points. (And similarly for format vs. printf.)
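
To illustrate where the illusion comes from, here is a minimal Python 3 sketch (not taken from any of the proposals in this thread): the repr of a bytes object looks like text, but its elements are integers.

```python
text = "abc"
data = b"abc"

# Indexing a str gives a length-1 str; indexing bytes gives an int.
first_char = text[0]
first_byte = data[0]

# The repr shows printable ASCII, which is what makes bytes look like text.
shown = repr(data)

# Iterating over the same object yields plain integers in range(256).
as_ints = list(data)

print(first_char, first_byte, shown, as_ints)
```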

Of course on the other hand, when str and bytes really _are_ perfect parallels in some way, making them gratuitously inconsistent just adds more things to learn and memorize.
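
A few of the places where the two types already line up, as a rough Python 3 sketch (the sample values are arbitrary):

```python
text = "spam,eggs"
data = b"spam,eggs"

# Splitting, slicing, and searching behave the same way on both types,
# as long as the arguments match the type of the object.
text_parts = text.split(",")
data_parts = data.split(b",")

text_head = text[:4]
data_head = data[:4]

text_pos = text.find("eggs")
data_pos = data.find(b"eggs")

print(text_parts, data_parts, text_head, data_head, text_pos, data_pos)
```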

At this point, I'm not sure whether that adds up to an argument for Nick's less-str-like version of his original proposal or against it, but I'm pretty sure it's a good argument for one or the other...

>> (Of course eventually they want to do something where the format
>> isn't identical to printf, and many of them seem to go to
>> StackOverflow or IRC and complain that there's a "bug in
>> str.format" instead of just glancing at the docs, so maybe making
>> them learn early isn't such a bad thing...)
> 
> Obviously, given the snotty remark above, I sympathize.  But I doubt
> it's really going to help that.  It's just going to give them one more
> thing to complain about.<wink/>

Yes, people can be amazingly good at avoiding learning. 
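
For reference, two places where str.format and printf-style formatting differ, as a small sketch (the values are made up for illustration); these are the sort of thing that tends to get reported as a "bug in str.format":

```python
name = "ab"

# printf-style: %5s pads on the left, so the string is right-aligned.
printf_padded = "%5s" % name

# str.format: strings are left-aligned by default in a width-5 field.
format_padded = "{:5}".format(name)

# Asking for right alignment explicitly makes the two agree.
format_right = "{:>5}".format(name)

# printf-style %d quietly truncates a float; format's d code refuses it.
printf_int = "%d" % 3.7
try:
    "{:d}".format(3.7)
    format_error = None
except ValueError as exc:
    format_error = exc

print(repr(printf_padded), repr(format_padded), printf_int)
```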


