[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Nick Coghlan ncoghlan at gmail.com
Sat May 28 03:16:14 CEST 2011


On Sat, May 28, 2011 at 10:55 AM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>
>> The pedagogic cost of making it even harder than it already is to
>> convince people that bytes are not text would also need to be
>> considered.
>
> I think that boat was missed some time ago. If there were
> ever a serious intention to teach people that bytes are not
> text by limiting the feature set of bytes, it would have
> been better served by not giving bytes *any* features that
> assumed a particular encoding.
>
> As it is, bytes has quite a lot of features that implicitly
> treat it as ascii-encoded text: the literal and repr()
> forms, capitalize(), expandtabs(), lower(), splitlines(),
> swapcase(), title(), upper(), and all the is*() methods.
>
> Accepting all of that, and then saying "Oh, no, we couldn't
> possibly provide a format() method, because bytes are not
> text" seems a tad inconsistent.
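
To make the quoted list concrete, here is a minimal sketch, assuming
Python 3, of how those bytes methods behave. The sample string and the
variable names are illustrative only and are not from the thread.

```python
# Illustration only: bytes methods act on ASCII letters and ignore
# every other byte value, whatever encoding the data is really in.
data = "café".encode("latin-1")    # four bytes; the last one is 0xE9

# bytes.upper() only maps a-z to A-Z, so 0xE9 is left untouched.
upper_bytes = data.upper()

# Going through str applies real text rules for the chosen encoding.
upper_text = data.decode("latin-1").upper().encode("latin-1")

# The is*() methods likewise only recognise ASCII characters.
ascii_alpha = b"abc".isalpha()
latin_alpha = b"\xe9".isalpha()
```

The two upper-cased results differ in their last byte, which is the
implicit ASCII assumption being described.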

Originally we didn't have all of that - more and more of it crept back
in at the behest of several binary protocol folks (including me, if I
recall correctly).

The urllib.parse experience has convinced me that giving in to that
pressure was a mistake. We went for a premature optimisation, and
screwed up the bytes API as a result. Yes, there is a potential
performance issue with the decode/process/encode model, but simply
keeping a bunch of string methods in the bytes API was the wrong
answer (and something that isn't actually all that useful in practice,
for the reasons brought up in this and other recent threads).
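
As a rough sketch of the decode/process/encode model mentioned above,
assuming ASCII wire data: the helper name normalise_header and the
sample header line are hypothetical, chosen only for illustration.

```python
def normalise_header(raw: bytes) -> bytes:
    """Hypothetical helper: tidy a 'Name: value' header line."""
    # Decode: non-ASCII input fails loudly here instead of being
    # silently passed through.
    text = raw.decode("ascii")

    # Process: ordinary str methods, including format().
    name, _, value = text.partition(":")
    text = "{}: {}".format(name.strip().title(), value.strip())

    # Encode: convert back to the wire format.
    return text.encode("ascii")


result = normalise_header(b"content-type:  text/html ")
```

The cost is two extra conversions per call, which is the potential
performance issue referred to above.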

Perhaps it is time to resurrect the idea of an explicit 'ascii' type?
Add a'' literals, support the full string API as well as the bytes
API, deprecate all string APIs on bytes and bytearray objects. The
other thing I have learned in trying to deal with some of these issues
is that ASCII-encoded text really *is* special, compared to all other
encodings, due to its widespread use in a multitude of networking
protocols and other formats.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


