[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Carl M. Johnson cmjohnson.mailinglist at gmail.com
Thu May 26 12:59:58 CEST 2011


On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:


> Yes, Unicode was specifically designed to support that. The first 128 code
> points are identical with the ASCII encoding, the first 256 code points are
> identical with the Latin-1 encoding.
>
> See also PEP 393, which exploits this feature.
>
> http://www.python.org/dev/peps/pep-0393/
>
> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
> the codecs. That sounds confusing to me.


"bytes" is probably the wrong name for it, but I think using some name to
signal "I'm not really using this encoding, I just need to be able to pass
these bytes into and out of a string without losing any bits" might be
better than using "latin-1" if we're forced to take up this hack. (My gut
feeling is that it would be better if we could avoid using the "latin-1"
hack altogether, but apparently wiser minds than me have decided we have
no other choice.) Maybe we could call it "passthrough"? And we could add a
documentation note that if you use "passthrough" to decode some bytes you
must, must, must use it to encode them later, since the string you
manipulate won't really contain Unicode code points, just a transparent byte
encoding…
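
To make the round-trip concrete, here is a minimal sketch of the "latin-1"
hack as I understand it, using only the existing "latin-1" codec (the
"passthrough" name is just a proposal and is not registered anywhere). It
should show that every byte value survives decode/encode, and why encoding
with a different codec afterwards would corrupt the data:

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding as Latin-1 maps byte value N to the code point with the same
# number N, so no information is lost and no byte is rejected.
text = raw.decode("latin-1")
assert [ord(ch) for ch in text] == list(raw)

# Encoding with the same codec restores the original bytes exactly.
assert text.encode("latin-1") == raw

# Encoding with a different codec does not: UTF-8 needs two bytes for
# each code point in the range U+0080..U+00FF, so the output differs.
utf8 = text.encode("utf-8")
assert utf8 != raw
assert len(utf8) == 128 + 2 * 128
```

The last two assertions are the reason for the "must, must, must" above:
the string only means the original bytes if it goes back out through the
same codec it came in through.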

-- Carl