Re: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

May 26, 2011

      Carl M. Johnson wrote:
...
On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
...
Yes, Unicode was specifically designed to support that. The first 128 code
points are identical with the ASCII encoding, the first 256 code points are
identical with the Latin-1 encoding.
See also PEP 393, which exploits this feature.
http://www.python.org/dev/peps/pep-0393/
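
A quick way to check the claim about the first 256 code points is to decode
every possible byte value and compare the resulting code points with the
byte values. This is only a sketch; the variable names are made up for
illustration.

```python
# Every possible byte value, 0x00 through 0xFF.
data = bytes(range(256))

# Latin-1 maps byte value N to code point U+00NN, one to one.
as_latin1 = data.decode("latin-1")
code_points = [ord(ch) for ch in as_latin1]

# ASCII only covers the first 128 values, and agrees with Latin-1 there.
ascii_text = bytes(range(128)).decode("ascii")

print(code_points == list(range(256)))
print(ascii_text == as_latin1[:128])
```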
That being said, I don't see the point of aliasing "latin-1" to "bytes" in
the codecs. That sounds confusing to me.
"bytes" is probably the wrong name for it, but I think using some name to
signal "I'm not really using this encoding, I just need to be able to pass
these bytes into and out of a string without losing any bits" might be
better than using "latin-1" if we're forced to take up this hack. (My gut
feeling is that it would be better if we could avoid using the "latin-1"
hack altogether, but apparently wiser minds than mine have decided we have
no other choice.) Maybe we could call it "passthrough"? And we could add a
documentation note that if you use "passthrough" to decode some bytes you
must, must, must use it to encode them later, since the string you
manipulate won't really contain Unicode code points, just a transparent byte
encoding…
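
To make the proposal concrete, here is a rough sketch of what such an alias
could look like if registered by hand. Note that "passthrough" is purely
hypothetical: no codec of that name ships with Python, and the search
function below is an assumption for illustration, not an existing API. It
only relies on codecs.register() and codecs.lookup(), which do exist.

```python
import codecs


def _passthrough_search(name):
    # Hypothetical alias: answer lookups for "passthrough" with the
    # existing Latin-1 codec; decline every other name.
    if name == "passthrough":
        return codecs.lookup("latin-1")
    return None


# Registration is process-wide.
codecs.register(_passthrough_search)

raw = bytes(range(256))
text = raw.decode("passthrough")

# The round trip is lossless only if the same codec is used both ways.
round_tripped = text.encode("passthrough")
print(round_tripped == raw)
```

The alias changes nothing about the resulting string: it is still an
ordinary str whose characters happen to be U+0000 through U+00FF, which is
why the same codec has to be used to get the bytes back.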
If you really wish to carry around binary data in a Unicode
object, then you should either use a codec that maps the 256
possible byte values to a private-use code point area, or
use a hack like the surrogateescape approach defined in
PEP 383:

http://www.python.org/dev/peps/pep-0383/
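
As a minimal sketch of the surrogateescape approach (the sample bytes are
made up): undecodable bytes are turned into lone surrogates in the range
U+DC80 to U+DCFF, so they stay distinguishable from real text, and a strict
encoder refuses to write them out.

```python
raw = b"name=\xff\xfe;ok"

# Bytes that are not valid ASCII become lone surrogates U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in text if ord(ch) > 0x7F]

# Encoding with the same error handler restores the original bytes.
restored = text.encode("ascii", errors="surrogateescape")

# A strict encoder rejects the escaped bytes instead of silently
# re-encoding them.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(escaped, restored == raw, strict_ok)
```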

By using 'latin-1' you can potentially have the binary data
leak into other text data of your application, or worse,
have it converted to a different encoding on output, e.g.
when sending the data to a UTF-8 pipe.
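
A small illustration of that failure mode, with made-up sample bytes: once
binary data has been decoded as 'latin-1', nothing marks it as binary, and
encoding it as UTF-8 on output silently changes the bytes.

```python
raw = b"\x89PNG\xff"

# Decoding as Latin-1 always succeeds, so nothing flags this as binary.
text = raw.decode("latin-1")

# Writing the string to a UTF-8 sink re-encodes every byte >= 0x80 as
# two bytes, so the output no longer matches the input.
sent = text.encode("utf-8")

# The byte 0xE9 decoded this way is indistinguishable from a real "é".
looks_like_text = b"\xe9".decode("latin-1") == "\u00e9"

print(sent, len(raw), len(sent), looks_like_text)
```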

In any case, this is bound to create hard-to-detect problems.

Better use bytes to begin with.

-- 
Marc-Andre Lemburg
eGenix.com

