[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

M.-A. Lemburg mal at egenix.com
Thu May 26 13:13:29 CEST 2011


Carl M. Johnson wrote:
> On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> 
> 
>> Yes, Unicode was specifically designed to support that. The first 128 code
>> points are identical with the ASCII encoding, the first 256 code points are
>> identical with the Latin-1 encoding.
>>
>> See also PEP 393, which exploits this feature.
>>
>> http://www.python.org/dev/peps/pep-0393/
>>
>> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
>> the codecs. That sounds confusing to me.
> 
> 
> "bytes" is probably the wrong name for it, but I think using some name to
> signal "I'm not really using this encoding, I just need to be able to pass
> these bytes into and out of a string without losing any bits" might be
> better than using "latin-1" if we're forced to take up this hack. (My gut
> feeling is that it would be better if we could avoid using the "latin-1"
> hack altogether, but apparently wiser minds than me have decided we have
> no other choice.) Maybe we could call it "passthrough"? And we could add a
> documentation note that if you use "passthrough" to decode some bytes you
> must, must, must use it to encode them later, since the string you
> manipulate won't really contain unicode codepoints, just a transparent byte
> encoding…

If you really wish to carry around binary data in a Unicode
object, then you should use a codec that maps the 256 possible
byte values to a private code point area, or use a hack like
the surrogateescape approach defined in PEP 383:

http://www.python.org/dev/peps/pep-0383/
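
A minimal sketch of the surrogateescape approach, assuming Python 3.1
or later (where PEP 383 is implemented). The variable names and the
choice of 'ascii' as the base codec are only for illustration. Note
that surrogateescape escapes only the bytes the codec cannot decode
(here 0x80-0xFF), mapping them to lone surrogates U+DC80..U+DCFF:

```python
# Illustrative only: carry arbitrary bytes through a str object
# using the PEP 383 'surrogateescape' error handler.
raw = bytes(range(256))

# Bytes 0x00-0x7F decode as plain ASCII; bytes 0x80-0xFF cannot be
# decoded and are escaped as lone surrogates U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same codec and error handler restores the bytes.
round_tripped = text.encode("ascii", errors="surrogateescape")

# A strict encoder refuses the escaped data instead of silently
# re-encoding it, so the binary data cannot leak unnoticed.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The point of the sketch is the last step: the escaped code points are
not valid text, so a strict UTF-8 encode raises an error rather than
producing different bytes.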

By using 'latin-1' you can potentially have the binary data
leak into other text data of your application, or worse,
have it converted to a different encoding on output, e.g.
when sending the data to a UTF-8 pipe.
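
A small sketch of that failure mode, using made-up sample bytes: the
'latin-1' round trip is lossless, but nothing stops the resulting
string from being encoded as UTF-8 on output, which changes the bytes:

```python
# Illustrative only: binary data smuggled through 'latin-1'.
raw = b"\x00\xe9\xff"

# Every byte maps to the code point with the same number, so
# decoding never fails and the round trip is lossless.
text = raw.decode("latin-1")
same_codec = text.encode("latin-1")

# The string is indistinguishable from real text, so an output
# layer that encodes as UTF-8 succeeds and silently rewrites it:
# U+00E9 and U+00FF each become two bytes.
leaked = text.encode("utf-8")
```

Here `leaked` is five bytes long where `raw` was three, and no
exception is raised anywhere along the way.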

In any case, this is bound to create hard-to-detect problems.

It is better to use bytes to begin with.

-- 
Marc-Andre Lemburg
eGenix.com



