[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
M.-A. Lemburg
mal at egenix.com
Thu May 26 13:13:29 CEST 2011
Carl M. Johnson wrote:
> On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>
>
>> Yes, Unicode was specifically designed to support that. The first 128 code
>> points are identical with the ASCII encoding, the first 256 code points are
>> identical with the Latin-1 encoding.
>>
>> See also PEP 393, which exploits this feature.
>>
>> http://www.python.org/dev/peps/pep-0393/
>>
>> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
>> the codecs. That sounds confusing to me.
>
>
> "bytes" is probably the wrong name for it, but I think using some name to
> signal "I'm not really using this encoding, I just need to be able to pass
> these bytes into and out of a string without losing any bits" might be
> better than using "latin-1" if we're forced to take up this hack. (My gut
> feeling is that it would be better if we could avoid using the "latin-1"
> hack altogether, but apparently wiser minds than me have decided we have
> no other choice.) Maybe we could call it "passthrough"? And we could add a
> documentation note that if you use "passthrough" to decode some bytes you
> must, must, must use it to encode them later, since the string you
> manipulate won't really contain unicode codepoints, just a transparent byte
> encoding…
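The round trip under discussion rests on the property Stefan describes: decoding with 'latin-1' maps byte value n to code point U+00nn, so encoding with the same codec restores every byte. The sketch below checks that for all 256 byte values. It also shows how Carl's "passthrough" name could be wired up today as a plain alias for the built-in codec via `codecs.register`. The name "passthrough" and the helper `_passthrough_search` are hypothetical and illustrate the proposal only. Neither exists in the standard library.

```python
import codecs

# Every possible byte value 0..255.
data = bytes(range(256))

# 'latin-1' maps byte n to code point n, so nothing is lost on decoding...
text = data.decode("latin-1")
assert [ord(c) for c in text] == list(data)

# ...and encoding with the same codec gives back the original bytes.
assert text.encode("latin-1") == data


def _passthrough_search(name):
    # Hypothetical alias: resolve "passthrough" to the built-in Latin-1 codec.
    if name == "passthrough":
        return codecs.lookup("latin-1")
    return None


codecs.register(_passthrough_search)

# The alias behaves exactly like 'latin-1', including its pitfalls.
assert data.decode("passthrough").encode("passthrough") == data
```

An alias like this changes only the name. The string still holds ordinary Latin-1 characters that any other code is free to treat as text.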
If you really wish to carry around binary data in a Unicode
object, then you should either use a codec that maps the 256
possible byte values to code points in a private use area, or
use a hack like the surrogateescape approach defined in
PEP 383:
http://www.python.org/dev/peps/pep-0383/
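The following sketch shows both alternatives in Python 3. The 'surrogateescape' error handler is the real PEP 383 mechanism. The private-use-area mapping is a hypothetical illustration written as two plain functions (`bytes_to_pua`, `pua_to_bytes`, base U+E000 chosen as an assumption) rather than a registered codec. In both cases the binary data ends up in code points that cannot be mistaken for ordinary Latin-1 text.

```python
data = b"abc\xff\x80"

# PEP 383 'surrogateescape': bytes that cannot be decoded become lone
# surrogates U+DC80..U+DCFF, while decodable ASCII bytes stay readable.
text = data.decode("ascii", "surrogateescape")
assert text == "abc\udcff\udc80"
assert text.encode("ascii", "surrogateescape") == data

# Hypothetical private-use-area mapping: every byte b becomes U+E000 + b.
PUA_BASE = 0xE000  # start of the BMP Private Use Area (U+E000..U+F8FF)


def bytes_to_pua(raw: bytes) -> str:
    return "".join(chr(PUA_BASE + b) for b in raw)


def pua_to_bytes(s: str) -> bytes:
    # Characters outside U+E000..U+E0FF yield values outside range(256),
    # so bytes() raises ValueError instead of silently producing data.
    return bytes(ord(c) - PUA_BASE for c in s)


assert pua_to_bytes(bytes_to_pua(data)) == data
```

With either scheme, ordinary text that strays into the decoding step is rejected. For example, `pua_to_bytes("a")` raises ValueError.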
By using 'latin-1' you can potentially have the binary data
leak into other text data of your application, or worse,
have it converted to a different encoding on output, e.g.
when sending the data to a UTF-8 pipe.
In any case, this is bound to create hard-to-detect problems.
It's better to use bytes to begin with.
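Here is a small sketch of the failure mode described above. Binary data decoded as 'latin-1' mixes silently with real text and is altered when written out as UTF-8. The same data decoded with 'surrogateescape' makes the mistake fail loudly instead. The PNG signature bytes and the "header: " prefix are arbitrary sample data chosen for illustration.

```python
payload = b"\x89PNG\r\n\x1a\n"  # binary data: the 8-byte PNG file signature

# 'latin-1' accepts any byte sequence, so decoding never fails...
text = payload.decode("latin-1")

# ...and the result mixes freely with real text.
message = "header: " + text

# Writing to a UTF-8 pipe re-encodes U+0089 as the two bytes C2 89,
# so the 8-byte payload comes out as 9 bytes.
out = message.encode("utf-8")
assert out == b"header: \xc2\x89PNG\r\n\x1a\n"

# With surrogateescape the same mistake is caught instead:
escaped = payload.decode("ascii", "surrogateescape")
try:
    ("header: " + escaped).encode("utf-8")
    leaked_silently = True
except UnicodeEncodeError:
    leaked_silently = False  # lone surrogates cannot be encoded as strict UTF-8

# Keeping the data as bytes avoids the question entirely.
raw = b"header: " + payload
assert raw.endswith(payload)
```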
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, May 26 2011)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2011-05-23: Released eGenix mx Base 3.2.0 http://python.egenix.com/
2011-05-25: Released mxODBC 3.1.1 http://python.egenix.com/
2011-06-20: EuroPython 2011, Florence, Italy 25 days to go
::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/