
Carl M. Johnson wrote:
On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Yes, Unicode was specifically designed to support that. The first 128 code points are identical with the ASCII encoding, the first 256 code points are identical with the Latin-1 encoding.
See also PEP 393, which exploits this feature.
http://www.python.org/dev/peps/pep-0393/
That being said, I don't see the point of aliasing "latin-1" to "bytes" in the codecs. That sounds confusing to me.
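The code point identity described above can be checked directly. This is only a small sketch; the variable names are illustrative and not from the thread:

```python
# Each byte value 0..255 decodes under Latin-1 to the code point
# with the same number, and encodes back to the same byte.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]
round_trip = text.encode("latin-1")

# The first 128 values decode identically under ASCII and Latin-1.
ascii_bytes = bytes(range(128))
same_in_ascii = ascii_bytes.decode("ascii") == ascii_bytes.decode("latin-1")
```

Because the mapping is one-to-one over all 256 byte values, decoding with "latin-1" never raises and never loses bits, which is what makes it tempting as a pass-through.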
"bytes" is probably the wrong name for it, but I think using some name to signal "I'm not really using this encoding, I just need to be able to pass these bytes into and out of a string without losing any bits" might be better than using "latin-1" if we're forced to take up this hack. (My gut feeling is that it would be better if we could avoid the "latin-1" hack altogether, but apparently wiser minds than mine have decided we have no other choice.) Maybe we could call it "passthrough"? And we could add a documentation note that if you use "passthrough" to decode some bytes you must, must, must use it to encode them later, since the string you manipulate won't really contain Unicode code points, just a transparent byte encoding…
If you really wish to carry around binary data in a Unicode object, then you should either use a codec that maps the 256 possible byte values to a private code point area, or use a hack like the surrogateescape approach defined in PEP 383:

http://www.python.org/dev/peps/pep-0383/

By using 'latin-1' you can potentially have the binary data leak into other text data of your application, or worse, have it converted to a different encoding on output, e.g. when sending the data to a UTF-8 pipe.

In any case, this is bound to create hard-to-detect problems. Better to use bytes to begin with.

--
Marc-Andre Lemburg, eGenix.com (May 26 2011)
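To make the difference concrete, here is a minimal sketch of the two approaches, assuming Python 3.1 or later (where the PEP 383 `surrogateescape` error handler is available). The sample bytes and variable names are made up for illustration:

```python
raw = b"\xff\xfe binary \xe9"  # arbitrary non-text bytes (hypothetical sample)

# The 'latin-1' hack: the result looks like ordinary text, so nothing
# stops it from being re-encoded as UTF-8 on output. The bytes change silently.
as_latin1 = raw.decode("latin-1")
leaked = as_latin1.encode("utf-8")
silently_changed = leaked != raw

# The PEP 383 approach: undecodable bytes become lone surrogates, which
# round-trip with the same error handler...
as_escaped = raw.decode("ascii", errors="surrogateescape")
restored = as_escaped.encode("ascii", errors="surrogateescape")

# ...but a strict UTF-8 encode refuses them instead of corrupting the data.
try:
    as_escaped.encode("utf-8")
    refused = False
except UnicodeEncodeError:
    refused = True
```

With surrogateescape the mistake surfaces as an exception at the output boundary, whereas with 'latin-1' the data is rewritten without any error.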