[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Thu May 26 13:17:07 CEST 2011

On 2011-05-26, at 12:59, Carl M. Johnson wrote:
> On Wed, May 25, 2011 at 11:15 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>> Yes, Unicode was specifically designed to support that. The first 128 code
>> points are identical with the ASCII encoding, the first 256 code points are
>> identical with the Latin-1 encoding.
>> 
>> See also PEP 393, which exploits this feature.
>> 
>> http://www.python.org/dev/peps/pep-0393/
>> 
>> That being said, I don't see the point of aliasing "latin-1" to "bytes" in
>> the codecs. That sounds confusing to me.
> 
> 
> "bytes" is probably the wrong name for it, but I think using some name to
> signal "I'm not really using this encoding, I just need to be able to pass
> these bytes into and out of a string without losing any bits" might be
> better than using "latin-1" if we're forced to take up this hack. (My gut
> feeling is that it would be better if we could avoid using the "latin-1"
> hack altogether, but apparently wiser minds than me have decided we have
> no other choice.) Maybe we could call it "passthrough"? And we could add a
> documentation note that if you use "passthrough" to decode some bytes you
> must, must, must use it to encode them later, since the string you
> manipulate won't really contain unicode codepoints, just a transparent byte
> encoding…
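
For reference, the round-trip property discussed in the quoted text can be sketched as below. This is only an illustration of the "latin-1" hack, not a proposal; the helper name `format_bytes` is hypothetical and does not exist in the standard library.

```python
# Under Latin-1, byte value N decodes to code point N, so any byte
# sequence survives a decode/encode round trip unchanged.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")


def format_bytes(pattern: bytes, *args: bytes) -> bytes:
    """Hypothetical helper: apply str.format to bytes via Latin-1.

    Everything is decoded with Latin-1, formatted as text, and then
    encoded again with the same codec, as the quoted note requires.
    """
    text_pattern = pattern.decode("latin-1")
    text_args = [arg.decode("latin-1") for arg in args]
    return text_pattern.format(*text_args).encode("latin-1")


packet = format_bytes(b"{}:{}", b"key", b"\xff\x00")

# Encoding with a different codec than the one used to decode changes
# the bytes, which is why the same codec must be used both ways.
mismatched = b"\xff".decode("latin-1").encode("utf-8")
```

The string in the middle is not meaningful text; it is only a carrier for the byte values, which is the objection raised above.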

Considering the original use case, which seems to be mostly about being able to use .format, would it make more sense to be able to create "byte patterns", with formats similar to those of str.format but not identical? For example, better control over layout would be nice, something similar to Erlang's bit syntax for putting binaries together.

This would be useful for assembling byte sequences from existing values, e.g. to output binary formats.
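
As a point of comparison, here is a minimal sketch of how such byte sequences can already be assembled with the standard `struct` module. It is not the "byte patterns" idea itself, only the closest existing tool; the record layout and the name `build_record` are assumptions made up for this example.

```python
import struct


def build_record(record_id: int, flags: int, name: bytes) -> bytes:
    """Assemble a made-up binary record: id, flags, name length, name.

    Format string ">HBB":
      >  big-endian, no padding
      H  unsigned 16-bit integer (record_id)
      B  unsigned 8-bit integer (flags)
      B  unsigned 8-bit integer (length of name, so at most 255)
    """
    header = struct.pack(">HBB", record_id, flags, len(name))
    return header + name


record = build_record(0x0102, 0x80, b"abc")

# Reading the fixed-size header back with the same format string.
record_id, flags, name_length = struct.unpack(">HBB", record[:4])
```

`struct` handles fixed-width fields well, but variable-length parts such as `name` have to be concatenated by hand, which is the kind of layout control a pattern syntax could express directly.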