[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Thu Jun 2 22:14:30 CEST 2011

On 6/2/2011 1:58 PM, Guido van Rossum wrote:
> On Wed, Jun 1, 2011 at 11:30 PM, Terry Reedy<tjreedy at udel.edu>  wrote:
>> The confusion of character with byte in the original design of Python both
>> privileged and burdened text processing.
>
> Right. And it wasn't only Python: most languages created around or
> before that time had the same issues (perhaps starting with C's use of
> "char" meaning byte). Even most IP protocols developed in the 1990s
> confuse character set and encoding (witness HTTP's "Content-type:
> text/plain; charset=utf-8").

I hold Python to a higher standard. But yes, that is badly confused.

> I'm glad in Python 3 we undertook to improve the distinction.

I am a bit embarrassed that I did not see sooner that characters are for 
people and bytes for computers. Thus Python produces both character and 
byte serializations for objects.
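
A minimal sketch of that distinction, assuming Python 3. The sample dict 
is hypothetical, and pickle stands in as just one of the stdlib byte 
serializations:

```python
import pickle

value = {"sex": "m", "count": 3}  # hypothetical sample object

# Character serialization: a str, meant for people to read.
as_text = repr(value)

# Byte serialization: a bytes object, meant for storage or transmission.
as_bytes = pickle.dumps(value)

# Text only becomes bytes through an explicit encoding step.
encoded = as_text.encode("utf-8")

# latin_1 maps code points 0-255 one-to-one onto byte values 0-255,
# so every possible byte value survives a decode/encode round trip.
all_bytes = bytes(range(256))
latin1_round_trip = all_bytes.decode("latin_1").encode("latin_1")

print(type(as_text).__name__, type(as_bytes).__name__)
```

The two results have different types (str and bytes), and Python 3 will 
not mix them without an explicit encode or decode.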

On the coding front: when I first did statistics on computers (1970s), 
all data were coded with numbers. For instance, Sex: male = 1; female = 
2; unknown = 9. In the 1980s, we could use letters (which became ASCII 
codes): male = 'm'; female = 'f'; unknown = ' '. For a US-only project, 
this seemed like an advance. So I thought then.

For a global project, it would have been the opposite. For a Spanish 
speaker, 'm' might seem to mean 'mujer' (woman). For many others around 
the world, euro-indic digits are more familiar and easier to read than 
Latin letters. I am less ethnocentric now.

I'm glad Python has become more of a global language, even if English-based.

-- 
Terry Jan Reedy