Re: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

June 2, 2011

On 6/2/2011 1:58 PM, Guido van Rossum wrote:
...
> On Wed, Jun 1, 2011 at 11:30 PM, Terry Reedy <tjreedy@udel.edu> wrote:
...
>> The confusion of character with byte in the original design of Python both
>> privileged and burdened text processing.
>
> Right. And it wasn't only Python: most languages created around or
> before that time had the same issues (perhaps starting with C's use of
> "char" meaning byte). Even most IP protocols developed in the 1990s
> confuse character set and encoding (witness HTTP's "Content-type:
> text/plain; charset=utf-8").

I hold Python to a higher standard. But yes, that is badly confused.

...
> I'm glad in Python 3 we undertook to improve the distinction.

I am a bit embarrassed that I did not see sooner that characters are for
people and bytes for computers. Thus Python produces both character and
byte serializations for objects.
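
To make the distinction concrete, here is a minimal Python 3 sketch. The
choice of repr() as the character serialization and pickle as the byte
serialization is an assumption made for illustration, as are the names
record and word; other pairs would show the same thing.

```python
import pickle

# A hypothetical object, used only for illustration.
record = {"sex": "f", "age": 42}

# Character serialization: a str, meant to be read by people.
text_form = repr(record)

# Byte serialization: a bytes object, meant for storage or transmission.
byte_form = pickle.dumps(record)

print(type(text_form).__name__)  # str
print(type(byte_form).__name__)  # bytes

# The byte form round-trips back to an equal object.
restored = pickle.loads(byte_form)

# Characters become bytes only through an explicit encoding, and the
# byte count depends on which encoding is chosen.
word = "se\u00f1or"  # five characters, one of them non-ASCII
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin_1")
print(len(word), len(utf8_bytes), len(latin1_bytes))  # 5 6 5
```

The same five characters occupy six bytes in one encoding and five in
another, which is why a character count and a byte count cannot be treated
as the same number.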

On the coding front: when I first did statistics on computers (1970s),
all data were coded with numbers. For instance, Sex: male = 1; female =
2; unknown = 9. In the 1980s, we could use letters (which became ASCII
codes): male = 'm'; female = 'f'; unknown = ' '. For a US-only project,
this seemed like an advance. Or so I thought then.

For a global project, it would have been the opposite. For a Spanish 
speaker, 'm' might seem to mean 'mujer' (woman). For many others around 
the world, euro-indic digits are more familiar and easier to read than 
latin letters. I am less ethnocentric now.

I'm glad Python has become more of a global language, even if English-based.

-- 
Terry Jan Reedy