[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.
Terry Reedy
tjreedy at udel.edu
Thu Jun 2 22:14:30 CEST 2011
On 6/2/2011 1:58 PM, Guido van Rossum wrote:
> On Wed, Jun 1, 2011 at 11:30 PM, Terry Reedy<tjreedy at udel.edu> wrote:
>> The confusion of character with byte in the original design of Python both
>> privileged and burdened text processing.
>
> Right. And it wasn't only Python: most languages created around or
> before that time had the same issues (perhaps starting with C's use of
> "char" meaning byte). Even most IP protocols developed in the 1990s
> confuse character set and encoding (witness HTTP's "Content-type:
> text/plain; charset=utf-8").
I hold Python to a higher standard. But yes, that is badly confused.
> I'm glad in Python 3 we undertook to improve the distinction.
I am a bit embarrassed that I did not see sooner that characters are for
people and bytes are for computers. Thus Python produces both character and
byte serializations for objects.
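A minimal sketch of that distinction in Python 3, assuming CPython's built-in codecs and the standard pickle module. A str holds characters (code points); encoding it picks a concrete byte serialization. The latin_1 codec named in the thread subject maps byte value n to code point n, so any bytes object round-trips through it unchanged. The sample string and variable names are illustrative only.

```python
import pickle

# Characters are for people: a str is a sequence of Unicode code points.
text = "mujer: ñ"

# Bytes are for computers: each codec yields a different byte serialization.
utf8 = text.encode("utf-8")      # 'ñ' takes two bytes in UTF-8
latin1 = text.encode("latin_1")  # 'ñ' takes one byte in Latin-1
print(len(text), len(utf8), len(latin1))  # 8 9 8

# latin_1 maps byte n to code point n, so all 256 byte values round-trip.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin_1")
assert [ord(c) for c in decoded] == list(range(256))
assert decoded.encode("latin_1") == all_bytes

# An object has both a character serialization (repr) and a byte one (pickle).
obj = {"sex": "f"}
char_form = repr(obj)
byte_form = pickle.dumps(obj)
assert isinstance(char_form, str) and isinstance(byte_form, bytes)
assert pickle.loads(byte_form) == obj
```

That one-to-one mapping is what makes Latin-1 tempting as a "bytes" codec: it never fails to decode, even though the resulting characters mean nothing in particular.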
On the coding front: when I first did statistics on computers (1970s),
all data were coded with numbers. For instance, Sex: male = 1; female =
2; unknown = 9. In the 1980s, we could use letters (which became ASCII
codes): male = 'm'; female = 'f'; unknown = ' '. For a US-only project,
this seemed like an advance. So I thought then.
For a global project, it would have been the opposite. For a Spanish
speaker, 'm' might seem to mean 'mujer' (woman). For many others around
the world, Euro-Indic digits are more familiar and easier to read than
Latin letters. I am less ethnocentric now.
I'm glad Python has become more of a global language, even if English-based.
--
Terry Jan Reedy