
On 6/2/2011 1:58 PM, Guido van Rossum wrote:
On Wed, Jun 1, 2011 at 11:30 PM, Terry Reedy <tjreedy@udel.edu> wrote:
The confusion of character with byte in the original design of Python both privileged and burdened text processing.
Right. And it wasn't only Python: most languages created around or before that time had the same issues (perhaps starting with C's use of "char" meaning byte). Even most IP protocols developed in the 1990s confuse character set and encoding (witness HTTP's "Content-type: text/plain; charset=utf-8").
I hold Python to a higher standard. But yes, that is badly confused.
I'm glad in Python 3 we undertook to improve the distinction.
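As a small illustration of the distinction discussed above, here is a minimal sketch of how Python 3 keeps characters (str) and bytes apart. The sample word and the variable names are assumptions chosen for the example, not taken from the thread.

```python
# A str is a sequence of Unicode characters; bytes is one encoding of it.
text = "señor"                   # hypothetical sample text
data = text.encode("utf-8")      # UTF-8 encoded form
latin1 = text.encode("latin-1")  # same characters, different encoding

print(len(text))    # number of characters
print(len(data))    # number of bytes in UTF-8 ('ñ' takes two bytes)
print(len(latin1))  # number of bytes in Latin-1 ('ñ' takes one byte)

# Decoding with the matching encoding recovers the original characters.
assert data.decode("utf-8") == text
assert latin1.decode("latin-1") == text

# Python 3 refuses to mix the two types instead of coercing silently.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print("str + bytes allowed:", mixed_ok)
```

The character count stays the same whichever encoding is used; only the byte count depends on the encoding, which is why "charset" and "encoding" are not interchangeable terms.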
I am a bit embarrassed that I did not see sooner that characters are for people and bytes for computers. Thus Python produces both character and byte serializations for objects.

On the coding front: when I first did statistics on computers (1970s), all data were coded with numbers. For instance, Sex: male = 1; female = 2; unknown = 9. In the 1980s, we could use letters (which became ASCII codes): male = 'm'; female = 'f'; unknown = ' '. For a US-only project, this seemed like an advance. So I thought then. For a global project, it would have been the opposite. For a Spanish speaker, 'm' might seem to mean 'mujer' (woman). For many others around the world, euro-indic digits are more familiar and easier to read than Latin letters.

I am less ethnocentric now. I'm glad Python has become more of a global language, even if English-based.

--
Terry Jan Reedy