Python's 8-bit cleanness deprecated?

Roman Suzi rnd at
Sat Feb 8 13:10:20 CET 2003

On Fri, 7 Feb 2003, Brian Quinlan wrote:

>> It was working all the way before 2.3a! And there were no great

> The problems were that you didn't know how to interpret non-ASCII
> source files sent to you

This caused no problems. Python < 2.3 just passed symbols as-is.

Also suppose my colleague on Windows sends me a script by
pasting it into a mail message:

# coding: cp1251
print "..lalalala"

Being on Linux, I copy it into some file. But by then it is in koi8-r
already! So the first line is misleading: adding it doesn't guarantee
the encoding!
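The mismatch is easy to demonstrate. Here is a hypothetical illustration
(written in modern Python 3 syntax for clarity, not in the 2.x syntax this
thread discusses): the same Cyrillic text has different byte values in
cp1251 and koi8-r, so once a mail client transcodes the file, the declared
"# coding:" line no longer matches the actual bytes.

```python
text = "привет"

# The same characters map to different bytes in the two encodings.
cp1251_bytes = text.encode("cp1251")
koi8_bytes = text.encode("koi8_r")
assert cp1251_bytes != koi8_bytes

# Decoding koi8-r bytes as if they were cp1251 "succeeds" silently,
# but produces mojibake instead of the original text.
mojibake = koi8_bytes.decode("cp1251")
assert mojibake != text
print(mojibake)
```

Nothing fails loudly here, which is exactly the problem: the coding
declaration is trusted even when the bytes were recoded behind its back.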

> and couldn't have non-ASCII characters in Unicode literals.

I never use Unicode literals. What is the point of having Unicode
literals in anything except a utf-8 encoded file?

If I ever need Unicode, I could explicitly do it:

u = unicode("привет", "koi8-r")

instead of adding 'u' to every literal and u.encode("koi8-r") to every
place where Unicode can't go. And it is usually of not much use:

>>> u = unicode("привет", "koi8-r")
>>> print u
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: 
ordinal not in range(128)
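The explicit round-trip described above can be sketched as follows (again
in modern Python 3 syntax, where bytes and text are separate types, rather
than the 2.x unicode() calls quoted in this thread): decode the bytes from
a known encoding into Unicode, then encode back wherever raw bytes are
required.

```python
# Bytes as they would appear in a koi8-r encoded source file.
raw = "привет".encode("koi8_r")

# Equivalent of the explicit unicode(s, "koi8-r") the author prefers.
u = raw.decode("koi8_r")

# Equivalent of the explicit u.encode("koi8-r") at output boundaries.
back = u.encode("koi8_r")
assert back == raw
```

The 2.x error above happened because print fell back to the implicit
'ascii' codec; making both conversions explicit sidesteps that default.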

That said, I do not see much use for any encoding
except utf-8 if we want to have multilanguage strings!

And if we don't, why complicate matters by requiring the programmer to add
a line, and the interpreter to do checks on the input encoding and such?!

That is why I suggest applying the double-recoding logic only to those
scripts which explicitly state an encoding, passing all others as-is
(not latin-1, but AS-IS, without any manipulations).


Sincerely yours, Roman Suzi
rnd at =\= My AI powered by Linux RedHat 7.3
