Python's 8-bit cleanness deprecated?
rnd at onego.ru
Sat Feb 8 13:10:20 CET 2003
On Fri, 7 Feb 2003, Brian Quinlan wrote:
>> It was working all the way before 2.3a! And there were no great
>The problems were
> that you didn't know how to interpret non-ASCII source files sent to you
This caused no problems. Python < 2.3 just passed the bytes through as-is.
Also suppose my colleague on Windows sends me a script by
pasting it into a mail:
# coding: cp1251
Being on Linux, I copy it into some file. But by then it is in koi8-r already! So the
first line is misleading. Adding it doesn't guarantee the encoding!
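To see why the stale coding line misleads, here is a small sketch in modern Python 3 spelling (the word and codec names are from the post; the comparison itself is the point): the same Russian word is stored as different bytes in cp1251 and koi8-r, so a "# coding: cp1251" line copied along with text that a mailer has transcoded to koi8-r no longer describes the actual bytes.

```python
# The same word, encoded two ways; the byte sequences differ,
# so a coding declaration written for one is wrong for the other.
word = "привет"

cp1251_bytes = word.encode("cp1251")   # bytes as a Windows editor stores them
koi8r_bytes = word.encode("koi8-r")    # bytes after transcoding to koi8-r

print(cp1251_bytes)
print(koi8r_bytes)
print(cp1251_bytes != koi8r_bytes)     # True: the declared coding no longer matches
```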
> and couldn't have non-ASCII characters in Unicode literals.
I never use Unicode literals. What is the point of having Unicode
literals in anything except a utf-8 encoded file?
If I ever need Unicode, I could explicitly do it:
u = unicode("привет", "koi8-r")
instead of adding 'u' to every literal and u.encode("koi8-r") to every
place where Unicode can't go. And it usually isn't of much use:
>>> u = unicode("привет", "koi8-r")
>>> print u
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5:
ordinal not in range(128)
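The traceback above is Python 2's default ASCII codec refusing to print the koi8-r string. A minimal sketch, in modern Python 3 spelling, of the explicit decode/encode round-trip the post prefers (the raw byte values are an assumption: "привет" encoded as koi8-r):

```python
# "привет" as koi8-r bytes (assumed values, for illustration only).
raw = b"\xd0\xd2\xc9\xd7\xc5\xd4"

u = raw.decode("koi8-r")     # Python 2 spelling: u = unicode(raw, "koi8-r")
print(u)                     # prints: привет

back = u.encode("koi8-r")    # Python 2: call u.encode("koi8-r") before byte output
assert back == raw           # the round-trip is lossless
```

The point of the sketch is that decoding and encoding happen only at the two boundaries, with ordinary byte strings everywhere in between.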
That said, I do not see much use for any encoding
except utf-8 if we want to have multilanguage strings!
And if we don't, why complicate matters by requiring the programmer to add
a line and the interpreter to do checks on the input encoding and such?!
That is why I suggest applying the double-recoding logic only to those
scripts which explicitly state an encoding, passing all others as-is
(not latin-1, but AS-IS, without any manipulation).
Sincerely yours, Roman Suzi
rnd at onego.ru =\= My AI powered by Linux RedHat 7.3