[I18n-sig] Strawman Proposal (2): Encoding attributes
Fri, 09 Feb 2001 19:07:39 -0800
"M.-A. Lemburg" wrote:
> > ...
> > Also, if we wanted a quick hack, couldn't we implement it at first by
> > "decoding" to UTF-8? Then the parser could look for UTF-8 in Unicode
> > string literals and translate those into real Unicode.
> I don't want to do "quick hacks", so this is a non-option.
If it works and it is easy, there should not be a problem!
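
To make the idea concrete, here is a minimal sketch of the two steps, written against present-day Python where bytes and text are separate types. The helper names (recode_source_to_utf8, unicode_literal_value) and the "latin-1" declaration are made up for illustration; they are not part of any existing parser.

```python
def recode_source_to_utf8(raw: bytes, declared_encoding: str) -> bytes:
    # Step 1 (hypothetical): transcode the whole file from its declared
    # encoding to UTF-8, so the parser still only sees 8-bit bytes.
    return raw.decode(declared_encoding).encode("utf-8")


def unicode_literal_value(literal_bytes: bytes) -> str:
    # Step 2 (hypothetical): when the parser meets a Unicode string
    # literal, it interprets the literal's bytes as UTF-8.
    return literal_bytes.decode("utf-8")


# A source line as it would sit on disk in a Latin-1 encoded file.
source = 'u"caf\xe9"'.encode("latin-1")

utf8_source = recode_source_to_utf8(source, "latin-1")
literal_body = utf8_source[2:-1]  # strip the leading u" and the trailing "
value = unicode_literal_value(literal_body)
```

ASCII is unchanged by the transcoding, so the rest of the program text looks the same to the parser as before.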
> Making the parser Unicode aware is non-trivial as it requires
> changing lots of the internals which expect 8-bit C char buffers.
Are you talking about the Python internals or the parser internals? If
the former, then I do not think you are correct. Only the parser needs
> If we change the parser to use Unicode, then we would
> have to decode *all* program text into Unicode and this is very
> likely to fail for people who put non-ASCII characters into their
> string literals.
Files with no declaration could be interpreted byte for char just as
they are today!
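
A rough sketch of that fallback, assuming a hypothetical decode_source helper and modelling "byte for char" as a Latin-1 decode (each byte value becomes the character with the same number). Both the helper name and that modelling choice are assumptions for illustration only.

```python
from typing import Optional


def decode_source(raw: bytes, declared_encoding: Optional[str] = None) -> str:
    # Hypothetical helper: files without a declaration are mapped byte for
    # char, so their contents survive unchanged and decoding cannot fail.
    if declared_encoding is None:
        return raw.decode("latin-1")
    # Files with a declaration are decoded with the encoding they name.
    return raw.decode(declared_encoding)


# An undeclared legacy file with a non-ASCII byte in a string literal.
legacy = b'print "caf\xe9"'
legacy_text = decode_source(legacy)

# A file that declares its encoding.
declared_text = decode_source(b'print "caf\xc3\xa9"', "utf-8")
```

Under this scheme only files that opt in with a declaration can hit a decoding error.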
> ASCII is not Euro-centric at all since it is a common subset
> of very many common encodings which are in use today.
Oh come on! The ASCII characters are sufficient to encode English and a
very few other languages.
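
For reference, the "common subset" property itself is easy to check. The sketch below decodes the same bytes under a handful of encodings; the list of encodings and the decodings helper are arbitrary choices for illustration, not anything from the proposal.

```python
ENCODINGS = ["ascii", "latin-1", "utf-8", "cp1252", "koi8-r", "iso8859-7"]


def decodings(raw: bytes) -> dict:
    # Decode the same bytes under each encoding; None marks a failure.
    results = {}
    for name in ENCODINGS:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Pure ASCII bytes mean the same thing under every encoding listed.
ascii_only = decodings(b"print 'hello'")

# A single byte above 127 is where the encodings start to disagree.
with_high_byte = decodings(b"caf\xe9")
```

The ASCII-only input decodes identically everywhere, while the input with a high byte is rejected by some encodings and read as different characters by others.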
> would be, though... which is why ASCII was chosen as standard
> default encoding.
We could go back and forth on this but let me suggest you type in a
program with Latin 1 in your Unicode literals and try and see what
happens. Python already "recognizes" that there is a single logical
translation from "old style strings" to Unicode strings and vice versa.
> The added flexibility in choosing identifiers would soon turn
> against the programmers themselves. Others have tried this and
> failed badly (e.g. look at the language specific versions of
> Visual Basic).
That's a totally different and unrelated issue. Nobody is talking about
language-specific Pythons. We're talking about allowing people to name
variables in their own languages. I think that anything else is