[I18n-sig] Re: Strawman Proposal: Encoding Declaration V2

Paul Prescod paulp@ActiveState.com
Sat, 10 Feb 2001 23:05:03 -0800

Frank Chen wrote:
> ...
> So, if one day I declare Big5 as the encoding, I cannot use any ASCII
> character in my Python script?
> Does it mean this?
> if I set a = "characters='abc'", in the future it doesn't work? I need to
> use Big5 characters
> as identifiers and also the contents of strings when encoding declaration
> is set to Big5?

I'm pretty sure that ASCII characters are Big5 characters and they are
encoded in the same way as in pure ASCII. So yes, you can continue to
use ASCII characters in Big5-encoded scripts.
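
A quick way to check this claim, as a minimal sketch: it assumes a
modern Python 3 interpreter and its bundled "big5" codec (neither is
part of the proposal), and the sample strings are only illustrations.

```python
# Sketch: ASCII text comes out byte-for-byte identical under the Big5 codec.
# Assumes Python 3 and its standard "big5" codec.
source_line = "a = \"characters='abc'\""

ascii_bytes = source_line.encode("ascii")
big5_bytes = source_line.encode("big5")

# Pure-ASCII source is unchanged by declaring Big5.
assert ascii_bytes == big5_bytes

# ASCII and Big5 characters can be mixed in one line and round-trip cleanly.
mixed = "name = '中文'"
assert mixed.encode("big5").decode("big5") == mixed
```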

The current proposal only has any "effect" on Unicode literals anyhow.
The only danger is the one that already exists today: you must not use
a Big5 character whose second byte would confuse an ASCII-based parser.
That is, the second byte must never be equal to ASCII "\" or '"'. I
presume you are already careful about that.
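
To make the second-byte problem concrete, here is a rough sketch that
flags characters whose Big5 second byte collides with one of those two
ASCII characters. The helper name risky_big5_chars is hypothetical, and
the code assumes Python 3 with its standard "big5" codec and input that
is encodable in Big5. Standard Big5 second bytes start at 0x40, so in
practice the backslash (0x5C) is the collision that shows up; the double
quote (0x22) is checked only because it is named above.

```python
# Sketch: find characters whose Big5 second byte looks like ASCII "\" or '"'.
# risky_big5_chars is a hypothetical helper, not part of any proposal.
BACKSLASH = 0x5C
DOUBLE_QUOTE = 0x22

def risky_big5_chars(text):
    """Return the characters in text whose Big5 second byte is "\\" or '"'."""
    risky = []
    for ch in text:
        encoded = ch.encode("big5")  # raises UnicodeEncodeError if not in Big5
        if len(encoded) == 2 and encoded[1] in (BACKSLASH, DOUBLE_QUOTE):
            risky.append(ch)
    return risky

# The character below encodes as the bytes 0xA5 0x5C in Big5, so a
# byte-oriented parser sees a trailing backslash inside the literal.
print(risky_big5_chars("x = '功'"))
```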

> ...
> Like a preprocessor, to convert local encoding characters into Unicode
> first?
> And then feed it to the compiler?

*Conceptually* this is how I think of it. That *could* one day allow
identifiers to be in any language. It also means that we could one day
get rid of the silly restrictions on the second byte of two-byte
characters.

Others think of it as just a post-parse transformation on ONLY Unicode
(u"") literals. Until the issue of non-ASCII identifiers comes up, there
is no practical difference. So you can think of it either way.

The first implementation will likely be a post-parse transformation
because it is easier to implement in a non-Unicode parser.
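
The two ways of thinking about it can be sketched side by side. This is
only an illustration under stated assumptions: Python 3, a declared
encoding of "big5", and a crude regular expression standing in for a
real parser. The names preprocess_model, post_parse_model and U_LITERAL
are hypothetical.

```python
import re

DECLARED = "big5"  # hypothetical declared encoding for the file

def preprocess_model(source_bytes, encoding=DECLARED):
    """Model 1: decode the whole file to Unicode before any parsing."""
    return source_bytes.decode(encoding)

# Crude stand-in for a byte-oriented parser: matches u"..." literals whose
# bodies contain no double quote or backslash bytes.
U_LITERAL = re.compile(rb'u"([^"\\]*)"')

def post_parse_model(source_bytes, encoding=DECLARED):
    """Model 2: 'parse' the raw bytes, then decode only u"" literal bodies."""
    return [m.group(1).decode(encoding) for m in U_LITERAL.finditer(source_bytes)]

source = 'name = u"中文"\n'.encode(DECLARED)

# Model 1 yields the entire source as Unicode text.
whole_text = preprocess_model(source)

# Model 2 yields only the decoded literal contents.
literals = post_parse_model(source)
assert literals == ["中文"]
```

For a file like this one, with ASCII identifiers, both models produce
the same literal value. They would only diverge once identifiers
themselves may be non-ASCII.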

 Paul Prescod