[I18n-sig] Strawman Proposal (2): Encoding attributes

M.-A. Lemburg mal@lemburg.com
Sat, 10 Feb 2001 13:14:57 +0100

Paul Prescod wrote:
> "M.-A. Lemburg" wrote:
> >
> > ...
> >
> > Hmm, are you sure this would make the encoding declaration a
> > popular tool ?
> >
> > If we would just allow ASCII-supersets as source file encoding,
> > then we wouldn't have to make that restriction, since only the
> > Unicode literal handling in the parser would have to be adjusted
> > (and this is easy to do).
> We have always said that only ASCII-supersets should be legal source
> file encodings.

> The compromise is to make the use of non-ASCII bytes only legal inside
> of Unicode literals. Then in the future we can either go "my way"
> (decode the whole file) or "your way" (decode only literals).
> Is that acceptable?

No, it's too restrictive and would break programs written using
non-ASCII characters in normal string literals. We could, however,
agree on this:

1. programs which do not use the encoding declaration are free
   to use non-ASCII bytes in literals; Unicode literals must
   use Latin-1 (for historic reasons)

2. programs which do make use of the encoding declaration may
   only use non-ASCII bytes in Unicode literals; these are then
   interpreted using the given encoding information and decoded
   into Unicode during the compilation step
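To make the two rules concrete, here is a minimal sketch of how the
compiler might turn the raw bytes of a Unicode literal into a Unicode
string. The helper name decode_unicode_literal is hypothetical, and the
sketch assumes the literal body has already been cut out of the source
and ignores escape sequences such as \u1234:

```python
from typing import Optional


def decode_unicode_literal(raw: bytes, declared_encoding: Optional[str]) -> str:
    """Hypothetical helper: decode the raw body of a u"..." literal.

    raw is the byte sequence between the quotes (escapes not handled).
    declared_encoding is the name from the encoding declaration, or None
    if the source file has no declaration.
    """
    if declared_encoding is None:
        # Rule 1: no encoding declaration, so Unicode literals are
        # interpreted as Latin-1 (the historic behaviour).
        return raw.decode("latin-1")
    # Rule 2: the declared encoding decides how the literal is decoded,
    # and this happens once, at compile time.
    return raw.decode(declared_encoding)


# The same bytes give different results depending on the declaration.
print(decode_unicode_literal(b"caf\xc3\xa9", "utf-8"))  # café
print(decode_unicode_literal(b"caf\xc3\xa9", None))     # cafÃ©
```

Only the interpretation of Unicode literals depends on the declaration;
plain string literals keep their bytes unchanged in both cases.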

Part 1 ensures backward compatibility. Part 2 ensures that programmers
start to think about where they have to use Unicode and which text
is allowed to go into plain string literals. Part 1 is already
implemented; part 2 is easy to do, since only the compiler has to
be changed (in two places).
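The restriction in part 2 amounts to a simple compile-time check. The
sketch below assumes a hypothetical, already-tokenized view of the
source as (kind, raw_bytes) segments, where kind is "unicode_literal",
"string_literal" or "other"; check_segments is an illustrative name,
not an existing compiler function:

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical token representation: (kind, raw bytes from the source file).
Segment = Tuple[str, bytes]


def check_segments(segments: Iterable[Segment],
                   declared_encoding: Optional[str]) -> List[str]:
    """Return error messages for non-ASCII bytes outside Unicode literals."""
    errors: List[str] = []
    if declared_encoding is None:
        # Rule 1: files without a declaration are not restricted by this check.
        return errors
    for index, (kind, raw) in enumerate(segments):
        if kind == "unicode_literal":
            # Rule 2: non-ASCII bytes are allowed here; they are decoded
            # later using the declared encoding.
            continue
        if any(byte > 0x7F for byte in raw):
            errors.append(
                f"segment {index} ({kind}): non-ASCII byte outside a Unicode literal"
            )
    return errors


source = [("other", b"s = "), ("string_literal", b"caf\xe9")]
print(check_segments(source, "latin-1"))  # one error for segment 1
print(check_segments(source, None))       # [] (rule 1 applies)
```

With a declaration present, the programmer is forced to move the
non-ASCII text into a Unicode literal, which is exactly the kind of
decision part 2 is meant to encourage.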

How's that for a compromise?

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/