[Python-Dev] PEP 263 update (source encodings)

Martin v. Loewis martin@v.loewis.de
19 Apr 2002 22:52:26 +0200

Since Mr. Suzuki proposed a patch for phase 2 of PEP 263, I've been
thinking that the implementation could go straight to phase 2,
skipping my implementation of phase 1. Still, providing the
transitional warning of phase 1 is desirable, and indeed possible with
this implementation.

So after discussion with MAL, I have updated the PEP to reflect this
strategy. There are only minimal user-visible changes:
- more encodings can be supported even in phase 1, in particular those
  based on ISO 2022 (provided that the proper codecs are available)
- it is now an error if the declared encoding and the text do not
  match. If there is no encoding declared, it is still only a warning
  if you then use non-ASCII bytes.
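
To make the two rules above concrete, here is a rough sketch in plain
Python. It is not the parser code: check_source and declared_encoding
are hypothetical helper names, the regular expression is a simplified
form of the PEP's coding-comment pattern, and "mismatch" is assumed to
mean that the bytes cannot be decoded with the declared encoding.

```python
import re
import warnings

# Simplified form of the PEP 263 coding-comment pattern (assumption:
# we only look for it anywhere in the first two lines).
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def declared_encoding(source: bytes):
    """Return the encoding named in the first two lines, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def check_source(source: bytes) -> str:
    """Apply the two rules: error on mismatch, warning if undeclared."""
    encoding = declared_encoding(source)
    if encoding is None:
        try:
            source.decode("ascii")
        except UnicodeDecodeError:
            # No declaration but non-ASCII bytes: only a warning.
            warnings.warn("non-ASCII bytes without an encoding declaration")
            return "warning"
        return "ok"
    # Declared encoding: a decoding failure propagates as an error.
    source.decode(encoding)
    return "ok"
```

With this sketch, a file declaring utf-8 but containing a lone Latin-1
byte raises UnicodeDecodeError, while the same byte in a file without
a declaration only triggers the warning.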

More interesting is the change to the implementation strategy: the
parser will create a codecs.StreamReader to obtain the input, and will
convert each line to UTF-8; parsing then operates on the UTF-8
strings. This has the advantage that all lexical processing can still
use the same char*, interpreted as ASCII, as before.
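
As an illustration of that strategy, the conversion can be imitated
at the Python level. This is only a sketch under my own assumptions:
the real work happens in C inside the tokenizer, and utf8_lines is a
hypothetical name chosen for the example.

```python
import codecs
import io


def utf8_lines(raw: bytes, encoding: str):
    """Yield each source line re-encoded as UTF-8 bytes."""
    # codecs.getreader returns the StreamReader class for the codec.
    reader = codecs.getreader(encoding)(io.BytesIO(raw))
    while True:
        line = reader.readline()
        if not line:
            break
        # The tokenizer would see these UTF-8 bytes; every ASCII
        # character keeps its single-byte value.
        yield line.encode("utf-8")


source = "# -*- coding: latin-1 -*-\ns = '\u00e9'\n".encode("latin-1")
lines = list(utf8_lines(source, "latin-1"))
```

Quotes, newlines and other ASCII delimiters come out unchanged, which
is why the lexical code can keep scanning the buffer byte by byte;
only the non-ASCII character inside the literal becomes a multi-byte
sequence.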

The declared encoding is preserved (as in my phase 1 implementation),
in order to recode string literals back to the original
encoding. Anybody familiar with the parser is encouraged to review
this code; see