[Python-3000] PEP 3120 (Was: PEP Parade)

Thu May 3 18:44:18 CEST 2007

On 5/3/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Untangling the parser from stdio - sure. I also think it would
> be desirable to read the whole source into a buffer, rather than
> reading the input line by line. That might be a bigger change,
> making the tokenizer a multi-stage algorithm:
>
> 1. read input into a buffer
> 2. determine source encoding (looking at a BOM, else a
>    declaration within the first two lines, else default
>    to UTF-8)
> 3. if the source encoding is not UTF-8, pass it through
>    a codec (decode to string, encode to UTF-8). Otherwise,
>    check that all bytes are really well-formed UTF-8.
> 4. start parsing
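
For concreteness, steps 2 and 3 above could be sketched roughly as follows. This is only an illustration, not the tokenizer's actual code: the function name source_to_utf8 is made up, and the declaration pattern is assumed to follow the PEP 263 style of "coding: name" comments.

```python
import codecs
import re

# Assumed declaration pattern, modelled on PEP 263 ("# -*- coding: name -*-").
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def source_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: return the source buffer as well-formed UTF-8."""
    # Step 2: look for a BOM, else a declaration in the first two lines,
    # else default to UTF-8.
    encoding = "utf-8"
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    else:
        for line in raw.splitlines()[:2]:
            match = CODING_RE.match(line)
            if match:
                encoding = match.group(1).decode("ascii")
                break
    # Step 3: decode with the detected codec and re-encode as UTF-8.
    # For UTF-8 input the decode doubles as the well-formedness check
    # (it raises UnicodeDecodeError on malformed bytes).
    text = raw.decode(encoding)
    return text.encode("utf-8")
```

Step 1 would just be reading the file in binary mode, and step 4 would hand the returned bytes to the parser.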

So people could hook in their own "codec" that, say, replaces
native-language keywords with standard Python keywords?

Part of me says that should be an import hook instead of pretending to
be a codec...
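
Whichever mechanism carried it, the translation step itself might look something like the sketch below. The keyword table and the function name translate_keywords are invented for illustration; the only real API used is the standard tokenize module, which lets the rewrite skip strings and comments.

```python
import io
import tokenize

# Hypothetical mapping from native-language keywords to Python keywords.
KEYWORDS = {"si": "if", "sinon": "else", "retourne": "return"}


def translate_keywords(source: str) -> str:
    """Replace NAME tokens listed in KEYWORDS; strings and comments are untouched."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in KEYWORDS:
            tok = tok._replace(string=KEYWORDS[tok.string])
        tokens.append(tok)
    # untokenize rebuilds source text from the (possibly rewritten) tokens.
    return tokenize.untokenize(tokens)


src = (
    "def signe(x):\n"
    "    si x > 0:\n"
    "        retourne 1\n"
    "    sinon:\n"
    "        retourne 0\n"
)
namespace = {}
exec(compile(translate_keywords(src), "<translated>", "exec"), namespace)
```

A codec would have to run this on decoded text and hand the result back as if it were merely a character-set conversion, whereas an import hook could call it openly before compiling, which is roughly why the hook feels like the more honest home for it.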

-jJ