[Python-Dev] directive statement (PEP 244)

M.-A. Lemburg mal@lemburg.com
Mon, 16 Jul 2001 21:02:58 +0200


Paul Prescod wrote:
> 
> "M.-A. Lemburg" wrote:
> > Paul suggested adding encoding directives for 8-bit
> > strings and comments, but these cannot be used by the Python
> > compiler in any way and would only be for the benefit of an
> > editor, so I don't really see the need for them.
> 
> Sorry I wasn't clear. Like \F, I think that the best model is that of
> XML, Java and (I've learned recently) Perl. There should be a single
> encoding for the file. Logically speaking it should be decoded before
> tokenization or parsing. Practically speaking it may be simpler to fake
> this logical decoding in the implementation. I don't care how it is
> implemented. Logically the model should be that any encoding declaration
> affects the interpretation of the *file* not some particular construct
> in the file.
> 
> If this is too difficult to implement today then maybe we should wait on
> the whole feature until someone has time to do it right.

Hmm, I guess you have something like this in mind...

1. read the file
2. decode it into Unicode assuming some fixed per-file encoding
3. tokenize the Unicode content
4. compile it, creating Unicode objects directly from the Unicode
   literals and creating string objects from the 8-bit string literals
   by first reencoding the decoded Unicode data into 8-bit string data
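
To make the four steps concrete, here is a rough sketch in present-day
Python 3, where str plays the role of the Unicode object and bytes the
role of the 8-bit string. The function name literals_from_source and
its encoding parameter are made up for illustration, and the sketch
only extracts string literals instead of compiling anything, so it
shows the model, not a proposed implementation.

```python
import ast
import io
import tokenize


def literals_from_source(raw, encoding="latin-1"):
    """Hypothetical helper: return the string literals found in raw source bytes.

    u'...' literals stay Unicode (str); plain '...' literals are
    re-encoded into 8-bit data (bytes) using the per-file encoding.
    """
    # Steps 1 and 2: the file content is decoded using one fixed encoding.
    text = raw.decode(encoding)

    results = []
    # Step 3: tokenize the decoded Unicode text, not the raw bytes.
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type != tokenize.STRING:
            continue
        value = ast.literal_eval(tok.string)
        if isinstance(value, bytes) or tok.string[:1] in ("u", "U"):
            # Unicode literal (or an explicit bytes literal): keep as is.
            results.append(value)
        else:
            # Step 4: plain literal, re-encode back into 8-bit data.
            results.append(value.encode(encoding))
    return results


source = b"a = 'caf\xe9'\nb = u'caf\xe9'\n"
print(literals_from_source(source, "latin-1"))
```

With a Latin-1 file the plain literal comes back as the same bytes that
were in the file, while the u'' literal becomes a Unicode object.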

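To make this backwards compatible, the implementation would have to
assume Latin-1 as the original file encoding if none is given. Latin-1
maps every byte value 0-255 to the Unicode code point with the same
number, so any byte sequence survives decoding and reencoding; with
most other encodings, binary data currently stored in 8-bit strings
wouldn't make the roundtrip.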
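
The roundtrip argument is easy to check. The snippet below is only a
sketch in present-day Python 3; the helper name roundtrips is made up
for illustration. It decodes all 256 byte values and encodes them
again, once as Latin-1 and once as UTF-8 and ASCII for comparison.

```python
def roundtrips(data, encoding):
    """Hypothetical helper: True if data survives decode + encode unchanged."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Some byte sequence is not valid in this encoding at all.
        return False


binary = bytes(range(256))  # every possible byte value once

print(roundtrips(binary, "latin-1"))
print(roundtrips(binary, "utf-8"))
print(roundtrips(binary, "ascii"))
```

Only the Latin-1 case preserves arbitrary binary data; the other two
fail as soon as a byte above 0x7F shows up.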

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/