PEP: Defining Python Source Code Encodings

David Eppstein eppstein at
Wed Jul 18 19:25:25 CEST 2001

In article <mailman.995475606.22563.python-list at>,
 Roman Suzi <rnd at> wrote:

> >> And this is right. I even think encoding information could be EXTERNAL.
> >
> >No -- how are editors supposed to know about these external
> >files ?
> OK. But how do they know about encoding of the 8-bit documents?
> Documents have tags to show encoding. Then Python program must
> become a document with all those tags here and there.
> How do other languages solve this "problem"?

The two examples I know of are XML and GEDCOM.  Others can speak better 
than I can about how XML does it.  GEDCOM requires a character set tag 
inside the file itself.  The character sets currently allowed are ANSEL (an 
ASCII-based code) and Unicode; previous versions of GEDCOM allowed several 
others.  I guess the assumption is that a 16-bit Unicode encoding looks 
different enough from ASCII that you can tell how to parse the file into 
characters before you reach the character set tag.
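
A rough sketch of that guess in Python, not taken from any GEDCOM tool: look 
at the first bytes to decide whether the file is 16-bit or ASCII-compatible, 
then read the tag.  The function names are made up for illustration, and the 
"1 CHAR <name>" header line is my assumption about how the tag is written.

```python
import codecs


def guess_encoding_family(head: bytes) -> str:
    """Guess how to split bytes into characters before any tag is read."""
    # A byte order mark, if present, settles the question directly.
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # Without a BOM, ASCII characters stored in 16 bits have one zero byte
    # next to each non-zero byte, which plain 8-bit text never has.
    if len(head) >= 2:
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"
    return "ascii-compatible"


def declared_charset(data: bytes):
    """Return (guessed family, declared character set name or None)."""
    family = guess_encoding_family(data[:4])
    if family == "ascii-compatible":
        # latin-1 maps every byte to a character, so the ASCII tag lines
        # can be read even if the rest of the file is ANSEL.
        text = data.decode("latin-1")
    else:
        text = data.decode(family)
    text = text.lstrip("\ufeff")  # drop a decoded BOM, if any
    for line in text.splitlines():
        parts = line.split()
        # Assumed tag layout: level "1", tag "CHAR", then the name.
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "CHAR":
            return family, parts[2]
    return family, None
```

The point is only that the tag can be found in two passes: a byte-level guess 
first, then an ordinary text search once the bytes can be split into 
characters.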

David Eppstein       UC Irvine Dept. of Information & Computer Science
eppstein at
