PEP: Defining Python Source Code Encodings

David Eppstein eppstein at ics.uci.edu
Wed Jul 18 13:25:25 EDT 2001


In article <mailman.995475606.22563.python-list at python.org>,
 Roman Suzi <rnd at onego.ru> wrote:

> >> And this is right. I even think encoding information could be EXTERNAL.
> >
> >No -- how are editors supposed to know about these external
> >files ?
> 
> OK. But how do they know about encoding of the 8-bit documents?
> Documents have tags to show encoding. Then Python program must
> become a document with all those tags here and there.
> 
> How do other languages solve this "problem"?

The two examples I know of are XML and GEDCOM.  Others can speak better 
than I about how XML does it.  In GEDCOM
<http://www.gendex.com/gedcom55/55gctoc.htm>
there is a required character set tag internal to the file.
The character sets currently allowed are ANSEL (an ASCII-based code) and 
Unicode; previous versions of GEDCOM allowed several others.  I guess the 
assumption is that e.g. a 16-bit Unicode encoding looks different enough 
from ASCII that you can tell how to split the file into characters before 
you reach the character set tag.
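
To make that guess concrete, here is a rough sketch of how a reader might
decide between a 16-bit encoding and an ASCII-compatible one from the first
two bytes alone.  The function name and the returned labels are my own
invention, not anything from the GEDCOM spec, and it assumes the file opens
with an ASCII-range character (GEDCOM files begin with "0 HEAD").  It does
not try to handle 32-bit encodings.

```python
import codecs


def sniff_encoding_family(head: bytes) -> str:
    """Guess how to split bytes into characters before any in-file tag is read.

    Hypothetical helper for illustration only. Assumes the first character
    of the file is in the ASCII range.
    """
    # An explicit byte order mark settles the question immediately.
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: an ASCII-range character stored in 16 bits has exactly one
    # zero byte, and its position reveals the byte order.
    if len(head) >= 2:
        if head[0] != 0 and head[1] == 0:
            return "utf-16-le"
        if head[0] == 0 and head[1] != 0:
            return "utf-16-be"
    # Otherwise read it as an 8-bit, ASCII-compatible code (ANSEL, say)
    # and let the character set tag settle the details.
    return "ascii-compatible"
```

Once the family is known, the reader can decode far enough to find the
character set tag and then switch to whatever that tag names.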
-- 
David Eppstein       UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/


