PEP: Defining Python Source Code Encodings
eppstein at ics.uci.edu
Wed Jul 18 19:25:25 CEST 2001
In article <mailman.995475606.22563.python-list at python.org>,
Roman Suzi <rnd at onego.ru> wrote:
> >> And this is right. I even think encoding information could be EXTERNAL.
> >No -- how are editors supposed to know about these external
> >files ?
> OK. But how do they know about encoding of the 8-bit documents?
> Documents have tags to show encoding. Then Python program must
> become a document with all those tags here and there.
> How do other languages solve this "problem"?
The two examples I know of are XML and GEDCOM. Others can speak better
than I about how XML does it. In GEDCOM, a character set tag is required
inside the file itself.
Allowed character sets currently are ANSEL (an ASCII-based code) and
Unicode; previous versions of GEDCOM allowed several others. I guess it's
assumed that e.g. 16-bit Unicode encodings would look different enough from
ASCII that you could tell how to parse the file into characters before
seeing the character set tag.
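
To make that guess concrete, here is a minimal sketch in Python of the
two-step approach: look at the first bytes to decide how wide the
characters are, then read the declared character set from the tag. This is
not taken from the GEDCOM specification. The function names sniff_width and
declared_charset are hypothetical, and the byte patterns assume the file
begins with a "0 HEAD" line and carries a "1 CHAR <name>" line in its
header. Python has no built-in ANSEL codec, so Latin-1 stands in here
purely to read the ASCII-range header lines.

```python
import codecs


def sniff_width(head: bytes) -> str:
    """Guess a codec good enough to read the header (assumed heuristic)."""
    # A 16-bit file either starts with a byte-order mark or with the
    # character "0" padded by a zero byte on one side.
    if head.startswith(codecs.BOM_UTF16_LE) or head[:2] == b"0" + b"\x00":
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE) or head[:2] == b"\x00" + b"0":
        return "utf-16-be"
    # Otherwise assume an ASCII-compatible 8-bit encoding. Latin-1 maps
    # every byte to a character, so decoding the header cannot fail.
    return "latin-1"


def declared_charset(data: bytes):
    """Return (codec used to read the header, declared character set)."""
    codec = sniff_width(data[:4])
    text = data.decode(codec, errors="replace").lstrip("\ufeff")
    for line in text.splitlines():
        parts = line.split()
        # Assumed tag layout: level "1", tag "CHAR", then the name.
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "CHAR":
            return codec, parts[2]
    return codec, None
```

The bootstrap codec only needs to be right about character width; once
the tag has been read, the file can be decoded again with whatever
encoding it actually declares.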
David Eppstein UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/