PEP: Defining Python Source Code Encodings
David Eppstein
eppstein at ics.uci.edu
Wed Jul 18 13:25:25 EDT 2001
In article <mailman.995475606.22563.python-list at python.org>,
Roman Suzi <rnd at onego.ru> wrote:
> >> And this is right. I even think encoding information could be EXTERNAL.
> >
> >No -- how are editors supposed to know about these external
> >files ?
>
> OK. But how do they know about encoding of the 8-bit documents?
> Documents have tags to show encoding. Then Python program must
> become a document with all those tags here and there.
>
> How do other languages solve this "problem"?
The two examples I know of are XML and GEDCOM. Others can speak better
than I about how XML does it. In GEDCOM
<http://www.gendex.com/gedcom55/55gctoc.htm>
there is a required character set tag internal to the file.
Allowed character sets currently are ANSEL (an ASCII-based code) and
Unicode; previous versions of GEDCOM allowed several others. Presumably the
assumption is that a 16-bit Unicode encoding looks different enough from
ASCII (ASCII-range characters carry a zero byte in every 16-bit unit) that
you can tell how to parse the file into characters before you reach the
character set tag.
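
To make that guess concrete, here is a rough Python sketch of how a reader
might pick a coarse decoding before it has seen the tag, then read the tag
itself. It is an illustration under assumptions, not what any GEDCOM
implementation actually does: the function names sniff_width and
read_char_tag are hypothetical, the sample header lines ("0 HEAD",
"1 CHAR ANSEL") are assumed from the GEDCOM layout, and ANSEL text is read
through latin-1 only as a stand-in, which is good enough to find the
ASCII-range tag but not to decode ANSEL properly.

```python
import codecs


def sniff_width(head: bytes) -> str:
    """Guess the coarse encoding family from the first bytes of a file."""
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # A byte order mark is present; the "utf-16" codec reads it itself.
        return "utf-16"
    if len(head) >= 2 and head[0] != 0 and head[1] == 0:
        # An ASCII-range character followed by a zero byte: little-endian.
        return "utf-16-le"
    if len(head) >= 2 and head[0] == 0 and head[1] != 0:
        # A zero byte followed by an ASCII-range character: big-endian.
        return "utf-16-be"
    # No zero bytes up front: treat as an 8-bit, ASCII-based encoding.
    return "ascii-compatible"


def read_char_tag(data: bytes):
    """Return the value of the first level-1 CHAR line, or None."""
    family = sniff_width(data[:4])
    # latin-1 maps every byte to a character, so it never fails; it is
    # only a placeholder that lets us read the ASCII-range tag line.
    codec = "latin-1" if family == "ascii-compatible" else family
    text = data.decode(codec)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "1" and parts[1] == "CHAR":
            return parts[2]
    return None


if __name__ == "__main__":
    sample = b"0 HEAD\n1 CHAR ANSEL\n0 TRLR\n"
    print(sniff_width(sample[:4]), read_char_tag(sample))
```

The point is only that the first two bytes settle the byte width, so the
tag can live inside the file without a chicken-and-egg problem. The same
trick would not tell two different 8-bit encodings apart, which is the
harder case for Python source.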
--
David Eppstein UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/