[Python-3000] BOM handling
Josiah Carlson
jcarlson at uci.edu
Thu Sep 14 18:28:39 CEST 2006
Blake Winton <bwinton at latte.ca> wrote:
[snip]
> Um, what more data do we need for this use-case? I'm not going to
> suggest an API, other than it would be nice if I didn't have to manually
> figure out/hard code all the encodings. (It's my belief that I will
> currently have to do that, or at least special-case XML, to read the
> encoding attribute.) Oh, and it would be particularly horrible if I
> output a shell script in UTF-8, and it included the BOM, since I believe
> that would break the "magic number" of "#!".
Use the encoding attribute of the XML declaration, <?xml ... encoding="..." ?>,
to discover the encoding, and assume UTF-8 otherwise, as per the spec:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-EncodingDecl
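A rough sketch of that lookup, assuming the data is in an ASCII-compatible
encoding (UTF-16 and friends would need BOM sniffing first). The function
name sniff_xml_encoding is made up for illustration, not an existing API:

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration that starts
# the data. Only valid for ASCII-compatible encodings (an assumption here).
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data, default="utf-8"):
    """Return the declared encoding name (lowercased), or `default`."""
    # Skip a UTF-8 BOM if one precedes the declaration.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

Read the first few hundred bytes in binary mode, pass them in, then reopen
the file with whatever name comes back.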
Does bash natively support utf-8? Is there a bash equivalent to Python
coding: directives? You may be attempting to fix a problem that doesn't
exist.
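On the BOM worry itself: in Python the plain "utf-8" codec does not write
a BOM when encoding; only the separate "utf-8-sig" codec does. A small
check, with encode_script as a made-up helper name:

```python
import codecs


def encode_script(text, with_bom=False):
    """Encode text as UTF-8, prepending a BOM only when asked to."""
    # "utf-8" emits no BOM; "utf-8-sig" prepends EF BB BF.
    return text.encode("utf-8-sig" if with_bom else "utf-8")


script = "#!/bin/sh\necho hello\n"
plain = encode_script(script)
signed = encode_script(script, with_bom=True)

# The "#!" magic number survives only in the BOM-less output.
print(plain.startswith(b"#!"))
print(signed.startswith(codecs.BOM_UTF8))
```

So a shell script written with the plain codec keeps "#!" as its first two
bytes.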
> Yeah, see, at a business level, I really need to process those all in
> the same way, and it would be annoying to have to write code to handle
> them all differently.
So you, or anyone else, can write a module for discovering the encoding
used for a particular file based on XML tags, Python coding: directives,
etc. It could include an extensible registry, and if it is used enough,
could be included in the Python standard library.
- Josiah