Re: [Python-Dev] XML codec?

Nov. 12, 2007


Martin v. Löwis wrote:
...
...
I don't know. Is an XML document ill-formed if it doesn't contain an
XML declaration, is not in UTF-8 or UTF-16, but there's external
encoding info?
If there is external encoding info, matching the actual encoding,
it would be well-formed. Of course, preserving that information would
be up to the application.
OK. When the application passes an encoding to the decoder, this is
supposed to be the external encoding info, so for the encoder it makes
sense to assume that the encoding passed to the encoder is the external
encoding info and will be transmitted along with the encoded bytes.
...
...
This looks good. Now we would have to extend the code to detect and
replace the encoding in the XML declaration too.
I'm still opposed to making this a codec. Right - for a pure Python
solution, the processing of the XML declaration would still need to
be implemented.
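
A minimal sketch of what replacing the encoding in the XML declaration
could look like in pure Python. The name fix_encoding() comes from this
thread, but the signature, the regular expression and the behaviour for
documents without a declaration are assumptions made for illustration,
not the code under discussion.

```python
import re

# Assumption: the declaration sits at the very start of the text and has the
# usual shape <?xml version="..." [encoding="..."] [standalone="..."]?>.
_DECL_RE = re.compile(
    r"""^(<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*'))"""   # group 1: up to version
    r"""(?:\s+encoding\s*=\s*(?:"[^"]*"|'[^']*'))?"""       # old encoding, if any
    r"""(.*?\?>)""",                                          # group 2: rest of declaration
    re.DOTALL,
)


def fix_encoding(text, encoding):
    """Return text with the declared encoding replaced by (or set to) encoding.

    Hypothetical helper: text without an XML declaration is returned unchanged.
    """
    def repl(match):
        # Rebuild the declaration around the new encoding name.
        return '%s encoding="%s"%s' % (match.group(1), encoding, match.group(2))

    return _DECL_RE.sub(repl, text, count=1)
```

An encoder could call this on the text before encoding it, so that the
declaration matches the bytes that are actually produced.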
...
...
I think there could be a much simpler routine to have the same effect.
- if it's less than 4 bytes, answer "need more data".
Can there be an XML document that is less than 4 bytes? I guess not.
No, the smallest document has exactly 4 characters (e.g. "<f/>").
However, external entities may be smaller, such as "x".
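
The "need more data" rule can be sketched as follows. This is only an
illustration: the final flag, the None return value and the byte
patterns (taken from the autodetection heuristics in Appendix F of the
XML 1.0 specification) are assumptions, and the sketch does not parse
the encoding name out of the declaration itself.

```python
import codecs


def detect_encoding(data, final=False):
    """Guess an encoding from the first bytes of an XML document.

    Returns None ("need more data") if fewer than 4 bytes are available
    and more input may still arrive. Hypothetical signature.
    """
    if len(data) < 4 and not final:
        return None
    # Byte order marks; UTF-32 first, since its LE BOM starts with the UTF-16 LE BOM.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if data.startswith(bom):
            return name
    # No BOM: look at how "<" or "<?" is laid out in the first 4 bytes.
    head = data[:4]
    if head == b"\x00\x00\x00<":
        return "utf-32-be"
    if head == b"<\x00\x00\x00":
        return "utf-32-le"
    if head == b"\x00<\x00?":
        return "utf-16-be"
    if head == b"<\x00?\x00":
        return "utf-16-le"
    # Assumed default; a fuller routine would now read the XML declaration.
    return "utf-8"
```

With final=True a short external entity such as b"x" gets an answer
instead of waiting for more input that will never arrive.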
...
But anyway: would a Python implementation of these two functions
(detect_encoding()/fix_encoding()) be accepted?
I could agree to a Python implementation of this algorithm as long
as it's not packaged as a codec.
I still can't understand your objection to a codec. What's the
difference between UTF-16 decoding and XML decoding? In fact PEP 263
IMHO does specify how to decode Python source, so in theory it could be
a codec (in practice this probably wouldn't work because of
bootstrapping problems).
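
As an illustration of the PEP 263 comparison: current Python 3 ships
tokenize.detect_encoding(), which reads the BOM or coding cookie from
the first two lines of a source file. The wrapper below,
decode_python_source(), is a hypothetical helper written for this
sketch; it is a plain function and not a registered codec.

```python
import io
import tokenize


def decode_python_source(source_bytes):
    """Decode Python source bytes using the encoding PEP 263 prescribes.

    Hypothetical helper built on tokenize.detect_encoding().
    """
    # detect_encoding() takes a readline callable and returns the encoding
    # name together with the lines it had to read to find it.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return source_bytes.decode(encoding)
```

The shape is the same as for XML: the bytes themselves announce how
they are to be decoded, and the decoder has to look at the start of the
input before it can decode the rest.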

Servus,
   Walter

Walter Dörwald