[Python-Dev] XML codec?
Walter Dörwald
walter at livinglogic.de
Fri Nov 9 13:49:41 CET 2007
Martin v. Löwis wrote:
>> Because you can force the encoder to use a specified encoding. If you do
>> this and the unicode string starts with an XML declaration
>
> So what if the unicode string doesn't start with an XML declaration?
> Will it add one?
No.
> If so, what version number will it use?
If we added this, we could give the encoder constructor an extra
argument, version, defaulting to '1.0'.
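To make the idea concrete, here is a minimal Python sketch of what such a version argument could look like. The function name add_xml_declaration and its signature are made up for illustration; nothing like it exists in the proposed codec, which currently does not add a declaration at all.

```python
# Hypothetical helper, not part of the proposed codec: prepend an XML
# declaration when the text does not already start with one.
XML_WHITESPACE = (" ", "\t", "\r", "\n")


def add_xml_declaration(text, encoding, version="1.0"):
    # "<?xml" only counts as a declaration if whitespace follows it;
    # "<?xml-stylesheet ...?>" is an ordinary processing instruction.
    has_declaration = text[:5] == "<?xml" and text[5:6] in XML_WHITESPACE
    if has_declaration:
        return text
    return '<?xml version="%s" encoding="%s"?>%s' % (version, encoding, text)
```

With this sketch, add_xml_declaration("<a/>", "utf-8") yields a document that starts with a version="1.0" declaration, and a string that already carries a declaration is returned unchanged.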
>>>> OK, so should I put the C code into a _xml module?
>>> I don't see the need for C code at all.
>> Doing the bit fiddling for
>> Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
>> right thing to do.
>
> Hmm. I don't think a sequence like
>
> +    if (strlen>0)
> +    {
> +        if (*str++ != '<')
> +            return 1;
> +        if (strlen>1)
> +        {
> +            if (*str++ != '?')
> +                return 1;
> +            if (strlen>2)
> +            {
> +                if (*str++ != 'x')
> +                    return 1;
> +                if (strlen>3)
> +                {
> +                    if (*str++ != 'm')
> +                        return 1;
> +                    if (strlen>4)
> +                    {
> +                        if (*str++ != 'l')
> +                            return 1;
> +                        if (strlen>5)
> +                        {
> +                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
> +                                return 1;
>
> is well-maintainable C. I feel it is much better writing
>
> if not s.startswith("<?xml"):
>     return 1
The point of this code is not just to return whether the string starts
with "<?xml" or not. There are actually three cases:
* The string does start with "<?xml"
* The string is only a prefix of "<?xml", i.e. we can only decide
  whether it starts with "<?xml" once we have more input.
* The string definitely doesn't start with "<?xml".
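The three cases can be sketched in Python roughly as follows. This is an illustration only, not the code from the patch; the function name and the three result constants are made up here.

```python
# Hypothetical sketch of the three-way check; names are illustrative.
XML_DECL_PREFIX = "<?xml"
XML_WHITESPACE = " \t\r\n"

YES = "starts with <?xml"
NEED_MORE = "need more input"
NO = "does not start with <?xml"


def starts_with_xml_decl(s):
    # Compare only as many characters as we have so far.
    n = min(len(s), len(XML_DECL_PREFIX))
    if s[:n] != XML_DECL_PREFIX[:n]:
        return NO
    if len(s) <= len(XML_DECL_PREFIX):
        # Everything matches so far, but the character after "<?xml"
        # has not arrived yet, so no decision is possible.
        return NEED_MORE
    # "<?xml" must be followed by whitespace; "<?xmlfoo" is not a declaration.
    return YES if s[len(XML_DECL_PREFIX)] in XML_WHITESPACE else NO
```

A plain startswith() call collapses the second and third case into one, which is what makes it unsuitable for an incremental decoder that is fed the input in pieces.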
> What bit fiddling are you referring to specifically that you think
> is better done in C than in Python?
The code that checks the byte signature, i.e. the first part of
detect_xml_encoding_str().
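For readers who have not looked at the patch, a byte signature check along the lines of Appendix F of the XML specification could look roughly like this in Python. It is a sketch under assumptions, not a transcription of detect_xml_encoding_str(): the table, the function name, the final flag and the returned codec names are all chosen here for illustration.

```python
# Hypothetical sketch of byte signature detection (XML spec, Appendix F).
# Longer signatures come first so that FF FE 00 00 wins over FF FE.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),  # UCS-4 BOM, big endian
    (b"\xff\xfe\x00\x00", "utf-32-le"),  # UCS-4 BOM, little endian
    (b"\xef\xbb\xbf", "utf-8-sig"),      # UTF-8 BOM
    (b"\xfe\xff", "utf-16-be"),          # UTF-16 BOM, big endian
    (b"\xff\xfe", "utf-16-le"),          # UTF-16 BOM, little endian
    (b"\x00\x00\x00<", "utf-32-be"),     # "<" in UCS-4, no BOM
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),        # "<?" in UTF-16, no BOM
    (b"<\x00?\x00", "utf-16-le"),
]


def detect_by_signature(data, final=False):
    """Return a codec name, or None if more input is needed."""
    if not final:
        for signature, _ in SIGNATURES:
            # data is a proper prefix of a signature: cannot decide yet.
            if len(data) < len(signature) and signature.startswith(data):
                return None
    for signature, encoding in SIGNATURES:
        if data.startswith(signature):
            return encoding
    # No signature matched: assume an ASCII compatible encoding. A full
    # detector would now go on to parse the XML declaration itself.
    return "utf-8"
```

Note that the same "need more input" case shows up here: the two bytes FF FE alone could be a UTF-16 BOM or the start of a UCS-4 BOM, so the sketch returns None until more bytes arrive or final is set.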
Servus,
Walter