Because you can force the encoder to use a specified encoding. If you do this and the unicode string starts with an XML declaration
So what if the unicode string doesn't start with an XML declaration? Will it add one? If so, what version number will it use?
OK, so should I put the C code into a _xml module? I don't see the need for C code at all.
Doing the bit fiddling for Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the right thing to do.
Hmm. I don't think a sequence like

    +    if (strlen>0)
    +    {
    +        if (*str++ != '<')
    +            return 1;
    +        if (strlen>1)
    +        {
    +            if (*str++ != '?')
    +                return 1;
    +            if (strlen>2)
    +            {
    +                if (*str++ != 'x')
    +                    return 1;
    +                if (strlen>3)
    +                {
    +                    if (*str++ != 'm')
    +                        return 1;
    +                    if (strlen>4)
    +                    {
    +                        if (*str++ != 'l')
    +                            return 1;
    +                        if (strlen>5)
    +                        {
    +                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
    +                                return 1;

is maintainable C. I think it is much better to write

    if not s.startswith("<?xml"):
        return 1

What bit fiddling, specifically, do you think is better done in C than in Python?

Regards,
Martin
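For comparison, here is a minimal Python sketch of the same prefix test that also keeps the quoted C fragment's handling of input too short to decide (the nested `strlen>N` checks). The name `check_xml_decl_prefix` and its "yes"/"no"/"incomplete" results are hypothetical; the quoted C only shows that it returns 1 on a definite mismatch, and what it does otherwise is not shown.

```python
# Hypothetical helper, not part of any patch: mirrors the quoted C prefix check.
XML_DECL_PREFIX = "<?xml"
XML_WHITESPACE = " \t\r\n"  # the four characters the C fragment accepts


def check_xml_decl_prefix(s: str) -> str:
    """Classify the start of s.

    'no'         -> s cannot start an XML declaration (C would return 1)
    'incomplete' -> s is too short to decide, but matches so far
    'yes'        -> s starts with '<?xml' followed by whitespace
    """
    # Compare only as many characters as are available, like the strlen>N guards.
    n = min(len(s), len(XML_DECL_PREFIX))
    if s[:n] != XML_DECL_PREFIX[:n]:
        return "no"
    # Need at least one character after '<?xml' to check for whitespace.
    if len(s) <= len(XML_DECL_PREFIX):
        return "incomplete"
    if s[len(XML_DECL_PREFIX)] not in XML_WHITESPACE:
        return "no"
    return "yes"
```

The whole nested block collapses into one slice comparison plus one character test, which is the maintainability point being made.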