[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

Fredrik Lundh report at bugs.python.org
Sun Mar 21 15:38:40 CET 2010

Fredrik Lundh <fredrik at effbot.org> added the comment:

Hmm.  I'm not entirely sure about giving False a meaning when None has traditionally had a different (and documented) meaning.  And sleeping on it hasn't convinced me in either direction :-(

(well, I'd say no, but the compatibility argument is somewhat tempting)

I'm not that concerned about changing the default for write -- 3.x users with utf-8 as the default output encoding will get different output, but still perfectly valid XML.  3.x users with non-utf-8 default encodings will also get valid XML in cases where it didn't work before.

tostring() is more problematic, but I'm leaning towards Guido's torpedoes approach there -- changing the default output to bytestrings is more likely to make code blow up than to cause bad output, and you can trivially make your program backwards compatible by adding an extra check/decode after the call.  Supporting unicode for lxml.etree compatibility is fine with me, but I think it might make sense to support the string "unicode" as well (as a pseudo-encoding -- it's pretty clear to me that nobody will ever define a real character encoding with that name :-).
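A minimal sketch of the "extra check/decode after the call" idea, written against ElementTree as it later shipped in Python 3 (tostring() returns bytes by default and accepts the pseudo-encoding "unicode" to return a str). The helper name tostring_text is hypothetical and not part of the library.

```python
import xml.etree.ElementTree as ET


def tostring_text(elem, encoding="utf-8"):
    """Hypothetical helper: serialise elem and always return a str.

    Works whether tostring() hands back bytes (the bytestring default)
    or a str (e.g. when called with encoding="unicode").
    """
    data = ET.tostring(elem, encoding=encoding)
    if isinstance(data, bytes):
        # The extra check/decode step after the call.
        data = data.decode(encoding)
    return data


root = ET.Element("root")
root.text = "hello"
print(type(ET.tostring(root)))                # <class 'bytes'>
print(ET.tostring(root, encoding="unicode"))  # <root>hello</root>
print(tostring_text(root))                    # <root>hello</root>
```

Callers that previously expected a text string can route through such a wrapper, while new code can ask for a str directly with encoding="unicode".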

Have you posted (or can you post) the patch to Rietveld, btw?  I have some questions about the code that are independent of the encoding decision.


Python tracker <report at bugs.python.org>
