
Hi all,

we had a lengthy discussion yesterday, and I guess we found a few use cases where tounicode() makes sense and a few counter-arguments why it might not be a good idea to expose that API in as visible a place as tostring().

I'm still convinced that it's a good idea to have that API. However, since one of the arguments was that "people who don't understand unicode" (PeWDUUs) would be more likely to write broken code, I added this paragraph to api.txt, in the section that describes the unicode support of lxml.

"""
Note that the unicode strings returned by ``tounicode()`` never have an XML declaration and therefore do not specify an encoding. This makes it possible to pass them back into the lxml parsers. However, you may have to add a declaration yourself if you want to serialize such a unicode string to a byte stream later. In contrast, the ``tostring()`` function automatically adds a declaration as needed that reflects the encoding of the returned byte string.
"""

I hope that makes it clear enough for PeWDUUs what the advantage of using tostring() over tounicode() is, and that you have to be careful about what you do with unicode strings.

So, I propose leaving the API (and implementation) just as it is now.

Regards,
Stefan
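
For anyone who wants to see the difference described in the api.txt paragraph, here is a minimal sketch. It is an illustration, not the actual implementation. The helper names to_unicode() and to_declared_bytes() are made up for this example, and the fallback to the standard library's ElementTree is only an assumption so that the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    import xml.etree.ElementTree as etree


def to_unicode(element):
    """Hypothetical helper: serialize to a unicode string, no XML declaration."""
    if hasattr(etree, "tounicode"):
        return etree.tounicode(element)
    # Stand-in for tounicode() when only the standard library is available.
    return etree.tostring(element, encoding="unicode")


def to_declared_bytes(text, encoding):
    """Hypothetical helper: encode a unicode XML string, adding the declaration by hand."""
    declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
    return (declaration + text).encode(encoding)


root = etree.Element("root")
root.text = "caf\u00e9"  # non-ASCII content makes the encoding matter

# Unicode serialization: no declaration, so it can go straight back into a parser.
as_unicode = to_unicode(root)
reparsed = etree.fromstring(as_unicode)

# Byte serialization: tostring() adds a declaration that names the encoding.
as_bytes = etree.tostring(root, encoding="iso-8859-1")

# Writing the unicode string to a byte stream later means adding a declaration yourself.
manual_bytes = to_declared_bytes(as_unicode, "iso-8859-1")
```

The point of the last step is that the unicode string alone carries no encoding information, so whoever encodes it to bytes is responsible for making the declaration match the encoding that was actually used.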