Question about unicode strings - lxml - The Python XML Toolkit

Feb. 20, 2012

Hi all

I happen to be following the mailing lists of both lxml and rpclib.

The guys at rpclib want to make a change to their code base to fix what they
see as a 'quirk' of lxml. I am not qualified to comment, but I thought I
would post the issue here in case anyone can suggest a cleaner solution.

With Python 3 and lxml 2.3.3, if you pass a unicode string to
etree.fromstring(...) and then retrieve a text node from the tree, you get
a unicode string back. If you pass in a byte string, you get a byte string
back.

With Python 2 and lxml 2.2.2 (I don't have 2.3.3 installed), if you pass a
unicode string that contains a non-ASCII character, you get a unicode string
back. If you pass a unicode string that contains only ASCII characters, you
get a normal (byte) string back.
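A quick way to check which type comes back in each case is to parse the same document once from a str and once from UTF-8 bytes, then inspect `.text`. This is a minimal sketch for Python 3 only. It uses `lxml.etree` if it is installed and otherwise falls back to the standard library's `xml.etree.ElementTree`, which has a similar `fromstring()`; that fallback is an assumption for illustration and is not lxml itself. The helper name `text_types` is hypothetical.

```python
try:
    from lxml import etree  # lxml, if installed
except ImportError:
    # Hypothetical fallback for illustration: the stdlib parser, not lxml.
    import xml.etree.ElementTree as etree


def text_types(xml_text):
    """Parse the same document from str and from UTF-8 bytes and return
    the Python type of the root element's .text in each case."""
    from_str = etree.fromstring(xml_text)
    from_bytes = etree.fromstring(xml_text.encode("utf-8"))
    return type(from_str.text), type(from_bytes.text)


# One ASCII-only sample and one containing a non-ASCII character.
for sample in ("<a>plain ascii</a>", "<a>caf\u00e9</a>"):
    print(repr(sample), text_types(sample))
```

Running the same comparison under each Python/lxml combination makes the differences described above easy to reproduce side by side.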

This behaviour is causing a problem for a user of rpclib, so the proposal is
that rpclib should always convert the string to unicode before returning it.
I don't know how they know that they passed in a unicode string in the first
place, but I assume they have a way of checking.
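The proposed normalisation could look roughly like the helper below. This is a sketch under assumptions, not rpclib's actual change: the name `to_text` and the UTF-8 default are hypothetical. It passes `None` through (an element with no text has `.text` of `None`), decodes byte strings, and returns unicode strings unchanged. Because `bytes` is an alias of `str` in Python 2.6+, the same `isinstance` check also catches the ASCII byte strings described above for Python 2.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string (hypothetical helper).

    - None is returned unchanged (elements without text have .text None).
    - Byte strings are decoded with the given encoding; UTF-8 is a superset
      of ASCII, so ASCII-only byte strings decode cleanly.
    - Anything already unicode is returned as-is.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b"plain ascii"))
print(to_text("caf\u00e9"))
```

One consequence of this design is that the caller does not need to know what kind of string was originally passed to the parser: whatever comes back is normalised to unicode.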

The maintainer of rpclib says "If you disagree, speak now or forever hold
your silence :))"

So I thought I would mention it here and see if it sounds ok.

Thanks

Frank Millman

Thread participants (3): Frank Millman, Stefan Behnel, Simon Sapin