
Hello.

Is there any reason why attributes of an Element are returned as byte strings if they contain only ASCII? In my application I need them to always be unicode. Is there a way to force lxml to return element attributes as unicode?

What is the preferred way of getting attributes as unicode?

1.
    value = elem.attrib['foo']
    if isinstance(value, str):
        encoding = elem.getroottree().docinfo.encoding
        value = value.decode(encoding)

2.
    value = elem.attrib['foo']
    if isinstance(value, str):
        value = value.decode('ascii')

--
Турнаев Евгений Викторович

Evgeny Turnaev, 28.11.2011 12:01:
> Is there any reason why attributes of an Element are returned as byte strings if they contain only ASCII?

Yes. Partly for ElementTree compatibility and partly because it's faster and more memory friendly under Python 2.x. Also note that it's not just attribute names and values. All string values work this way in lxml.

> In my application I need them to always be unicode.

In Python 3, lxml will always give you Unicode strings. In Python 2, ASCII-encoded byte strings are compatible with the equivalent Unicode strings (as long as the platform default encoding is ASCII-compatible, which is "normally" the case), so you will rarely notice the difference in your code.

> Is there a way to force lxml to return element attributes as unicode?

No.

> What is the preferred way of getting attributes as unicode?

If you really need a unicode string in Py2, you can do "unicode(value)" or "u'' + value".

Stefan
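
For code that has to run under both Python 2 and Python 3, the conversion can be wrapped in a small helper. The following is only a sketch: to_text is a hypothetical name, not part of lxml, and it uses an explicit ASCII decode instead of "unicode(value)" because the unicode builtin does not exist in Python 3. The standard library's xml.etree.ElementTree is used as a stand-in here so the example runs without lxml installed; the assumption is that elem.attrib lookups behave the same way with lxml.etree.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

# The text type of the running interpreter:
# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def to_text(value, encoding="ascii"):
    """Return value as a text string (hypothetical helper, not an lxml API).

    Byte strings are decoded with the given encoding; text strings are
    returned unchanged. ASCII is the default because, per the reply above,
    only pure-ASCII values come back as byte strings under Python 2.
    """
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return value


elem = ET.fromstring(u'<root foo="bar"/>')
value = to_text(elem.attrib["foo"])
```

With this in place, value is a text string on either Python version, whether the library handed back a byte string or a text string.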

Participants (2): Evgeny Turnaev, Stefan Behnel