Treating a unicode string as latin-1
Jeroen Ruigrok van der Werven
asmodai at in-nomine.org
Thu Jan 3 08:55:24 EST 2008
On [20080103 14:36], Simon Willison (simon at simonwillison.net) wrote:
>How can I tell Python "I know this says it's a unicode string, but I
>need you to treat it like a bytestring"?
Although this does not address the exact question, it does raise the issue of
how you are using ElementTree. When I use the following:
test.xml
<entry>
<name>Bob\x92s Breakfast</name>
</entry>
parse.py
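from xml.etree.ElementTree import ElementTree
# parse() accepts either a file name or an open file object.
tree = ElementTree()
tree.parse('test.xml')
# find() looks for a <name> child of the root element (<entry>).
elem = tree.find('name')
# Show the type of the element's text content.
print(type(elem.text))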
I get a string type back and not a unicode string.
However, if you are mixing encodings within the same file, e.g. cp1252 bytes
in a UTF-8 encoded file, then you are creating a ton of problems.
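
To illustrate the kind of problem I mean, here is a minimal sketch using only
the built-in codecs. It is written for Python 3, and the names raw,
decodes_as_utf8 and repaired are made up for this example; the sample bytes
assume that the \x92 in test.xml stands for a single cp1252 byte. It shows
that such a byte is not valid UTF-8, and that text which was wrongly decoded
as latin-1 can be turned back into bytes and decoded as cp1252, which is
probably close to what the original question was after.

```python
# Hypothetical sample data: "Bob's Breakfast" with a cp1252 right single
# quotation mark stored as the single byte 0x92.
raw = b"Bob\x92s Breakfast"

def decodes_as_utf8(data):
    """Return True if data is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A lone 0x92 is a continuation byte without a lead byte: invalid UTF-8.
utf8_ok = decodes_as_utf8(raw)

# Decoded as cp1252, byte 0x92 becomes U+2019 (right single quotation mark).
as_cp1252 = raw.decode("cp1252")

# Decoded as latin-1, byte 0x92 becomes the C1 control character U+0092.
as_latin1 = raw.decode("latin-1")

# latin-1 maps bytes 0-255 one-to-one onto code points 0-255, so encoding
# with it recovers the original bytes, which can then be decoded as cp1252.
repaired = as_latin1.encode("latin-1").decode("cp1252")

print(utf8_ok, repr(as_cp1252), repr(as_latin1), repr(repaired))
```

The round trip through latin-1 only works if every character in the string
is below U+0100; anything else raises UnicodeEncodeError, which is a useful
sign that the text was not mis-decoded in the way assumed here.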
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
When moved to complain about others, remember that karma is endless and it
is loving that leads to love...