Treating a unicode string as latin-1

Thu Jan 3 08:55:24 EST 2008

-On [20080103 14:36], Simon Willison (simon at simonwillison.net) wrote:
>How can I tell Python "I know this says it's a unicode string, but I
>need you to treat it like a bytestring"?

Although it does not address the exact question it does raise the issue how
you are using ElementTree. When I use the following:

test.xml

<entry>
  <name>Bob\x92s Breakfast</name>
</entry>

parse.py

from xml.etree.ElementTree import ElementTree

xmlfile = open('test.xml')

tree = ElementTree()
tree.parse(xmlfile)
elem = tree.find('name')

print type(elem.text)

I get a string type back and not a unicode string.

However, if you are mixing encodings within the same file, e.g. cp1252 in an
UTF8 encoded file, then you are creating a ton of problems.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
When moved to complain about others, remember that karma is endless and it
is loving that leads to love...