[Tutor] UnicodeEncodeError
spir
denis.spir at free.fr
Wed Nov 25 15:12:55 CET 2009
Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> # CODE:
> for element in doc.getiterator():
>     try:
>         m = re.match(search_text, str(element.text))
>     except UnicodeEncodeError:
>         raise # I want to get rid of this exception.
First, you should split the two operations that are currently done in a single statement, so you can see which one is the source of the error:
    for element in doc.getiterator():
        try:
            source = str(element.text)
        except UnicodeEncodeError:
            raise # I want to get rid of this exception.
        else:
            m = re.match(search_text, source)
I guess
    source = unicode(element.text, "utf8")
should do the job, provided you actually know the elements are utf8-encoded (otherwise try latin1 or, better, get proper information on the origin of your doc files).
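A minimal sketch of the "try utf8, else latin1" idea, written for Python 3 (an assumption: the thread is about Python 2, where unicode and byte str play the roles that str and bytes play here). The helper name to_text and its encodings parameter are hypothetical. Text that is already decoded is passed through untouched, and that case is worth checking first: a UnicodeEncodeError from str() suggests element.text may already be a unicode object, in which case it can probably be handed to re.match as it is.

```python
import re

def to_text(value, encodings=("utf-8", "latin-1")):
    """Return value as text, trying each encoding in turn (hypothetical helper)."""
    if value is None:           # elements without text have .text set to None
        return ""
    if isinstance(value, str):  # already decoded: nothing to do
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reachable when latin-1 is not among the candidates,
    # since latin-1 can decode any byte sequence.
    raise ValueError("none of %r could decode the value" % (encodings,))

search_text = "caf"
samples = ["caf\u00e9".encode("utf-8"), "caf\u00e9".encode("latin-1"), "caf\u00e9", None]
for raw in samples:
    source = to_text(raw)
    m = re.match(search_text, source)
    print(repr(source), bool(m))
```

Because latin1 accepts any bytes, it only makes sense as the last fallback: a wrong guess there gives garbled characters rather than an exception.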
PS: I just discovered Python's built-in file attribute file.encoding, which should give you the proper encoding to pass to unicode(..., encoding).
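For illustration, a small sketch of that attribute in Python 3 (an assumption: the thread is about Python 2). As far as I can tell, it reports the encoding the file object was opened with and does not inspect the bytes in the file, so it is only as reliable as what was passed to open(). The file name and contents below are made up.

```python
import os
import tempfile

# Write a small latin-1 encoded file (made-up sample data).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("caf\u00e9".encode("latin-1"))

# The attribute echoes the encoding that was given to open().
with open(path, encoding="latin-1") as f:
    declared = f.encoding
    text = f.read()

print(declared, repr(text))
```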
PPS: Shouldn't you in fact decode the whole source before parsing it? (That is, parse a unicode object rather than encoded text.)
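A rough sketch of what decoding first could look like, using Python 3 and the standard library's xml.etree.ElementTree (both are assumptions: the thread does not say which parser produced doc, and the sample document and pattern are invented). The raw bytes are decoded once, the resulting text is parsed, and every element.text is then already text that re.match accepts directly.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: utf8-encoded bytes, as they might be read from a file.
raw = "<doc><item>caf\u00e9 au lait</item><item>tea</item></doc>".encode("utf-8")

text = raw.decode("utf-8")    # decode the whole source once
doc = ET.fromstring(text)     # parse text, not encoded bytes

search_text = "caf\u00e9"
matches = []
for element in doc.iter():    # iter() is the newer spelling of getiterator()
    if element.text is None:  # elements may have no text at all
        continue
    m = re.match(search_text, element.text)
    if m:
        matches.append(element.tag)

print(matches)
```

With this arrangement there is no str() call left in the loop, so the conversion step that raised the exception disappears.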
Denis
________________________________
la vita e estrany
http://spir.wikidot.com/