[Tutor] Unicode Encode Error
tim.golden at viacom-outdoor.co.uk
Thu Apr 27 15:19:06 CEST 2006
| I'm getting the following error when I try and write some HTML with
| German text in it.
| UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
| position 1367: ordinal not in range(128)
| I've now read the 'Unicode - How To' by AMKuchling and
| changed the code to:
| html_text = codecs.open(inFile, encoding='utf-8').read()
| # ... do some processing on html_text
| f = codecs.open(outFile, encoding='utf-8', mode='w')
| but I'm still getting the error.
| Does anybody know what I'm doing wrong?
At what point in your script does the error occur? You say
"when I try and write", but does the error occur at the
f.write... line, or earlier on, at the codecs.open (..).read call?
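To help narrow down where it happens: the message you quote is what Python raises whenever unicode text containing a non-ASCII character gets encoded with the 'ascii' codec. Here is a minimal sketch that triggers the same kind of error on purpose. The helper name try_ascii and the sample string are my own, not taken from your script, and the "except ... as" syntax needs Python 2.6 or later.

```python
# Sample text containing one non-ASCII character (o-umlaut, u'\xf6').
text = u"Tim G\xf6lden"


def try_ascii(value):
    """Return the error message raised when encoding value as ASCII, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Hypothetical helper: shows the message without stopping the script.
message = try_ascii(text)
print(message)
```

If the traceback in your script points at a line other than the f.write call, that line is most likely where an ASCII encode like this is being attempted.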
There doesn't seem to be anything wrong with your code fragment,
except that you are assuming -- and maybe with good reason --
that the text in inFile is in fact utf8-encoded.
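One way to test that assumption is to read the raw bytes and try decoding them yourself. This is only a sketch: is_valid_utf8 is a name I made up, and the two temporary files stand in for your real inFile. Note that a wrongly encoded input would normally show up as a UnicodeDecodeError on reading, not the UnicodeEncodeError you report.

```python
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Two throwaway files (hypothetical test data): one UTF-8, one Latin-1.
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "in.utf8")
latin1_path = os.path.join(tmp_dir, "in.latin1")
with open(utf8_path, "wb") as handle:
    handle.write(u"Tim G\xf6lden".encode("utf-8"))
with open(latin1_path, "wb") as handle:
    handle.write(u"Tim G\xf6lden".encode("latin-1"))

utf8_ok = is_valid_utf8(utf8_path)
latin1_ok = is_valid_utf8(latin1_path)
print(utf8_ok, latin1_ok)
```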
Could you run your code fragment in an interpreter and
then paste the screen output into an email?
This, for example, works on my Win32 Python console:
Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> inFile = "in.utf8"
>>> outFile = "out.utf8"
>>> # Create dummy infile
... f = codecs.open (inFile, encoding="utf-8", mode="w")
>>> f.write (u"Tim G\xf6lden")
>>> f.close ()
>>> html_text = codecs.open(inFile, encoding='utf-8').read()
>>> # ... do some processing on html_text
... f = codecs.open(outFile, encoding='utf-8', mode='w')
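The session as quoted here stops at the second codecs.open call. For comparison, here is the same round trip as a standalone script, with the write step included. Treat it as a sketch: the temporary directory, the variable names and the final byte check are my additions, and the "processing" step is left as a placeholder.

```python
import codecs
import os
import tempfile

# Throwaway locations standing in for inFile and outFile.
tmp_dir = tempfile.mkdtemp()
in_file = os.path.join(tmp_dir, "in.utf8")
out_file = os.path.join(tmp_dir, "out.utf8")

# Create dummy infile containing one non-ASCII character.
f = codecs.open(in_file, encoding="utf-8", mode="w")
f.write(u"Tim G\xf6lden")
f.close()

# Read it back; codecs.open decodes the bytes to unicode text.
f = codecs.open(in_file, encoding="utf-8")
html_text = f.read()
f.close()

# ... do some processing on html_text (placeholder) ...

# Write the unicode text out; codecs.open encodes it as UTF-8.
f = codecs.open(out_file, encoding="utf-8", mode="w")
f.write(html_text)
f.close()

# Inspect the raw bytes to confirm the o-umlaut was encoded as UTF-8.
with open(out_file, "rb") as handle:
    raw = handle.read()
print(repr(raw))
```

If this runs cleanly for you but your own script does not, the difference is probably in the processing step or in whatever you pass to f.write.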