Python newbie needs Unicode help
Diez B. Roggisch
deets at nospam.web.de
Thu Jan 11 18:20:24 EST 2007
gheissenberger at gmail.com wrote:
> HELP!
> The guy who was here before me wrote a script to parse files in Python.
>
> Includes line:
> print u
> where u is a line from a file we are parsing.
> However, we have started receiving data from Brazil. If I open the file to
> parse in vi, it looks like:
>
> <Utt id="3" transcribe="yes" audioRoot="A1"
> audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
> recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
> transcribedText="não" parsableText="não"/>
>
> Clearly those "ã" characters are non-ASCII, but how do I get
> print to understand that?
>
> I keep getting:
> "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 40: ordinal not in range(128)"
>
Does the error happen at the
print u
line? If so, what happens is that you are trying to print a unicode object.
This means it has to be converted (the correct term is encoded) to a
byte string. If you don't do that explicitly, it will be done implicitly,
using the default encoding, which is ASCII.
If you have non-ASCII characters, you end up with the error you see.
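To make the implicit step visible, here is a small sketch that does by hand what print does behind your back: encode the text as ASCII. It is written in Python 3 syntax (in the Python 2 of this thread the literal would be u'n\xe3o'), and the helper name encode_or_report is just made up for illustration. The word "não" from the data above fails at position 1, the "ã", with the same "ordinal not in range(128)" reason as in your traceback.

```python
def encode_or_report(text, encoding):
    """Hypothetical helper: encode text, returning (bytes, None) on success
    or (None, message) when the codec cannot represent some character."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character the codec rejected
        return None, f"{exc.reason} at position {exc.start}: {text[exc.start]!r}"


word = "n\u00e3o"  # "não", as in the Brazilian data

print(encode_or_report("nao", "ascii"))  # plain ASCII encodes fine
print(encode_or_report(word, "ascii"))   # the "ã" makes the ASCII codec fail
```

The position in your real error (40) differs only because the failing line is longer.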
What to do? Use something like this:
print u.encode('utf-8')
instead.
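A minimal sketch of what the explicit encode gives you, again in Python 3 syntax (there, print accepts text directly and the output stream does the encoding, so this mainly shows what the bytes look like). Encoding with UTF-8 turns the "ã" into the two bytes 0xC3 0xA3 instead of failing, and decoding with the same codec gets the original text back.

```python
word = "n\u00e3o"  # "não"

# Explicit encoding: no implicit ASCII step, so nothing can fail here
encoded = word.encode("utf-8")
print(encoded)

# Decoding with the same codec restores the original text
decoded = encoded.decode("utf-8")
print(decoded == word)
```

Whoever reads the output then has to treat it as UTF-8 as well, otherwise the "ã" shows up as two garbage characters.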
Diez