[Tutor] UnicodeEncodeError

Mark Tolonen metolone+gmane at gmail.com
Sun Jul 19 18:47:02 CEST 2009

"gpo" <goodpotatoes at yahoo.com> wrote in message 
news:24554280.post at talk.nabble.com...
> I'm doing a simple exercise in reading a file and printing each line.
> However, I'm getting this error. The file is a Windows txt file, encoded
> in ANSI (ASCII). I don't understand why Python is displaying a Unicode error.
> Here is my script:
> f=open('I:\\PythonScripts\\statement.txt')
> for line in f:
>    print (line)
> f.close()
> statement.txt is just the text of an article at
> http://www.tgdaily.com/content/view/43296/98/ pasted into Notepad.
> The error I get is:
> Traceback (most recent call last):
>  File "I:\PythonScripts\fileloop.py", line 9, in <module>
>    print (line)
>  File "C:\Python31\lib\encodings\cp437.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 10: character maps to <undefined>
> I've looked this up and see that others have had a similar error;
> however, I don't see anything saying what I should be encoding to/from,
> since my input and output files are both ASCII.

Notepad doesn't encode in ASCII. What it calls "ANSI" is the system's
default ANSI code page, which on US Windows is cp1252. I see you are using
Python 3.1. When not given an explicit encoding, open() in Python 3.1 uses
the default locale encoding (whatever locale.getpreferredencoding() returns),
which here is that same code page. So it successfully reads the file, which
happens to contain non-ASCII data.
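A minimal sketch of reading the file with the encoding stated explicitly instead of relying on the default. The cp1252 choice assumes a US-English Windows system, and read_lines is a hypothetical helper name, not a standard function. It also shows that cp1252 stores the EM DASH as the single byte 0x97, which is outside ASCII.

```python
import locale

# The encoding open() falls back to when none is given. On US-English Windows
# this is typically 'cp1252', the "ANSI" code page Notepad refers to.
default_encoding = locale.getpreferredencoding(False)

# cp1252 stores the EM DASH as the single byte 0x97, outside ASCII's 0-127 range.
em_dash_bytes = "\u2014".encode("cp1252")


def read_lines(path, encoding="cp1252"):
    """Yield the lines of a text file decoded with an explicit encoding.

    read_lines is a hypothetical helper; the cp1252 default assumes a file
    saved as "ANSI" by Notepad on US-English Windows.
    """
    with open(path, encoding=encoding) as f:
        for line in f:
            yield line


# Usage with the path from the original post:
# for line in read_lines('I:\\PythonScripts\\statement.txt'):
#     print(ascii(line))
```

Passing the encoding explicitly makes the script behave the same regardless of which locale it runs under.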

cp1252 supports the Unicode character \u2014 (EM DASH). However, when you
try to print the file to the console, the console's default code page is
cp437, which doesn't support this character. Python 3 has a built-in
function, ascii(), that returns a representation of a string using only
ASCII characters, escaping everything else. Try print(ascii(line)).
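A small sketch that reproduces the problem without a console, using a made-up sample line: encoding to cp1252 works, encoding to cp437 raises the same kind of UnicodeEncodeError as in the traceback, and ascii() gives an ASCII-only representation that any code page can print.

```python
# Hypothetical sample line containing an EM DASH (U+2014).
line = "pricing \u2014 and more\n"

# cp1252, the encoding the file was read with, can represent it...
cp1252_bytes = line.encode("cp1252")

# ...but cp437, the console's code page here, cannot.
error_message = None
try:
    line.encode("cp437")
except UnicodeEncodeError as exc:
    error_message = str(exc)

# ascii() escapes every non-ASCII character as \xNN, \uNNNN or \UNNNNNNNN.
escaped = ascii(line)
print(escaped)  # prints: 'pricing \u2014 and more\n'
```

Note that ascii() returns a repr, so the output includes the surrounding quotes and shows the newline as \n.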

Another trick is to switch the console to the Lucida Console font and change
its code page to 1252 with the command "chcp 1252". This font and code page
support the EM DASH, so print(line) will display the text properly.
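Combining the two suggestions above, a hypothetical helper (console_safe is an illustrative name, not a standard function) could return the text as-is when the console's code page can represent it and fall back to ascii() otherwise. It assumes sys.stdout.encoding reports the console's code page, as the cp437 encoder in the traceback suggests it does on the Python 3.1 console.

```python
import sys


def console_safe(text, encoding=None):
    """Return text if `encoding` can represent it, else ascii(text).

    console_safe is a hypothetical helper. `encoding` defaults to
    sys.stdout.encoding, assumed to reflect the console's code page
    (e.g. 'cp437', or 'cp1252' after "chcp 1252").
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return ascii(text)
    return text
```

For example, console_safe("a\u2014b", "cp437") returns the escaped form, while console_safe("a\u2014b", "cp1252") returns the string unchanged, so print(console_safe(line)) never raises on either code page.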

You can also use a shell that supports the full Unicode character set, such
as IDLE or PythonWin, instead of the console.

