[Tutor] Unicode Encode Error

Peter Otten __peter__ at web.de
Tue Jun 17 12:42:43 CEST 2014


Aaron Misquith wrote:

> I'm trying to obtain the questions present in StackOverflow for a
> particular tag.
> 
> Whenever I try to run the program i get this *error:*
> 
> Message File Name Line Position
> Traceback
>     <module> C:\Users\Aaron\Desktop\question.py 20
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in
> position 34: ordinal not in range(128)
> 
> 
> *This is the code:*
> import stackexchange
> import sys
> sys.path.append('.')
> so = stackexchange.Site(stackexchange.StackOverflow)
> term= raw_input("Enter the keyword")
> print 'Searching for %s...' % term,
> sys.stdout.flush()
> qs = so.search(intitle=term)
> 
> for q in qs:
>    print '%8d %s' % (q.id, q.title)
>    with open('D:\ques.txt', 'a+') as question:
>        question.write(q.title)
> 
> Can anyone explain me what is going wrong here? This program used to run
> perfectly fine before. Only since yesterday I have started getting this
> error.

The traceback and the code you posted don't fit together, as the latter has 
fewer than 20 lines. Therefore I have to guess:

q.title is probably a unicode string.

When you write unicode to a file opened with the built-in open() in 
Python 2, it is automatically converted to bytes using the ascii encoding:

>>> f = open("tmp.txt", "w")
>>> f.write(u"abc")

However, this fails when the unicode string contains non-ascii characters:

>>> f.write(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

So your code "worked" yesterday because the titles you retrieved then
did not contain any non-ascii characters.
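
To make the difference concrete, here is a small sketch of what the 
implicit conversion amounts to: roughly a call to .encode("ascii") on the 
string before it is written. The two titles and the helper 
can_encode_ascii() are made up for illustration; they are not taken from 
your program or from the stackexchange module.

```python
# -*- coding: utf-8 -*-

# Hypothetical titles standing in for q.title.
plain_title = u"How do I sort a list?"
fancy_title = u"What does \u201cyield\u201d do?"  # contains curly quotes


def can_encode_ascii(text):
    """Return True if text survives conversion to ascii bytes."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Only the title with the curly quotes fails the ascii conversion.
print(can_encode_ascii(plain_title))
print(can_encode_ascii(fancy_title))

# UTF-8 can represent every unicode character, so this always succeeds.
utf8_bytes = fancy_title.encode("utf-8")
```

The character u'\u201c' from your traceback is a left curly quote, which 
is the kind of character that tends to show up in titles pasted from a 
word processor.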

The fix is to open the file with an encoding that can cope with the extra 
characters. UTF-8 is a good choice here. To use it, modify your code as 
follows:

import codecs
...
with codecs.open(filename, "a", encoding="utf-8") as question:
    question.write(q.title)
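
If you want to convince yourself that nothing gets lost, you can read the 
file back with the same encoding. The following is only a sketch: the 
title is a made-up stand-in for q.title, and a temporary file is used in 
place of your real output file.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical title standing in for q.title.
title = u"What does \u201cyield\u201d do?"

# Temporary file used instead of the real output path.
fd, filename = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Writing: unicode is encoded to UTF-8 bytes on the way out.
with codecs.open(filename, "a", encoding="utf-8") as question:
    question.write(title)

# Reading: UTF-8 bytes are decoded back to unicode on the way in.
with codecs.open(filename, "r", encoding="utf-8") as question:
    read_back = question.read()

os.remove(filename)
print(read_back == title)
```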

PS: personally I'd open the file once outside the loop:

with codecs.open(...) as questions_file:
    for q in qs:
        print ...
        questions_file.write(q.title)
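
A fuller sketch of that layout, under a few assumptions: the list titles 
stands in for the q.title values of your search results, a temporary file 
replaces your real output file, and a newline is added after every title 
(your original code does not do that, so all titles end up on one line).

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical titles standing in for the q.title values.
titles = [
    u"Plain ASCII title",
    u"Title with \u201ccurly quotes\u201d",
]

# Temporary file used instead of the real output path.
fd, filename = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Open the file once, then write one title per line inside the loop.
with codecs.open(filename, "a", encoding="utf-8") as questions_file:
    for title in titles:
        questions_file.write(title + u"\n")

# Read the file back to check that every title is on its own line.
with codecs.open(filename, "r", encoding="utf-8") as questions_file:
    lines = questions_file.read().splitlines()

os.remove(filename)
print(lines == titles)
```

Opening the file once avoids reopening it for every result, and the with 
statement still closes it when the loop is done.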



