UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

Scott David Daniels Scott.Daniels at Acm.Org
Fri Jul 17 13:37:25 EDT 2009


akhil1988 wrote:
<mis-ordered reply, bits shown below>
> Nobody-38 wrote:
>> On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:
...
>>>> In Python 3 you can't decode strings because they are Unicode strings
>>>> and it doesn't make sense to decode a Unicode string. You can only
>>>> decode encoded things which are byte strings. So you are mixing up byte
>>>> strings and Unicode strings.
>>> ... I read a byte string from sys.stdin which needs to be converted to
>>> a unicode string for further processing.
>> In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
>> read and write Unicode strings, not byte strings.
>>
>>> I cannot just remove the decode statement and proceed?
>>> This is what it looks like:
>>>     for line in sys.stdin:
>>>         line = line.decode('utf-8').strip()
>>>         if line == '<page>': #do something here
>>>         ....
>>> If I remove the decode statement, line == '<page>' never becomes true.
>> Did you inadvertently remove the strip() as well?
> ... unintentionally I removed strip()....
> I get this error now:
>  File "./temp.py", line 488, in <module>
>     main()
>   File "./temp.py", line 475, in main
>     for line in sys.stdin:
>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid
> data

(1) Do not top post.
(2) Try to fully understand the problem and proposed solution, rather
     than trying to get people to tell you just enough to get your code
     going.
(3) The only way sys.stdin can possibly return Unicode is to do some
     decoding of its own.  Your job is to make sure it uses the correct
     decoding.  So, if you know your source is always UTF-8, try
     something like:

     import sys
     import io

     sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')

     for line in sys.stdin:
         line = line.strip()
         if line == '<page>':
             pass  # do something here
         # ....
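
To see the effect without touching the real sys.stdin, here is a
minimal sketch that wraps an in-memory byte stream the same way.  The
function name read_pages and the sample data are made up for
illustration; only io.TextIOWrapper and io.BytesIO come from the
standard library.

```python
import io


def read_pages(byte_stream, encoding="utf-8"):
    """Count '<page>' lines in a binary stream decoded with `encoding`."""
    # The wrapper does the bytes -> str decoding, just as sys.stdin does.
    text = io.TextIOWrapper(byte_stream, encoding=encoding)
    count = 0
    for line in text:
        # Each line is already a str here, so no .decode() is needed.
        if line.strip() == "<page>":
            count += 1
    return count


# Hypothetical input containing U+00B7 (middle dot), encoded as UTF-8.
raw = "<page>\nfoo \u00b7 bar\n<page>\n".encode("utf-8")

print(read_pages(io.BytesIO(raw)))  # decoded as UTF-8: works

try:
    read_pages(io.BytesIO(raw), encoding="ascii")
except UnicodeDecodeError as exc:
    # The wrong codec fails while iterating, before your own code runs.
    print("wrong encoding:", exc)
```

With the wrong codec the exception is raised inside the "for line in"
loop, which matches where the traceback quoted above points.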

--Scott David Daniels
Scott.Daniels at Acm.Org


