UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

Sat Jul 18 14:25:53 EDT 2009

Thanks David, it solved my problem immediately. 

I will follow your advice from now on, but honestly I am new to Python
and do not know much about text formats. The main part of my project
was not about dealing with these, so I just wanted to get this solved, as
I had already been stuck on it for 2 days. If you think my approach to
getting problems solved is wrong, please let me know. Your advice would
be helpful to me in the future.

--Thanks Again,
Akhil 

Scott David Daniels wrote:
> 
> akhil1988 wrote:
> <mis-ordered reply, bits shown below>>
>> Nobody-38 wrote:
>>> On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:
> ...
>>>>> In Python 3 you can't decode strings because they are Unicode strings
>>>>> and it doesn't make sense to decode a Unicode string. You can only
>>>>> decode encoded things which are byte strings. So you are mixing up
>>>>> byte strings and Unicode strings.
>>>> ... I read a byte string from sys.stdin which needs to be converted
>>>> to a unicode string for further processing.
>>> In 3.x, sys.stdin (stdout, stderr) are text streams, which means that
>>> they read and write Unicode strings, not byte strings.
>>>
>>>> Can't I just remove the decode statement and proceed?
>>>> This is what it looks like:
>>>>     for line in sys.stdin:
>>>>         line = line.decode('utf-8').strip()
>>>>         if line == '<page>': #do something here
>>>>         ....
>>>> If I remove the decode statement, line == '<page>' never becomes true.
>>> Did you inadvertently remove the strip() as well?
>> ... unintentionally I removed strip()....
>> I get this error now:
>>   File "./temp.py", line 488, in <module>
>>     main()
>>   File "./temp.py", line 475, in main
>>     for line in sys.stdin:
>>   File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
>>     (result, consumed) = self._buffer_decode(data, self.errors, final)
>> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
> 
> (1) Do not top post.
> (2) Try to fully understand the problem and proposed solution, rather
>      than trying to get people to tell you just enough to get your code
>      going.
> (3) The only way sys.stdin can possibly return unicode is to do some
>      decoding of its own.  Your job is to make sure it uses the correct
>      decoding.  So, if you know your source is always utf-8, try
>      something like:
> 
>      import sys
>      import io
> 
>      sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')
> 
>      for line in sys.stdin:
>          line = line.strip()
>          if line == '<page>':
>              pass  # do something here
>          ....
> 
> --Scott David Daniels
> Scott.Daniels at Acm.Org
> 
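
For anyone reading this in the archive, here is a minimal sketch of the
suggestion quoted above. It assumes Python 3 and UTF-8 input. The helper
name decoded_lines and the sample data are made up for illustration, and an
io.BytesIO object stands in for sys.stdin.detach() so the example runs
without piped input. It also shows the str/bytes split discussed in the
thread and the effect of the errors= argument on invalid bytes.

```python
import io


def decoded_lines(byte_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream in a text layer and return its stripped lines.

    Hypothetical helper for illustration; it mirrors the
    io.TextIOWrapper(sys.stdin.detach(), encoding='utf8') call quoted above.
    """
    text_stream = io.TextIOWrapper(byte_stream, encoding=encoding, errors=errors)
    return [line.strip() for line in text_stream]


# Stand-in for sys.stdin.detach(): UTF-8 bytes containing U+00B7 (middle dot).
raw = "<page>\nmiddle dot: \u00b7\n".encode("utf-8")
lines = decoded_lines(io.BytesIO(raw))
found_page = lines[0] == "<page>"  # comparison against a str literal works

# In Python 3, str has encode() but no decode(); bytes has decode().
str_has_decode = hasattr("text", "decode")
encoded = "\u00b7".encode("utf-8")

# Encoding the same character as ASCII raises the error from the subject line.
try:
    "\u00b7".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Bytes that are not valid UTF-8 raise UnicodeDecodeError under
# errors="strict"; errors="replace" substitutes U+FFFD instead of raising.
bad = b"\xff\xfe<page>\n"
try:
    decoded_lines(io.BytesIO(bad))
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
replaced = decoded_lines(io.BytesIO(bad), errors="replace")
```

In a real script the wrapper would be built once around sys.stdin.detach()
as in the quoted reply. Whether errors="replace" is acceptable depends on
the data: it hides bad input rather than fixing it, so "strict" is usually
the safer default while debugging.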
