UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in position 0: ordinal not in range(128)
Dave Angel
davea at davea.name
Wed Jan 14 05:24:52 EST 2015
On 01/13/2015 10:26 PM, Peng Yu wrote:
> Hi,
>
First, you should always specify your Python version and OS version when
asking questions here. Even if you've posted here before, many of
us cannot keep track of everyone's specifics, and need to refer to a
standard place, the head of the current thread.
I'll assume you're using Python 2.7, on Linux or equivalent.
> I am trying to understand what does encode() do. What are the hex
> representations of "u" in main.py? Why there is UnicodeEncodeError
> when main.py is piped to xxd? Why there is no such error when it is
> not piped? Thanks.
>
> ~$ cat main.py
> #!/usr/bin/env python
>
> u = unichr(40960) + u'abcd' + unichr(1972)
> print u
The unicode characters in 'u' must be encoded to a byte stream before
being sent to the standard out device. How they're encoded depends on the
device, and what Python knows (or thinks it knows) about it.
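To make the hex question concrete, here is a minimal sketch of what encode() produces for that string. It is my own illustration, not part of the original scripts; the string is written with \u escapes (U+A000 is unichr(40960), U+07B4 is unichr(1972)) so that the same code should run on Python 2.7 and on 3.3 or later. The names utf8_bytes, hex_dump and ascii_failed are just for this example.

```python
# -*- coding: utf-8 -*-
import binascii

# Same text as in main.py, spelled with escapes instead of unichr().
u = u'\ua000' + u'abcd' + u'\u07b4'

# encode() turns the unicode string into bytes using the named codec.
utf8_bytes = u.encode('utf-8')
hex_dump = binascii.hexlify(utf8_bytes).decode('ascii')
print(hex_dump)  # ea808061626364deb4

# The ASCII codec has no byte for U+A000, so this is the same failure
# that print hits when it falls back to ASCII.
try:
    u.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)
```

Those are the same bytes xxd shows for main_encode.py, minus the trailing 0a newline that print adds.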
> ~$ cat main_encode.py
> #!/usr/bin/env python
>
> u = unichr(40960) + u'abcd' + unichr(1972)
> print u.encode('utf-8')
Here, the string has already been encoded, so print is sending bytes to a
byte-device, and doesn't try to second-guess anything.
> $ ./main.py
> ꀀabcd
> ~$ cat main.sh
> #!/usr/bin/env bash
>
> set -v
> ./main.py | xxd
> ./main_encode.py | xxd
>
> ~$ ./main.sh
> ./main.py | xxd
> Traceback (most recent call last):
> File "./main.py", line 4, in <module>
> print u
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in
> position 0: ordinal not in range(128)
> ./main_encode.py | xxd
> 0000000: ea80 8061 6263 64de b40a ...abcd...
>
I'm guessing (since I already guessed you're running on Linux) that when
you run ./main.py directly, you're printing to a terminal window that
Python already knows is utf-8.
But in the pipe case, it cannot tell what's on the other side. So it
guesses ASCII, and runs into the conversion problem.
(Everything's different in Python 3.x, though in general the problem
still exists. If the interpreter cannot tell what encoding is needed,
it has to guess.)
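If you want to see what the interpreter has decided, something like the following sketch may help. The helper describe_stream is a made-up name for this example. On Python 2.7 I'd expect the encoding to be reported as None when stdout is a pipe, and as the terminal's encoding otherwise; Python 3 normally reports a locale-derived encoding in both cases.

```python
import io
import sys

def describe_stream(stream):
    """Return (is_tty, encoding) for a file-like object."""
    is_tty = bool(getattr(stream, 'isatty', lambda: False)())
    encoding = getattr(stream, 'encoding', None)
    return is_tty, encoding

# Report on stderr, so the report still reaches the terminal when
# stdout is redirected into a pipe.
sys.stderr.write('stdout: isatty=%r encoding=%r\n' % describe_stream(sys.stdout))

# An in-memory byte stream is not a terminal and has no encoding at all.
in_memory = describe_stream(io.BytesIO())
```

Running it once directly and once piped to xxd should show the difference between the two cases.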
There are ways to tell Python 2.7 what encoding a given file object
should have, so you could tell Python to use utf-8 for sys.stdout. I
don't know if that's the best answer, but here's what my notes say:
import sys, codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
Once you've done that, print output will go through the specified codec
on the way to the redirected pipe.
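Here is a small sketch of what that wrapper does, using an in-memory byte stream in place of sys.stdout so the result can be inspected. This is only an illustration; raw, writer and data are names chosen for the example. Note the assumption about versions: on Python 3, sys.stdout is a text stream, so wrapping it directly this way would not work as it does on 2.7, and the underlying byte stream, sys.stdout.buffer, would be the thing to wrap.

```python
import codecs
import io

# Stand-in for a byte-oriented stdout (a pipe, for example).
raw = io.BytesIO()

# getwriter() returns a StreamWriter class for the codec; calling it
# with a stream gives an object that encodes on every write().
writer = codecs.getwriter('utf8')(raw)
writer.write(u'\ua000abcd\u07b4\n')

# The wrapped stream received UTF-8 bytes, not unicode characters.
data = raw.getvalue()
```

The bytes in data are the same ones shown in the xxd dump of main_encode.py, including the final newline.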
--
DaveA