[Python-Dev] Unicode decode exception

Chris Angelico rosuav at gmail.com
Sun Nov 30 21:19:11 CET 2014


On Sun, Nov 30, 2014 at 7:07 PM, balaji marisetti
<balajimarisetti at gmail.com> wrote:
> Hi,

Hi. This list is for the development *of* Python, not development
*with* Python, so I'm sending this reply also to
python-list at python.org where it can be better handled. You'll probably
want to subscribe here:

https://mail.python.org/mailman/listinfo/python-list

or alternatively, point a news reader at comp.lang.python. Let's
continue this conversation on python-list rather than python-dev.

> When I try to iterate through the lines of a
> file("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c"), I get a
> UnicodeDecodeError (in python 3.4.0 on Ubuntu 14.04). But there is no
> such error with python 2.7.6. What could be the problem?

The difference between the two Python versions is that 2.7 lets you be
a bit sloppy about Unicode vs bytes, but 3.4 requires that you keep
them properly separate.
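
To make that difference concrete, here is a minimal sketch. The file
name and its contents are made up for illustration and are not taken
from the OpenSSL source. In Python 3, binary mode hands the bytes back
untouched, while text mode has to decode them and fails if the bytes
don't fit the encoding:

```python
import os
import tempfile

# Hypothetical sample data: one ASCII line, then a line containing the
# byte 0xE9 ("e" with an acute accent in Latin-1), which is not valid ASCII.
sample = b"int x = 0;\n/* caf\xe9 */\n"

path = os.path.join(tempfile.mkdtemp(), "sample.c")
with open(path, "wb") as f:
    f.write(sample)

# Binary mode: no decoding happens, so iterating never raises.
with open(path, "rb") as f:
    raw_lines = list(f)

# Text mode: the data is decoded as it is read, so the bad byte raises.
try:
    with open(path, encoding="ascii") as f:
        text_lines = list(f)
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc)
```

Python 2.7's open() behaves like the binary-mode case here, which is
why the same loop ran there without complaint.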

> In [39]: with open("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
>                  for line in f:
>                      print (line)
>
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-39-24a3ae32a691> in <module>()
>       1 with open("../openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
> ----> 2     for line in f:
>       3         print (line)
>       4
>
> /usr/lib/python3.4/codecs.py in decode(self, input, final)
>     311         # decode input (taking the buffer into account)
>     312         data = self.buffer + input
> --> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
>     314         # keep undecoded input until the next call
>     315         self.buffer = data[consumed:]
>
>
> --
> :-)balaji

Most likely, the line of input that you just reached contains bytes
that aren't valid in the default encoding. In Python 3, open() in text
mode defaults to whatever locale.getpreferredencoding(False) returns,
which may be ASCII or UTF-8 depending on your locale. (Though without
the actual exception message, I can't be sure of that.) The best fix
would be to know what the file's encoding is, and simply add that as a
parameter to your open() call - perhaps this:

with open("filename", encoding="utf-8") as f:

If you use the right encoding, and the file is correctly encoded, you
should have no errors. If you still have errors... welcome to data
problems, life can be hard. :|
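
If it does come to that, one way to track the problem down is to read
the file in binary mode and try decoding each line yourself. This is
only a sketch: find_undecodable_lines is a hypothetical helper name,
the sample file is made up, and splitting on raw newlines assumes an
ASCII-compatible encoding such as UTF-8 or Latin-1.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes) for each line that fails to decode."""
    bad = []
    with open(path, "rb") as f:  # binary mode: raw bytes, no decoding
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad.append((lineno, raw))
    return bad


# Hypothetical sample data: the second line holds a lone Latin-1 byte (0xE9).
sample = b"int x = 0;\n/* caf\xe9 */\n"
path = os.path.join(tempfile.mkdtemp(), "sample.c")
with open(path, "wb") as f:
    f.write(sample)

bad_lines = find_undecodable_lines(path)
for lineno, raw in bad_lines:
    print(lineno, raw)

# Last resort when the real encoding can't be determined: keep going and
# let undecodable bytes become U+FFFD replacement characters.
with open(path, encoding="utf-8", errors="replace") as f:
    lossy_text = f.read()
```

Seeing the raw bytes of the offending lines is often enough to guess
the real encoding. errors="replace" loses information, so treat it as
a fallback rather than a fix.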

ChrisA


