
Hi,

When I try to iterate through the lines of a file("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c"), I get a UnicodeDecodeError (in python 3.4.0 on Ubuntu 14.04). But there is no such error with python 2.7.6. What could be the problem?

    In [39]: with open("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
                 for line in f:
                     print (line)

    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-39-24a3ae32a691> in <module>()
          1 with open("../openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
    ----> 2     for line in f:
          3         print (line)
          4

    /usr/lib/python3.4/codecs.py in decode(self, input, final)
        311         # decode input (taking the buffer into account)
        312         data = self.buffer + input
    --> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
        314         # keep undecoded input until the next call
        315         self.buffer = data[consumed:]

--
:-)balaji

On Sun, Nov 30, 2014 at 7:07 PM, balaji marisetti <balajimarisetti@gmail.com> wrote:
> Hi,

Hi. This list is for the development *of* Python, not development *with* Python, so I'm sending this reply also to python-list@python.org where it can be better handled. You'll probably want to subscribe here:

https://mail.python.org/mailman/listinfo/python-list

or alternatively, point a news reader at comp.lang.python. Let's continue this conversation on python-list rather than python-dev.
The difference between the two Python versions is that 2.7 lets you be a bit sloppy about Unicode vs bytes, but 3.4 requires that you keep them properly separate.
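To make the bytes-versus-text split concrete, here is a minimal sketch. The sample bytes are made up for illustration (a C comment holding a Latin-1 encoded "é"); they are not taken from the OpenSSL file, whose actual contents and encoding are not given in this thread. The helper name try_decode is likewise hypothetical.

```python
# Hypothetical sample: a C comment containing a Latin-1 "é" (byte 0xE9).
raw = b"/* caf\xe9 */\n"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


ascii_text = try_decode(raw, "ascii")      # 0xE9 is outside the ASCII range
utf8_text = try_decode(raw, "utf-8")       # 0xE9 needs continuation bytes that are missing
latin1_text = try_decode(raw, "latin-1")   # every byte maps to a code point

print(ascii_text, utf8_text, repr(latin1_text))
```

In Python 3 the text-mode file object performs this decode step on every read, which is why the traceback ends inside codecs.py.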
Most likely, the line of input that you just reached has a non-ASCII character, and the default encoding is ASCII. (Though without the actual exception message, I can't be sure of that.) The best fix would be to know what the file's encoding is, and simply add that as a parameter to your open() call - perhaps this:

    with open("filename", encoding="utf-8") as f:

If you use the right encoding, and the file is correctly encoded, you should have no errors. If you still have errors... welcome to data problems, life can be hard. :|

ChrisA
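A rough sketch of that advice follows, under some loudly stated assumptions: the temporary file is a made-up stand-in for the real source file, Latin-1 is only a guess at an encoding (the thread never establishes the real one), and read_lines is a hypothetical helper. It also prints the encoding that open() falls back to in text mode when none is passed, which comes from the locale and is not necessarily ASCII.

```python
import locale
import os
import tempfile

# Encoding open() uses in text mode when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical stand-in for the source file: one line with a Latin-1 byte.
fd, path = tempfile.mkstemp(suffix=".c")
with os.fdopen(fd, "wb") as f:
    f.write(b"int x; /* caf\xe9 */\n")


def read_lines(path, encoding, errors="strict"):
    """Read all lines of a text file using an explicit encoding."""
    with open(path, encoding=encoding, errors=errors) as f:
        return [line.rstrip("\n") for line in f]


# A wrong guess at the encoding fails loudly, as in the original report.
try:
    read_lines(path, "utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc)

# The assumed-correct encoding decodes cleanly.
latin1_lines = read_lines(path, "latin-1")

# If the encoding is unknown, errors="replace" substitutes U+FFFD for bad bytes.
replaced_lines = read_lines(path, "utf-8", errors="replace")

os.remove(path)
```

Note that latin-1 accepts any byte sequence, so a clean decode with it does not prove the guess was right; it only shows that nothing raised.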

participants (4)
- balaji marisetti
- Bruno Cauet
- Chris Angelico
- Terry Reedy