Readlines returns non ASCII character
ian.g.kelly at gmail.com
Wed Sep 23 23:49:57 CEST 2015
On Wed, Sep 23, 2015 at 3:02 PM, SANKAR . <shankarphy at gmail.com> wrote:
> Thanks Ian,
> this isn't a text file, but when I read with readline I get the data I need
> along with mojibake. UTF 32 returns following error:
> Traceback (most recent call last):
>   File "D:\RV\RV1.py", line 17, in <module>
>     linenumx1 = file.readlines()
>   File "C:\Python27\lib\codecs.py", line 682, in readlines
>     return self.reader.readlines(sizehint)
>   File "C:\Python27\lib\codecs.py", line 591, in readlines
>     data = self.read()
>   File "C:\Python27\lib\codecs.py", line 480, in read
>     newchars, decodedbytes = self.decode(data, self.errors)
>   File "C:\Python27\lib\encodings\utf_32.py", line 130, in decode
>     codecs.utf_32_ex_decode(input, errors, 0, False)
> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: code
> point not in range(0x110000)
1) Open the file in binary mode with the built-in open function, not
codecs.open.
2) Find out or figure out the file format.
3) Read the file and extract the particular fields that you're
interested in as bytes objects.
4) Decode those bytes objects, and only those, using UTF-32.
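
As a rough sketch of those four steps: the real file format isn't known
here, so the layout below is purely an assumption for illustration (a
4-byte little-endian byte count at a known offset, followed by that many
bytes of UTF-32-LE text). The function name read_text_field and its
parameters are made up for this example; substitute whatever the actual
format turns out to be.

```python
import struct


def read_text_field(path, offset, encoding="utf-32-le"):
    """Read one length-prefixed text field from a binary file.

    Hypothetical layout (an assumption, not the real format): at `offset`
    there is a 4-byte little-endian byte count, followed by that many
    bytes of encoded text.
    """
    # Step 1: plain open() in binary mode, so nothing is decoded for us.
    with open(path, "rb") as f:
        # Steps 2 and 3: use what we know about the format to pull out
        # just the field of interest, still as a bytes object.
        f.seek(offset)
        (nbytes,) = struct.unpack("<I", f.read(4))
        raw = f.read(nbytes)
    # Step 4: decode only the extracted bytes, not the whole file.
    return raw.decode(encoding)
```

The point is that the decode call only ever sees bytes known to be
text, so the surrounding binary data can't trip up the codec the way it
did in the traceback above.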