Handling text lines from files with some (few) strange chars

Chris Rebert clp2 at rebertia.com
Sat Jun 5 19:41:55 EDT 2010


On Sat, Jun 5, 2010 at 4:03 PM, Paulo da Silva
<psdasilva.nospam at netcabonospam.pt> wrote:
> I need to read text files and process each line using string
> comparisons and regexp.
>
> I have a python2 program that uses <file object>.readline to read each
> line as a string. Then, processing it was a trivial job.
>
> With python3 I got error messages like:
>  File "./pp1.py", line 93, in RL
>    line=inf.readline()
>  File "/usr/lib64/python3.1/codecs.py", line 300, in decode
>    (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position
> 4963-4965: invalid data
>
> How do I handle this?

Python 3 decodes text files as they are read, so specify the file's
actual encoding with the `encoding` parameter when opening it. For
example, for Windows-1252:

your_file = open("path/to/file.ext", 'r', encoding='cp1252')
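
If you are not sure of the encoding, or only a few bytes are bad, the
`errors` parameter of open() may also help. Here is a minimal sketch,
not a definitive recipe: the helper name read_lines and the sample data
are made up for illustration, and cp1252 is only an assumed encoding
for your files.

```python
import os
import tempfile


def read_lines(path, encoding="cp1252", errors="strict"):
    """Return the lines of a text file, decoded with the given encoding.

    read_lines is a hypothetical helper; `encoding` and `errors` are
    passed straight through to the built-in open().
    """
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return [line.rstrip("\n") for line in f]


# Build a small sample file containing Windows-1252 bytes (assumed data).
raw = "caf\u00e9 \u20ac5\nplain line\n".encode("cp1252")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    demo_path = tmp.name

# Decoding with the right encoding recovers the original text.
decoded = read_lines(demo_path, encoding="cp1252")

# Decoding as UTF-8 would raise UnicodeDecodeError on these bytes;
# errors="replace" substitutes U+FFFD for each undecodable byte instead.
fallback = read_lines(demo_path, encoding="utf-8", errors="replace")

os.remove(demo_path)
```

Note that errors="replace" loses the original characters, so it is
only a fallback for when the real encoding cannot be determined.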

Cheers,
Chris
--
http://blog.rebertia.com


