On 2/11/2012 12:00 PM, Masklinn wrote:
On 2012-02-11, at 17:44 , Terry Reedy wrote:
On 2/11/2012 5:47 AM, Paul Moore wrote:
I have a text file, in an unknown encoding (yes, it does happen to me!) but opening in an editor shows it's mainly-ASCII. I want to find all the lines starting with a '*'. The simple
    with open('myfile.txt') as f:
        for line in f:
            if line.startswith('*'):
                print(line)
fails with encoding errors. What do I do?
Good example. I believe adding ", encoding='latin-1'" to open() is sufficient.
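A minimal sketch of that suggestion, assuming Python 3. The helper name starred_lines_latin1 and the sample bytes are made up for illustration. latin-1 assigns a character to every one of the 256 byte values, so decoding cannot raise, whatever the real encoding is.

```python
import os
import tempfile


def starred_lines_latin1(path):
    """Return the lines that start with '*' from a file of unknown encoding."""
    # Every byte value is a valid latin-1 character, so no UnicodeDecodeError
    # can occur; non-ASCII characters may simply be the "wrong" ones.
    with open(path, encoding='latin-1') as f:
        return [line for line in f if line.startswith('*')]


# Hypothetical sample: mostly ASCII, plus bytes that are not valid UTF-8.
sample = b"* first \xe9\xff item\nplain line\n* second item\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)

try:
    for line in starred_lines_latin1(tmp.name):
        # ascii() keeps the demo printable on any terminal encoding.
        print(ascii(line))
finally:
    os.unlink(tmp.name)
```

The ASCII part of each line, including the leading '*', comes through unchanged, which is all the startswith test needs.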
Why not open the file in binary mode instead? (and replace `'*'` with `b'*'` in the startswith call)
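The binary-mode variant might look like the sketch below, again assuming Python 3. The helper name starred_lines_binary and the sample data are hypothetical. No decoding happens at all, so each line is a bytes object and the prefix has to be bytes too.

```python
import os
import tempfile


def starred_lines_binary(path):
    """Return the lines (as bytes) that start with b'*'."""
    # Iterating over a binary-mode file yields bytes lines split on b'\n'.
    with open(path, 'rb') as f:
        return [line for line in f if line.startswith(b'*')]


# Hypothetical sample: mostly ASCII, plus bytes that are not valid UTF-8.
sample = b"* first \xe9\xff item\nplain line\n* second item\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)

try:
    for line in starred_lines_binary(tmp.name):
        # print() of a bytes object shows its repr, with \xNN escapes.
        print(line)
finally:
    os.unlink(tmp.name)
```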
When I wrote that response, I thought that 'for line in f' would not work for binary-mode files. I then opened IDLE, experimented with 'rb', and discovered otherwise. So the remaining issue is how one wants the unknown encoding bytes to appear when printed -- as hex escapes, or as arbitrary but more readable non-ascii latin-1 chars.

-- Terry Jan Reedy
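To make the two display choices concrete, here is a small sketch assuming Python 3. The function names and the sample line are invented for illustration. The same bytes are rendered once as the repr of a bytes object and once decoded as latin-1.

```python
def show_as_escapes(raw):
    # The repr of a bytes object keeps ASCII readable and writes every
    # other byte as a \xNN hex escape.
    return repr(raw)


def show_as_latin1(raw):
    # Decoding as latin-1 turns each non-ASCII byte into some character;
    # it is only the intended one if the file really was latin-1.
    return raw.decode('latin-1')


raw = b'* caf\xe9 menu\n'  # hypothetical line from the file

print(show_as_escapes(raw))                        # b'* caf\xe9 menu\n'
print(show_as_latin1(raw) == '* caf\u00e9 menu\n')  # True
```

The first form is unambiguous about which bytes were in the file; the second reads more naturally but can show misleading characters when the guess at the encoding is wrong.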