[Tutor] file.read..... Abort Problem

Thu Oct 20 21:33:36 CEST 2005

Tomas Markus wrote:
> Hello Pythoners,
> 
> This is probably a very newbie question (after all I am one):
> 
> I am trying to read a file with some 2500 lines and to do a scan for 
> "not allowed" characters in there (such as accented ones and so on). The 
> problem is that the file I am testing with has got one of those 
> somewhere on line 2100 (the hex value of the character is 1a) and all of 
> the file.read functions (read, readline, readlines) actually stop 
> reading the file exactly at that line as if it was interpreted as an 
> EOF. Can you, please, help?

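Are you on Windows? In text mode, Windows treats the byte 0x1a (Ctrl-Z) as an end-of-file marker, so reading stops there. Try opening the file in binary mode: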

 >>> d='abc\x1adef\nghij\n'
 >>> d
'abc\x1adef\nghij\n'
 >>> open('(temp)/test.txt', 'w').write(d)
 >>> d1=open('(temp)/test.txt').read()
 >>> d1
'abc'
 >>> d1=open('(temp)/test.txt', 'rb').read()
 >>> d1
'abc\x1adef\r\nghij\r\n'

> Btw: When I am past this problem I will be 
> asking yet another question: what is the most effective way to check a 
> file for not allowed characters or how to check it for allowed only 
> characters (which might be i.e. ASCII only).

You can read the whole file with read() and use a regex to search for disallowed characters.
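
Here is a minimal sketch of that idea, written in Python 3 syntax. It assumes "allowed" means printable ASCII plus tab, CR and LF; adjust the character class to suit your own rules. The names DISALLOWED, find_disallowed and scan_file are made up for this illustration. The file is opened in binary mode so that a 0x1a byte does not cut the read short.

```python
import re

# Assumption: allowed bytes are printable ASCII (0x20-0x7e) plus tab, CR and LF.
# Anything else (control characters such as 0x1a, accented letters, ...) is reported.
DISALLOWED = re.compile(rb"[^\x20-\x7e\t\r\n]")


def find_disallowed(data):
    """Return a list of (line_number, column, byte_value) for each disallowed byte.

    Line numbers and columns are 1-based; byte_value is an int.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for match in DISALLOWED.finditer(line):
            hits.append((lineno, match.start() + 1, match.group()[0]))
    return hits


def scan_file(path):
    """Read the file in binary mode and scan its whole contents."""
    with open(path, "rb") as f:
        return find_disallowed(f.read())
```

Calling scan_file on your file gives a list you can loop over to print the line, column and hex value of each offending byte. If you only want to know whether the file is clean, DISALLOWED.search(data) is None is enough.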

Kent
> 
> Many, many thanks.
> 
> Tom