[Tutor] Re: test if file is not ascii
Michael Janssen
Janssen at rz.uni-frankfurt.de
Sun Nov 2 10:15:06 EST 2003
On Fri, 31 Oct 2003, Roger Merchberger wrote:
> Checking each individual char [as you noticed] is insane...
>
> ... have you tried a regular expression?
>
> Something like:
> =-=-=-=-=-=-=-=
>
> import re
>
> bb = 'this is the search string that I want to see is in it...'
> if re.search('[\x00-\x19]',bb):
>     print("Yes it's in there")
> else:
>     print("No, it's not in there")
I've tested the range definition, and it works. But are newlines also
invalid within RTF? If not, it would be better to exclude \n (\x0A) and
\r (\x0D) from the range.
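If line breaks turn out to be legal, the character class can simply skip them. A minimal sketch, assuming tab (\x09), \n (\x0A) and \r (\x0D) should all be allowed; the names CONTROL_CHARS and has_control_chars are made up for illustration:

```python
import re

# Same \x00-\x19 range as the quoted pattern, minus tab (\x09),
# newline (\x0A) and carriage return (\x0D).
# Assumption: those three characters are acceptable in the input.
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x19]')

def has_control_chars(text):
    """Return True if text contains a disallowed control character."""
    return CONTROL_CHARS.search(text) is not None
```

Whether tab belongs on the allowed list depends on the file format, so adjust the ranges as needed.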
Note that you can make the error message more helpful:
mt = re.search('[\x00-\x19]',bb)
if mt:
    print("illegal character (ascii-num: %s) at position %s"
          % (ord(mt.group()), mt.start()))
Determining the line number (if the RTF source has lines ;-) could be
done like this (it needs "import os" as well):
line_num = len(re.findall(os.linesep, bb[:mt.start()])) + 1
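The two pieces can be put together roughly as follows. This is only a sketch: find_illegal_char is a hypothetical helper, and it assumes the text was read in text mode so that every line break arrives as '\n'. It counts newlines with str.count rather than re.findall, which avoids depending on os.linesep, and its default pattern leaves out tab, \n and \r so that a plain line break is not reported as the first hit.

```python
import re

def find_illegal_char(text, pattern=r'[\x00-\x08\x0b\x0c\x0e-\x19]'):
    """Return (char_code, position, line_number) for the first match, or None.

    Hypothetical helper; the default pattern is an assumption that
    tab, newline and carriage return are allowed.
    """
    mt = re.search(pattern, text)
    if mt is None:
        return None
    # Count the line breaks before the match; the first line is line 1.
    line_num = text.count('\n', 0, mt.start()) + 1
    return ord(mt.group()), mt.start(), line_num
```

Calling it on a string returns None when nothing is found, so the result can be tested with a plain "if" before printing the message.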
Michael