[Tutor] Re: test if file is not ascii

Michael Janssen Janssen at rz.uni-frankfurt.de
Sun Nov 2 10:15:06 EST 2003


On Fri, 31 Oct 2003, Roger Merchberger wrote:

> Checking each individual char [as you noticed] is insane...
>
> ... have you tried a regular expression?
>
> Something like:
> =-=-=-=-=-=-=-=
>
> import re
>
> bb = 'this is the search string that I want to see is in it...'
> if re.search('[\x00-\x19]',bb):
>    print "Yes it's in there"
> else:
>    print "No, it's not in there"

I've tested the range definition, and it works. But are newlines also
invalid within RTF? If not, it would be better to exclude \n (\x0A) and
\r (\x0D) from the range.
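
If line breaks (and tabs) turn out to be legal, a character class that skips
them could look roughly like the sketch below. It assumes the intent is to
catch the remaining control characters up to \x1F, which is a slightly wider
range than the \x00-\x19 quoted above. The names CONTROL_CHARS and
has_control_chars are made up for illustration, and the syntax is Python 3.

```python
import re

# Hypothetical pattern: control characters \x00-\x1F,
# minus tab (\x09), line feed (\x0A) and carriage return (\x0D).
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def has_control_chars(text):
    """Return True if text contains a control character other than tab, LF or CR."""
    return CONTROL_CHARS.search(text) is not None
```

Whether tab should stay in the allowed set is a guess on my part; drop it
from the gaps in the class if it should be flagged too.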

Note that you can make the error message more helpful:

mt = re.search('[\x00-\x19]',bb)
if mt:
   print "illegal character (ascii-num: %s) on position %s" \
    % (ord(mt.group()), mt.start())


Determining the line number (if the RTF source has lines ;-) could be done with:

import os
line_num = len(re.findall(os.linesep, bb[:mt.start()])) + 1
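
Putting the position report and the line count together, a rough Python 3
sketch might look like this. find_illegal_char is a hypothetical helper name.
The pattern here leaves out tab, \n and \r (an assumption; otherwise the first
newline would always be the first match), and counting '\n' assumes the text
was read with line endings normalized to '\n', which is why it uses
str.count instead of os.linesep.

```python
import re

# Assumed pattern: control characters except tab, LF and CR.
ILLEGAL = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def find_illegal_char(text):
    """Return (ascii_num, position, line_num) of the first match, or None."""
    mt = ILLEGAL.search(text)
    if mt is None:
        return None
    # Line number = number of newlines before the match, plus one.
    line_num = text.count('\n', 0, mt.start()) + 1
    return ord(mt.group()), mt.start(), line_num

hit = find_illegal_char("first line\nsecond \x01 line\n")
if hit:
    print("illegal character (ascii-num: %s) on position %s, line %s" % hit)
```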

Michael


