Finding non ascii characters in a set of files
Larry Bates
lbates at websafe.com
Fri Feb 23 10:44:40 EST 2007
Peter Bengtsson wrote:
> On Feb 23, 2:38 pm, b... at yahoo.com wrote:
>> Hi,
>>
>> I'm updating my program to Python 2.5, but I keep running into
>> encoding problems. I have no encodings defined at the start of any of
>> my scripts. What I'd like to do is scan a directory and list all the
>> files in it that contain a non-ASCII character. How would I go about
>> doing this?
>>
>
> How about something like this:
> content = open('file.py').read()
> try:
>     content.encode('ascii')
> except UnicodeDecodeError:
>     print "file.py contains non-ascii characters"
>
>
The next problem will be that non-text (binary) files will also contain
non-ASCII bytes, so every one of them would be reported. The other
'issue' is that the OP didn't say how large the files are, so pulling
each one into memory with a single .read() might be a problem.
-Larry
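
For what it's worth, here is a rough sketch of one way to deal with both
points: read each file in fixed-size chunks instead of one big .read(),
and skip anything that looks binary. It assumes a current Python 3
interpreter rather than the 2.5 from the original question. The function
names, the 64 KB chunk size, the '.py' suffix filter and the "a NUL byte
in the first 1 KB means binary" rule are all my own illustrative
assumptions, not anything from the thread.

```python
import os

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune as needed


def first_non_ascii_offset(path, chunk_size=CHUNK_SIZE):
    """Return the offset of the first byte >= 0x80 in the file, or None."""
    offset = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return None  # reached end of file, everything was ASCII
            for index, byte in enumerate(bytearray(chunk)):
                if byte >= 0x80:
                    return offset + index
            offset += len(chunk)


def looks_binary(path, sample_size=1024):
    """Crude heuristic (an assumption): a NUL byte near the start means binary."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def files_with_non_ascii(directory, suffix=".py"):
    """List (name, offset) for text-like files in directory with non-ASCII bytes."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or not name.endswith(suffix):
            continue
        if looks_binary(path):
            continue
        offset = first_non_ascii_offset(path)
        if offset is not None:
            hits.append((name, offset))
    return hits
```

Calling files_with_non_ascii('.') would then return a list of (filename,
byte offset) pairs for the current directory. The NUL-byte check will
misclassify some files (UTF-16 text, for one), so treat it as a starting
point only.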