Finding non ascii characters in a set of files

John Machin sjmachin at lexicon.net
Fri Feb 23 10:49:42 EST 2007


On Feb 24, 2:35 am, "John Machin" <sjmac... at lexicon.net> wrote:
> On Feb 24, 2:12 am, "Peter Bengtsson" <pete... at gmail.com> wrote:
>
> > On Feb 23, 2:38 pm, b... at yahoo.com wrote:
>
> > > Hi,
>
> > > I'm updating my program to Python 2.5, but I keep running into
> > > encoding problems. I have no encodings defined at the start of any of
> > > my scripts. What I'd like to do is scan a directory and list all the
> > > files in it that contain a non ascii character. How would I go about
> > > doing this?
>
> > How about something like this:
> > content = open('file.py').read()
> > try:
> >     content.encode('ascii')
> > except UnicodeDecodeError:
> >     print "file.py contains non-ascii characters"

Sorry, I fell face down on the Send button :-)

To check all .py files in the current directory, modify Peter's code
like this:

import glob
for filename in glob.glob('*.py'):
    content = open(filename).read()
    # Peter's try/except goes here, indented inside the loop

Maybe that UnicodeDecodeError should be UnicodeEncodeError (or catch
UnicodeError, which covers both), and change the print statement to
use filename instead of the hard-coded 'file.py'.
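
On the exception question, a hedged sketch rather than a definitive
answer: UnicodeDecodeError and UnicodeEncodeError are both subclasses of
UnicodeError, so catching the base class avoids having to guess. The
helper name is_ascii is made up for illustration, and it assumes it is
handed raw bytes (a plain str in Python 2, bytes in Python 3).

```python
def is_ascii(data):
    """Return True if the byte string `data` decodes cleanly as ASCII."""
    try:
        data.decode('ascii')
    except UnicodeError:
        # Base class of both UnicodeDecodeError and UnicodeEncodeError.
        return False
    return True
```

Calling is_ascii on the result of a file's read() (opened in binary
mode) then tells you whether that file needs a closer look.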

If you have hundreds of .py files in the same directory, you'd better
modify the code further to explicitly close each file.
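
A sketch of the whole loop with each file closed explicitly. The
function name files_with_non_ascii and its pattern parameter are
hypothetical, and the files are opened in binary mode on the assumption
that all we care about is whether any byte falls outside the ASCII
range.

```python
import glob

def files_with_non_ascii(pattern='*.py'):
    """Return sorted names of files matching `pattern` that contain a non-ASCII byte."""
    hits = []
    for filename in sorted(glob.glob(pattern)):
        f = open(filename, 'rb')      # binary mode: look at the raw bytes
        try:
            content = f.read()
        finally:
            f.close()                 # close each file explicitly
        try:
            content.decode('ascii')
        except UnicodeError:
            hits.append(filename)     # at least one byte is not ASCII
    return hits
```

Looping over the returned list and printing each name gives the
listing the original poster asked for.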

HTH,
John



