[Tutor] Parsing data from a set of files iteratively

Sat May 19 04:35:44 CEST 2012

Spyros Charonis wrote:
> Dear Python community,
> 
> I have a set of ~500 files which I would like to run a script on. My script
> extracts certain information and
> generates several lists with items I need. For one of these lists, I need
> to combine the information from all
> 500 files into one super-list. Is there a way in which I can iteratively
> execute my script over all 500 files
> and get them to write the list I need into a new file? Many thanks in
> advance for your time.

Naturally; one way is to explicitly iterate over the names of the files:

for filename in ('a.txt', 'b.txt', 'c.txt'):  # and 497 more...
    ...  # do something with filename

Of course, writing 500 file names in your script is probably going to be quite 
painful. What you can do is list the file names in an external file, one per 
line, then use that:

with open('list of names.txt') as f:  # the file is closed automatically
    names = f.readlines()
for filename in [name.strip() for name in names]:
    ...

Note the use of a list comprehension (the bit inside the square brackets) to 
strip away whitespace from the names, including the newline that occurs after 
every line.

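Another approach is to make sure all the files are in a single directory, then 
list the contents of that directory. Note that os.listdir() returns bare names 
without the directory part, so join each name back onto the directory before 
opening it:

import os
directory = 'where the files are'
for filename in os.listdir(directory):
    path = os.path.join(directory, filename)  # full path to open
    ...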
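
Putting the directory approach together with what you asked for (one 
super-list, written to a new file), a minimal sketch might look like the 
following. I don't know what your script extracts, so extract_items() is a 
hypothetical stand-in that simply returns the non-blank lines of a file; 
combine() and its parameter names are likewise made up for illustration.

```python
import os

def extract_items(path):
    # Hypothetical stand-in for your own extraction code: it just
    # returns the non-blank lines of the file, stripped of whitespace.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def combine(directory, output_path):
    """Collect the items from every file in directory into one list,
    write them to output_path one per line, and return the list."""
    super_list = []
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):  # skip any subdirectories
            super_list.extend(extract_items(path))
    with open(output_path, 'w') as out:
        for item in super_list:
            out.write(item + '\n')
    return super_list
```

Write the output file somewhere outside the input directory, otherwise it 
will be picked up as an input file the next time you run the script.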

If the files are in subdirectories, you can use the os.walk() function. See 
the documentation for details of how to use it:

http://docs.python.org/library/os.html#os.listdir
http://docs.python.org/library/os.html#os.walk

(Aside: if you think os.walk is complicated, you should try using its 
predecessor, os.path.walk!)
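
For what it's worth, here is a small sketch of how os.walk() might be used to 
collect file names from a directory and all of its subdirectories. The 
function name find_files and the '.txt' suffix are assumptions on my part; 
adjust them to suit your files.

```python
import os

def find_files(top, suffix='.txt'):
    """Return the paths of all files under top whose names end with suffix."""
    paths = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            if filename.endswith(suffix):
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)
```

You can then loop over the returned paths exactly as you would loop over a 
hand-written tuple of file names.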

A fourth approach is to use the fileinput module, which takes a list of files, 
then treats them all as one giant file. Beware though, fileinput is not very 
efficient and may struggle a little with 500 files.

http://docs.python.org/library/fileinput.html
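
A rough sketch of the fileinput approach, assuming you already have the file 
names in a list (the function name read_all_lines is made up for this 
example):

```python
import fileinput

def read_all_lines(filenames):
    """Return the lines of all the given files, in order, without newlines."""
    lines = []
    try:
        # fileinput.input() chains the files together as one stream of lines.
        # Careful: an empty list makes it fall back to standard input.
        for line in fileinput.input(filenames):
            lines.append(line.rstrip('\n'))
    finally:
        fileinput.close()  # reset the module's global state
    return lines
```

Inside the loop, fileinput.filename() tells you which file the current line 
came from, should you need to know.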

-- 
Steven