[Tutor] How to list/process files with identical character strings

Peter Otten __peter__ at web.de
Tue Jun 24 22:02:53 CEST 2014


mark murphy wrote:

> Hello Python Tutor Community,
> 
> This is my first post and I am just getting started with Python, so I
> apologize in advance for any lack of etiquette.
> 
> I have a directory of several thousand daily satellite images that I need
> to process.  Approximately 300 of these images are split in half, so in
> just these instances there will be two files for one day.  I need to merge
> each pair of split images into one image.
> 
> The naming convention of the files is as follows: TYYYYDDDHHMMSS, where:
> T= one character satellite code
> YYYY = 4 digit year
> DDD = Julian date
> HH = 2-digit hour
> MM = 2-digit minute
> SS = 2-digit second
> 
> What I hope to be able to do is scan the directory, and for each instance
> where there are two files where the first 8 characters (TYYYYDDD) are
> identical, run a process on those two files and place the output (named
> TYYYYDDD) in a new directory.
> 
> The actual processing part should be easy enough for me to figure out.
> The part about finding the split files (each pair of files with the same
> first 8 characters) and setting those up to be processed is way beyond me.
> I've done several searches for examples and have not been able to find
> what I am looking for.

Sorting is probably the approach that is easiest to understand, but an 
alternative would be to put the files into a dict that maps the 8-char 
prefix to a list of files with that prefix:

import os

directory = "/some/directory"
files = os.listdir(directory)
days = {}
for filename in files:
    prefix = filename[:8]
    filepath = os.path.join(directory, filename)
    if prefix in days:
        # add file to the existing list
        days[prefix].append(filepath)
    else:
        # add a new list with one file
        days[prefix] = [filepath]

for fileset in days.values():
    if len(fileset) > 1:
        # process only the lists with two or more files
        print("merging", fileset)
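
For comparison, the sorting approach mentioned at the top could look roughly 
like the sketch below. It is only one way to do it: the function name 
group_by_prefix_sorted and the sample file names are made up for 
illustration, and in real use the list would come from os.listdir(directory). 
Because itertools.groupby() only groups neighbouring items, the names have to 
be sorted first:

```python
import itertools


def group_by_prefix_sorted(filenames, prefix_len=8):
    """Map each prefix shared by two or more names to the list of those names."""
    split_days = {}
    # groupby() only merges adjacent items, so sort the names first
    ordered = sorted(filenames)
    for prefix, names in itertools.groupby(ordered, key=lambda name: name[:prefix_len]):
        names = list(names)
        if len(names) > 1:
            split_days[prefix] = names
    return split_days


# Hypothetical file names following the TYYYYDDDHHMMSS convention
sample = [
    "A2014002120000",
    "A2014001123000",
    "B2014001120000",
    "A2014001120000",
]

for prefix, names in group_by_prefix_sorted(sample).items():
    print("merging", names, "into", prefix)
```

Since the timestamp follows the prefix in the name, the files within each 
group also come out in chronological order.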

(The

    if prefix in days:
        days[prefix].append(filepath)
    else:
        days[prefix] = [filepath]

part can be simplified with the dict.setdefault() method or a 
collections.defaultdict.)
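
As a rough sketch of both simplifications (the two function names and the 
sample file names are invented for this example, and bare file names are used 
instead of full paths to keep it short):

```python
import collections


def group_with_setdefault(filenames, prefix_len=8):
    days = {}
    for filename in filenames:
        # setdefault() returns the list already stored under the prefix,
        # or stores a new empty list there and returns that
        days.setdefault(filename[:prefix_len], []).append(filename)
    return days


def group_with_defaultdict(filenames, prefix_len=8):
    # a defaultdict(list) creates the empty list on first access to a key
    days = collections.defaultdict(list)
    for filename in filenames:
        days[filename[:prefix_len]].append(filename)
    return dict(days)


# Hypothetical file names following the TYYYYDDDHHMMSS convention
sample = ["A2014001120000", "A2014002120000", "A2014001123000"]

for fileset in group_with_defaultdict(sample).values():
    if len(fileset) > 1:
        print("merging", fileset)
```

Both functions build the same dict as the if/else version; which one reads 
better is mostly a matter of taste.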



