[Tutor] How to list/process files with identical character strings
Peter Otten
__peter__ at web.de
Tue Jun 24 22:02:53 CEST 2014
mark murphy wrote:
> Hello Python Tutor Community,
>
> This is my first post and I am just getting started with Python, so I
> apologize in advance for any lack of etiquette.
>
> I have a directory of several thousand daily satellite images that I need
> to process. Approximately 300 of these images are split in half, so in
> just these instances there will be two files for one day. I need to merge
> each pair of split images into one image.
>
> The naming convention of the files is as follows: TYYYYDDDHHMMSS, where:
> T= one character satellite code
> YYYY = 4 digit year
> DDD = Julian date
> HH = 2-digit hour
> MM = 2-digit minute
> SS = 2-digit second
>
> What I hope to be able to do is scan the directory, and for each instance
> where there are two files where the first 8 characters (TYYYYDDD) are
> identical, run a process on those two files and place the output (named
> TYYYYDDD) in a new directory.
>
> The actual processing part should be easy enough for me to figure out.
> The part about finding the split files (each pair of files with the same
> first 8 characters) and setting those up to be processed is way beyond me.
> I've done several searches for examples and have not been able to find
> what I am looking for.
Sorting is probably the approach that is easiest to understand, but an
alternative would be to put the files into a dict that maps the 8-char
prefix to a list of files with that prefix:

import os

directory = "/some/directory"
files = os.listdir(directory)
days = {}
for filename in files:
    # the first 8 characters are TYYYYDDD, i.e. satellite code plus day
    prefix = filename[:8]
    filepath = os.path.join(directory, filename)
    if prefix in days:
        # add file to the existing list
        days[prefix].append(filepath)
    else:
        # add a new list with one file
        days[prefix] = [filepath]

for fileset in days.values():
    if len(fileset) > 1:
        # process only the lists with two or more files
        print("merging", fileset)

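For comparison, the sorting approach mentioned above might look roughly like
the sketch below. It is only one possible way to do it: the function name
find_split_days and the sample file names are made up for illustration, and
itertools.groupby is used here to collect neighbouring names that share a
prefix once the list is sorted.

```python
from itertools import groupby


def find_split_days(filenames, prefix_len=8):
    """Map each TYYYYDDD prefix that occurs more than once to its file names.

    find_split_days is a hypothetical helper, not part of any library.
    """
    def prefix(name):
        return name[:prefix_len]

    split_days = {}
    # After sorting, names with the same prefix sit next to each other,
    # so groupby() can gather them in a single pass.
    for key, group in groupby(sorted(filenames), key=prefix):
        names = list(group)
        if len(names) > 1:
            split_days[key] = names
    return split_days


# Made-up sample names following the TYYYYDDDHHMMSS convention.
sample = [
    "A2014002120000",
    "A2014001120500",
    "A2014001120000",
    "B2014001120000",
]
for day, names in find_split_days(sample).items():
    print("merging", names, "into", day)
```

In a real script the list would come from os.listdir(directory) instead of
the hard-coded sample.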
(The

    if prefix in days:
        days[prefix].append(filepath)
    else:
        days[prefix] = [filepath]

part can be simplified with the dict.setdefault() method or a
collections.defaultdict.)
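
A minimal sketch of the defaultdict variant, assuming the same 8-character
prefix as above. The function name group_by_prefix and the sample file names
are invented for the example; with dict.setdefault() the loop body would
instead read days.setdefault(prefix, []).append(filepath).

```python
import os
from collections import defaultdict


def group_by_prefix(directory, filenames, prefix_len=8):
    """Group file paths by the first prefix_len characters of their names.

    group_by_prefix is a hypothetical helper written for this example.
    """
    # A missing key automatically starts out as an empty list,
    # so no "if prefix in days" check is needed.
    days = defaultdict(list)
    for filename in filenames:
        days[filename[:prefix_len]].append(os.path.join(directory, filename))
    return days


# Made-up sample names; a real script would use os.listdir(directory).
directory = "/some/directory"
sample = ["A2014001120000", "A2014001120500", "A2014002120000"]

for fileset in group_by_prefix(directory, sample).values():
    if len(fileset) > 1:
        # only days that were split into two or more files
        print("merging", fileset)
```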