[Tutor] filename comparison
mhysnm1964 at gmail.com
Wed Jan 12 05:45:01 EST 2022
Peter and Alan,
Thanks again for your support here.
I have been playing with the pseudocode Alan provided and have made some
progress with it. I am going to try the code below as well.
The end goal here is to build a database or spreadsheet of all my books,
populated with information from the Google Books API. I have this about
90% working. I just don't like how the Google Books API returns results when
you give it a filter of "full" based on an author name or a book title. I
don't want to see near matches, only exact matches. I will have to experiment
with the request string passed to the get method to get this right. I also
have to work out how to inject the "intitle" and "inauthor" params into the
string. The params argument in requests doesn't seem to handle them.
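One possible way to handle this, as a rough sketch only: in the Google Books
volumes endpoint, "intitle:" and "inauthor:" are keywords that go inside the
value of the single "q" parameter; they are not separate request parameters.
The helper names build_query and exact_matches are made up for illustration,
and wrapping each value in double quotes to ask for a phrase match is an
assumption about how the search behaves. Because near matches may still come
back, the second helper filters the returned items by title on the client
side.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query(title=None, author=None):
    """Build the value of the 'q' parameter (hypothetical helper)."""
    parts = []
    if title:
        # Assumption: double quotes request a phrase match.
        parts.append(f'intitle:"{title}"')
    if author:
        parts.append(f'inauthor:"{author}"')
    return " ".join(parts)


def exact_matches(items, title):
    """Keep only items whose volumeInfo.title equals title, ignoring case."""
    wanted = title.casefold()
    return [
        item for item in items
        if item.get("volumeInfo", {}).get("title", "").casefold() == wanted
    ]


q = build_query(title="Other Mother", author="Some Author")
# The same string can be passed to requests as params={"q": q}.
print(BASE_URL + "?" + urlencode({"q": q}))
```

With requests, the call would look like requests.get(BASE_URL, params={"q": q}),
and the "items" list from the JSON response can then be passed to
exact_matches.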
Sean
-----Original Message-----
From: Tutor <tutor-bounces+mhysnm1964=gmail.com at python.org> On Behalf Of
Peter Otten
Sent: Wednesday, 12 January 2022 9:25 PM
To: tutor at python.org
Subject: Re: [Tutor] filename comparison
On 11/01/2022 11:34, mhysnm1964 at gmail.com wrote:
> All,
>
>
>
> Problem Description: I have over 8000 directories. In each directory
> there is a text file and a MP3 file. Below is the file naming
> structure for an MP3 or text file:
>
>
>
> Other Mother plot.txt
>
> Other Mother.mp3
>
>
>
> What should occur:
>
> * Each directory should have both the above two files.
> * There can be multiple MP3 and text files in the same directory.
> * I want to find out which directories do not have a plot text file
> associated to the already existing mp3 file.
> * I want to find out which plot text file does not have a mp3 file.
>
>
>
> I have already managed to walk the directory structure using os.walk.
> But I am struggling with the best method of comparing the existing files.
>
>
>
> Anyone have any ideas how to approach this problem? As I am completely
> stuck on how to resolve this.
Given the filenames in one directory your task reduces to a few set
operations on the names without their respective suffix:
>>> files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]
Get the "stems" from the text files:
>>> txt_only = {n[:-9] for n in files if n.endswith(" plot.txt")}
>>> txt_only
{'bar', 'baz'}
The same for the audio files:
>>> mp3_only = {n[:-4] for n in files if n.endswith(".mp3")}
>>> mp3_only
{'bar', 'foo'}
Try turning the above into a function stems(files, suffix) that works for
arbitrary suffixes.
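One possible solution to that exercise, as a minimal sketch: the slice length
is taken from the suffix itself instead of being hard-coded as 9 or 4. It
assumes the suffix is a non-empty string.

```python
def stems(files, suffix):
    """Return the set of names that end with suffix, with the suffix removed.

    Assumes suffix is non-empty; with an empty suffix the slice
    name[:-0] would yield '' for every name.
    """
    return {name[:-len(suffix)] for name in files if name.endswith(suffix)}


files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]
mp3_only = stems(files, ".mp3")
txt_only = stems(files, " plot.txt")
print(sorted(mp3_only), sorted(txt_only))
```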
Now the beauty of this approach: getting the stems of the missing text
files:
>>> missing_txt = mp3_only - txt_only
>>> missing_txt
{'foo'}
Swap the two sets to get the stems of the missing mp3 files.
Before reporting you may want to add the suffixes:
>>> {n + " plot.txt" for n in missing_txt}
{'foo plot.txt'}
PS: Getting the stems of the complete pairs is just as easy:
>>> txt_only & mp3_only
{'bar'}
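To tie the set operations back to the os.walk loop mentioned in the question,
here is a rough sketch that applies them to each directory in turn. The
function name find_unpaired and the shape of the report are made up for
illustration, and the demo builds a small throwaway tree instead of touching
real data.

```python
import os
import tempfile


def stems(files, suffix):
    """Names ending with suffix, with the suffix removed."""
    return {name[:-len(suffix)] for name in files if name.endswith(suffix)}


def find_unpaired(root):
    """Map each directory with a problem to (missing_txt, missing_mp3)."""
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        mp3 = stems(filenames, ".mp3")
        txt = stems(filenames, " plot.txt")
        missing_txt = sorted(mp3 - txt)  # mp3 present, plot text absent
        missing_mp3 = sorted(txt - mp3)  # plot text present, mp3 absent
        if missing_txt or missing_mp3:
            report[dirpath] = (missing_txt, missing_mp3)
    return report


# Demo on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    for rel in ["a/foo.mp3", "a/bar.mp3", "a/bar plot.txt", "b/baz plot.txt"]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    demo = {
        os.path.relpath(dirpath, root): result
        for dirpath, result in find_unpaired(root).items()
    }

for name in sorted(demo):
    print(name, demo[name])
```

Directories where every mp3 has its plot file, and vice versa, do not appear
in the report.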
_______________________________________________
Tutor maillist - Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor