[Tutor] filename comparison

Wed Jan 12 05:45:01 EST 2022

Peter and Alan,

Thanks again for your support here.

I have been playing with the pseudocode Alan provided and got somewhere
with it, but I am going to play with the code below as well.

The end goal here is to build a database or spreadsheet of all my books,
containing information from the Google Books API. I have this working to
about a 90% level. I just don't like how the Google Books API returns
results when you give it a filter of "full" together with an author name
or a book title. I don't want to see near matches, only exact matches. I
will have to play with the request string passed to the get method to get
this right. I also have to work out how to inject the "intitle" and
"inauthor" params into the string as well. The params argument from
requests doesn't seem to include them.
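
A minimal sketch of how the "intitle" and "inauthor" keywords might be
injected. In the Google Books API these are not separate URL parameters but
prefixes inside the single q value, which may be why they seem to be missing
from the params. The helper names build_query, build_url and exact_matches
are hypothetical, the author name in the usage note is a placeholder,
wrapping each phrase in double quotes is an assumption, and filtering the
returned items locally is only one possible way to drop near matches.

```python
from urllib.parse import urlencode

# Public volumes endpoint of the Google Books API.
API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query(title=None, author=None):
    """Combine the field keywords into one value for the q parameter."""
    parts = []
    if title:
        # Quoting the phrase is an assumption, not documented behaviour.
        parts.append(f'intitle:"{title}"')
    if author:
        parts.append(f'inauthor:"{author}"')
    return " ".join(parts)


def build_url(title=None, author=None):
    """Return the request URL; urlencode escapes colons, quotes and spaces."""
    return API_URL + "?" + urlencode({"q": build_query(title, author)})


def exact_matches(items, title):
    """Keep only volumes whose title equals `title`, ignoring case.

    `items` is assumed to be the "items" list of the JSON response, where
    each entry carries a "volumeInfo" dict with a "title" key.
    """
    wanted = title.casefold()
    return [
        item for item in items
        if item.get("volumeInfo", {}).get("title", "").casefold() == wanted
    ]
```

With requests the same string can be handed over as
requests.get(API_URL, params={"q": build_query("Other Mother", "Jane Doe")}),
and the "items" list of the decoded JSON passed to exact_matches.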

Sean 
-----Original Message-----
From: Tutor <tutor-bounces+mhysnm1964=gmail.com at python.org> On Behalf Of
Peter Otten
Sent: Wednesday, 12 January 2022 9:25 PM
To: tutor at python.org
Subject: Re: [Tutor] filename comparison

On 11/01/2022 11:34, mhysnm1964 at gmail.com wrote:
> All,
>
>
>
> Problem Description: I have over 8000 directories. In each directory 
> there is a text file and an MP3 file. Below is the file naming
> structure for an MP3 or text file:
>
>
>
> Other Mother plot.txt
>
> Other Mother.mp3
>
>
>
> What should occur:
>
> *	Each directory should have both the above two files.
> *	There can be multiple MP3 and text files in the same directory.
> *	I want to find out which directories do not have a plot text file
> associated with the already existing mp3 file.
> *	I want to find out which plot text files do not have an mp3 file.
>
>
>
> I have already managed to walk the directory structure using os.walk. 
> But I am struggling with the best method of comparing the existing files.
>
>
>
> Does anyone have any ideas how to approach this problem? I am completely
> stuck on how to resolve this.

Given the filenames in one directory, your task reduces to a few set
operations on the names without their respective suffixes:

 >>> files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]

Get the "stems" from the text files:

 >>> txt_only = {n[:-9] for n in files if n.endswith(" plot.txt")}
 >>> txt_only
{'bar', 'baz'}

The same for the audio files:

 >>> mp3_only = {n[:-4] for n in files if n.endswith(".mp3")}
 >>> mp3_only
{'bar', 'foo'}

Try turning the above into a function stems(files, suffix) that works for
arbitrary suffixes.
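
One possible answer to that exercise, offered as a sketch rather than the
definitive version. The slice is written as len(name) - len(suffix) instead
of a negative index so that an empty suffix returns the names unchanged; that
guard is an addition, not part of the snippets above.

```python
def stems(files, suffix):
    """Return the names in `files` that end with `suffix`, suffix removed."""
    cut = len(suffix)
    # name[:-cut] would give '' when cut == 0, so slice from the front.
    return {name[:len(name) - cut] for name in files if name.endswith(suffix)}


files = ["foo.mp3", "bar.mp3", "bar plot.txt", "baz plot.txt"]
txt_only = stems(files, " plot.txt")  # stems of the plot text files
mp3_only = stems(files, ".mp3")       # stems of the audio files
```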

Now the beauty of this approach: getting the stems of the missing text
files:

 >>> missing_txt = mp3_only - txt_only
 >>> missing_txt
{'foo'}

Swap the two sets to get the stems of the missing mp3 files.
Before reporting you may want to add the suffixes:

 >>> {n + " plot.txt" for n in missing_txt}
{'foo plot.txt'}

PS: Getting the stems of the complete pairs is just as easy:
 >>> txt_only & mp3_only
{'bar'}
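
Putting the pieces together with the os.walk loop mentioned in the question
might look like the sketch below. The function name find_unpaired, its
default suffixes and the shape of the returned dictionary are assumptions
made for illustration; the set operations are the ones shown above.

```python
import os


def stems(files, suffix):
    """Return the names in `files` that end with `suffix`, suffix removed."""
    cut = len(suffix)
    return {name[:len(name) - cut] for name in files if name.endswith(suffix)}


def find_unpaired(root, mp3_suffix=".mp3", txt_suffix=" plot.txt"):
    """Map each directory under `root` to its missing companion files.

    Each value is a pair (missing_txt, missing_mp3) of sorted filename
    lists. Directories where every file has its partner are left out.
    """
    report = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        mp3 = stems(filenames, mp3_suffix)
        txt = stems(filenames, txt_suffix)
        # mp3 without a plot file, and plot file without an mp3.
        missing_txt = sorted(n + txt_suffix for n in mp3 - txt)
        missing_mp3 = sorted(n + mp3_suffix for n in txt - mp3)
        if missing_txt or missing_mp3:
            report[dirpath] = (missing_txt, missing_mp3)
    return report
```

Looping over find_unpaired(top_directory).items() then gives one line of
output per problem directory. Note that endswith is case-sensitive, so a file
named "Other Mother.MP3" would not be recognised as an mp3 file here.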