[Tutor] filename comparison

dn PyTutor at DancesWithMice.info
Tue Jan 11 06:08:44 EST 2022


On 11/01/2022 23.34, mhysnm1964 at gmail.com wrote:
> Problem Description: I have over 8000 directories. In each directory there
> is a text file and an MP3 file. Below is the file naming structure for an MP3
> or text file:
> 
> Other Mother plot.txt
> Other Mother.mp3
> 
> What should occur:
> 
> *	Each directory should have both the above two files. 
> *	There can be multiple MP3 and text files in the same directory.
> *	I want to find out which directories do not have a plot text file
> associated with the already existing mp3 file.
> *	I want to find out which plot text file does not have an mp3 file.
> 
> I have already managed to walk the directory structure using os.walk. But I
> am struggling with the best method of comparing the existing files.
> 
> Anyone have any ideas how to approach this problem? As I am completely stuck
> on how to resolve this.


Please review the (newer) pathlib module
(https://docs.python.org/3/library/pathlib.html) and/or the (older)
os.path module. Both enable manipulation of paths and filenames. Many
of us code from memory using the latter, but these days it's probably
better to start by learning the newer one!

These will enable taking each filename and separating it into its name
(the "stem") and its extension/file-type (.txt or .mp3). This will
facilitate checking that the files exist in appropriate pairs, by type.
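
As a minimal sketch of that separation (the helper name split_name is
made up for illustration, and lower-casing the extension is an
assumption, in case some files are named ".MP3"):

```python
from pathlib import PurePath


def split_name(filename):
    """Split a filename into (stem, extension), e.g. ('Other Mother', '.mp3')."""
    path = PurePath(filename)
    # .stem is the name without its last extension; .suffix is that extension.
    # Lower-casing is an assumption so that '.MP3' and '.mp3' compare equal.
    return path.stem, path.suffix.lower()


for name in ["Other Mother plot.txt", "Other Mother.mp3"]:
    print(split_name(name))
```

PurePath only manipulates the text of the name; it never touches the
disk, so it can be tried at the interpreter prompt without any real files.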

String methods such as startswith() and endswith() test whether a string
begins or ends with a given sub-string
(https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str).
These will facilitate relating "Other Mother plot" to "Other Mother".
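
For instance, a rough sketch of relating the two names, assuming every
plot file's name is the mp3's name followed by " plot" (that convention
is inferred from the two example filenames; title_of and PLOT_SUFFIX are
names invented here):

```python
PLOT_SUFFIX = " plot"  # assumed convention, taken from the example filenames


def title_of(stem):
    """Return the title part of a name, dropping a trailing ' plot' if present."""
    if stem.endswith(PLOT_SUFFIX):
        # Slice off the suffix; on Python 3.9+ str.removesuffix() does the same.
        return stem[:-len(PLOT_SUFFIX)]
    return stem


print(title_of("Other Mother plot"))
print(title_of("Other Mother"))
```

Both calls should give the same title, which is what makes the two
files comparable.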

Recommendation:
ignore the (already solved) directory structure side of things, and
start by working with only a single directory containing the various
permutations of multiple correct-pairs, and incorrect singletons.

1 How to decide that the filenames correspond?
2 How to check that the pair includes one .txt and one .mp3?
3 Consider the error-conditions of non-matching pairs, and
4 'orphan files'.

NB if the above is built as a function, then it should be an easy task
to (once complete) fit it 'inside' the directory-walk already coded...
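
One possible shape for such a function, offered only as a sketch: it
uses sets (a technique not mentioned above) and assumes the " plot"
naming convention seen in the example filenames. The name find_orphans
and the sample filenames are invented for illustration.

```python
from pathlib import PurePath

PLOT_SUFFIX = " plot"  # assumed convention, taken from the example filenames


def find_orphans(filenames):
    """Return (mp3 titles lacking a plot file, plot titles lacking an mp3)."""
    mp3_titles = set()
    plot_titles = set()
    for filename in filenames:
        path = PurePath(filename)
        extension = path.suffix.lower()
        if extension == ".mp3":
            mp3_titles.add(path.stem)
        elif extension == ".txt" and path.stem.endswith(PLOT_SUFFIX):
            # Strip ' plot' so the title can be compared with the mp3 titles.
            plot_titles.add(path.stem[:-len(PLOT_SUFFIX)])
        # Anything else (other file types, .txt without ' plot') is ignored.
    # Set difference leaves only the titles that have no partner.
    return sorted(mp3_titles - plot_titles), sorted(plot_titles - mp3_titles)


sample = [
    "Other Mother plot.txt",
    "Other Mother.mp3",
    "Lonely Song.mp3",       # hypothetical mp3 with no plot file
    "Lost Story plot.txt",   # hypothetical plot file with no mp3
]
print(find_orphans(sample))
```

Because the function takes a plain list of filenames, it could be
called with the filenames list that os.walk() yields for each directory.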

If you'd like to ask a further question, please come back to us with the
code you have so far (copy-paste).
-- 
Regards,
=dn

