[Tutor] storing and saving file tree structure

Sun Jan 24 01:45:29 EST 2021

All,

I have a directory for all the books I own, organised as follows:

D:\authors\a\Anne rice\title\filename.pdf

"title" is the title of the book, and the file could be a PDF, MOBI, MP3,
etc., depending on where I purchased the book. What I am trying to do is
bring all the information into a spreadsheet, which is why I am trying to
load the whole directory structure into a data structure such as a dict.

Considerations:

*	Some directories contain multiple files, so pathlib returns a separate
path object for each file. For example:
D:\authors\a\Anne rice\title\track1.mp3
D:\authors\a\Anne rice\title\track2.mp3
D:\authors\a\Anne rice\title\track3.mp3

*	I only want the directory names, not the filenames (I have already
worked this part out).
*	The last directory is normally the title of the book. This is how I
have structured the directory.
*	I want to remove duplicate entries of author names and titles.
*	I want to import the result into Excel, either directly into a
spreadsheet or via CSV. I already have information on this part.
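For the Excel step, here is one possible sketch, assuming the data ends up as a dict mapping each author to a list of titles. It flattens that dict into one row per book and writes it with the standard `csv` module. The function names `books_to_rows` and `write_csv` and the sample data are hypothetical, for illustration only.

```python
import csv
import io


def books_to_rows(books):
    """Flatten {author: [titles]} into [author, title] rows with a header row.

    Authors and titles are sorted so the spreadsheet is easy to scan.
    """
    rows = [["Author", "Title"]]
    for author in sorted(books):
        for title in sorted(books[author]):
            rows.append([author, title])
    return rows


def write_csv(rows, fp):
    # csv.writer accepts any object with a write() method, such as an open file.
    csv.writer(fp).writerows(rows)


# Made-up sample data (hypothetical titles).
books = {"Anne Rice": ["Title B", "Title A"]}
rows = books_to_rows(books)

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass an open file instead: `with open('my-books.csv', 'w', newline='') as fp: write_csv(rows, fp)`. Excel can open the resulting CSV directly.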

I think that covers the considerations. What I cannot work out is how to
write the code to remove duplicates. Dictionaries are great for identifying
duplicate keys because a simple if test does it; finding duplicates in a
list seems more challenging. The structure I am using is:
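Checking a list for duplicates also needs only a simple if test, because `in` works on lists as well as dicts; it just scans the list item by item. A minimal sketch of three common options, using made-up titles:

```python
sample = ["Title A", "Title B", "Title A"]

# Option 1: the same "simple if test" used with dicts also works on a list.
titles = []
for title in sample:
    if title not in titles:  # linear scan; fine for a few hundred titles
        titles.append(title)
print(titles)  # ['Title A', 'Title B']

# Option 2: a set drops duplicates and has fast membership tests,
# but it does not keep the original order.
unique_set = set(sample)

# Option 3: dict.fromkeys() keeps the first occurrence of each item,
# in insertion order (guaranteed since Python 3.7).
unique_ordered = list(dict.fromkeys(sample))
print(unique_ordered)  # ['Title A', 'Title B']
```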

books = {"Anne Rice": []}  # dict mapping each author to a list of titles

The only methods I have found to identify duplicates within lists use for
loops, so I was trying to work out how to use dictionaries instead, and
could not. Creating nested dictionaries dynamically is beyond my ability.
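Nested dictionaries can be created on the fly with `dict.setdefault`, which returns the existing value for a key, or inserts a default and returns that. The sketch below assumes the letter\author\title layout described above and stores titles in a set, so duplicates (such as several MP3 tracks in one title directory) are dropped automatically. `add_book` and the sample paths are hypothetical; `PureWindowsPath` is used so the backslash paths split the same way on any OS.

```python
from pathlib import PureWindowsPath


def add_book(tree, parts):
    """Insert (letter, author, ..., title) parts into {letter: {author: set(titles)}}."""
    letter, author, title = parts[0], parts[1], parts[-1]
    # Each setdefault returns the existing inner dict/set, or inserts an empty one first.
    titles = tree.setdefault(letter, {}).setdefault(author, set())
    titles.add(title)  # adding a title that is already present has no effect


tree = {}
# Hypothetical relative paths standing in for what rglob() would yield.
for rel in [r"a\Anne rice\Title A\track1.mp3",
            r"a\Anne rice\Title A\track2.mp3",
            r"a\Anne rice\Title B\book.pdf"]:
    parts = PureWindowsPath(rel).parts[:-1]  # drop the filename, keep directories
    add_book(tree, parts)

print(sorted(tree["a"]["Anne rice"]))  # ['Title A', 'Title B']
```

`collections.defaultdict` is a common alternative to `setdefault` when building nested structures like this.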

Below is my code; I am hoping someone can give me some pointers.

import csv
from pathlib import Path

def csv_export(data):
    # Dumps data to a CSV file; data must be a list of rows (e.g. a list of lists).
    with open('my-books.csv', 'w', newline="") as fp:
        writer = csv.writer(fp)
        writer.writerows(data)

books = {}  # author name -> list of unique titles
dirList = Path(r"e:\authors")  # starting directory

for path in dirList.rglob('*'):  # walks the whole directory tree
    if not path.is_dir():  # only files are processed; their parents are the book directories
        # Path parts relative to e:\authors, e.g. ['a', 'Anne rice', 'title', 'file.pdf']
        bookPath = list(path.relative_to(dirList).parts)
        bookPath.pop()  # removes the filename; we only want directory names
        if len(bookPath) < 3:  # skips stray files not stored as letter\author\title
            continue
        author = bookPath[1]  # author name is always the 2nd element
        title = bookPath[-1]  # the last directory is the book title
        if author in books:  # author already seen
            if title not in books[author]:  # skips duplicate titles (e.g. track1.mp3, track2.mp3)
                books[author].append(title)
        else:  # first time we see this author: start a new list of titles
            books[author] = [title]

I suspect I might need a recursive function, but I am not sure how to write
one; I always struggle with recursive logic. I hope someone can help and
that the above makes sense.
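`Path.rglob('*')` already walks every sub-directory, so the flat author-to-titles approach does not need recursion. If a nested dict mirroring the whole directory tree is wanted, a short recursive function is one option. Here is a sketch, assuming only directories matter and files are ignored; `dir_tree` is a hypothetical name, and the temporary directory exists only to demonstrate the result.

```python
import tempfile
from pathlib import Path


def dir_tree(path):
    """Return a nested dict of the sub-directories under `path`; files are ignored.

    Each key is a directory name and each value is the dict for that
    directory's own sub-directories. A directory with no sub-directories
    maps to {} (the base case that stops the recursion).
    """
    return {child.name: dir_tree(child)  # recursive call, one level deeper
            for child in sorted(path.iterdir())
            if child.is_dir()}


# Build a throwaway tree to show the shape of the result.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "Anne rice" / "Title A").mkdir(parents=True)
    (root / "a" / "Anne rice" / "Title A" / "track1.mp3").touch()
    tree = dir_tree(root)

print(tree)  # {'a': {'Anne rice': {'Title A': {}}}}
```

Each call handles one directory and hands its sub-directories to further calls, so the nesting of the returned dict matches the nesting of the folders.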

Sean