[Tutor] storing and saving file tree structure
mhysnm1964 at gmail.com
Sun Jan 24 01:45:29 EST 2021
All,
I have a directory for all the books I own, organised as follows:
D:\authors\a\Anne rice\title\filename.pdf
Here "title" is the title of the book, and the file could be a PDF, MOBI, MP3,
etc., depending on where I purchased the book. What I am trying to do is
bring all this information into a spreadsheet. That is why I am trying to load
the whole directory structure into a data structure such as a dict.
Considerations:
* Some title directories contain multiple files, so pathlib returns one
path object per file. For example:
D:\authors\a\Anne rice\title\track1.mp3
D:\authors\a\Anne rice\title\track2.mp3
D:\authors\a\Anne rice\title\track3.mp3
* I do not want the file names included, only the directory names (I have
already worked out how to do this).
* The last directory is normally the title of the book. This is how I
have structured the directories.
* I want to remove duplicate entries of author names and titles.
* I want to import the result into Excel, either directly into a
spreadsheet or via CSV. I already have information on this part.
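Given the layout above (letter, author, title, then the files), each file path can be reduced to an (author, title) pair without keeping the file name. A minimal sketch, assuming every file sits at root/letter/author/title/file; author_and_title is a hypothetical helper name, and PureWindowsPath is used so the example runs without touching the disk:

```python
from pathlib import PureWindowsPath


def author_and_title(file_path, root):
    """Return (author, title) for a file stored under root.

    Assumes the layout root/letter/author/title/file.
    """
    # Parts relative to root, e.g. ('a', 'Anne rice', 'title', 'track1.mp3')
    parts = PureWindowsPath(file_path).relative_to(root).parts
    directories = parts[:-1]  # drop the file name, keep directory names only
    author = directories[1]   # index 0 is the letter folder
    title = directories[-1]   # the last directory is the book title
    return author, title


root = PureWindowsPath(r"D:\authors")
for name in ("track1.mp3", "track2.mp3", "track3.mp3"):
    # Every track in the same folder maps to the same (author, title) pair
    print(author_and_title(root / "a" / "Anne rice" / "title" / name, root))
# prints ('Anne rice', 'title') three times
```

Because the three tracks collapse to the same pair, removing duplicate (author, title) pairs is enough to get one row per book.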
I think that covers the considerations. What I cannot work out is how to
write the code to remove duplicates. Dictionaries are great for identifying
duplicate keys because a simple if test does it. Finding duplicates in a list
seems more challenging. The structure I am using is:
books = {"Anne Rice": []}  # a dict whose values are lists
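The dict-style duplicate test can also be applied to lists: dictionary keys are unique, so a list can be de-duplicated by turning its items into dict keys, or each author's titles can be kept in a set instead of a list. A minimal sketch of both ideas, with made-up sample titles:

```python
# Made-up sample data: one title listed once per file in its folder
titles = ["Title A", "Title A", "Title A", "Title B"]

# dict keys are unique and keep insertion order (Python 3.7+),
# so dict.fromkeys drops repeats while preserving the original order
unique_titles = list(dict.fromkeys(titles))
print(unique_titles)  # ['Title A', 'Title B']

# Alternative: store a set of titles per author instead of a list.
# setdefault inserts an empty set the first time an author is seen,
# and set.add does nothing if the title is already present.
books = {}
pairs = [("Anne Rice", "Title A"), ("Anne Rice", "Title A"), ("Anne Rice", "Title B")]
for author, title in pairs:
    books.setdefault(author, set()).add(title)
print(sorted(books["Anne Rice"]))  # ['Title A', 'Title B']
```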
The only methods I have found for identifying duplicates within lists use for
loops, so I tried to work out how to use dictionaries instead and could not.
Creating nested dictionaries dynamically is beyond my ability. Below is the
code; I am hoping someone can give me some pointers.
import re, csv
from pathlib import Path

def csv_export(data):
    # Dumps the data to a CSV file; data has to be a list of rows.
    with open('my-books.csv', 'w', newline="") as fp:
        writer = csv.writer(fp)
        writer.writerows(data)
# end def

books = {}
bookPath = []
dirList = Path(r"e:\authors")  # starting directory
for path in dirList.rglob('*'):  # loading the whole directory structure.
    if not path.is_dir():  # checks whether the path is a file.
        bookPath = list(path.relative_to(dirList).parts)  # path parts without "e:\authors", as a list.
        bookPath.pop()  # removes the file name from the parts, as we only want directory names.
        author = bookPath[1]  # author name is always the 2nd element.
        if author in books:  # check for existing keys
            if bookPath[-1] not in books[author]:  # trying to find duplicate titles but fails.
                books[author].append(bookPath)
            # end if
        else:  # creates new entries for dict.
            books[author] = bookPath
        # end if
    # end if
# end for
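One likely reason the duplicate check fails: books[author] = bookPath stores the first path as a flat list of strings, and later paths are appended as whole lists, so bookPath[-1] not in books[author] compares a title string against nested lists and never matches a title stored inside them. A possible rework, assuming the same letter/author/title/file layout, keeps one set of titles per author; collect_books and to_rows are hypothetical names, and the rows are shaped to suit the existing csv_export function:

```python
import tempfile
from pathlib import Path


def collect_books(root):
    """Map each author name to a set of book titles found under root.

    Assumes the layout root/letter/author/title/file: the author is the
    2nd directory and the title is the last directory.
    """
    books = {}
    for path in root.rglob("*"):  # rglob already walks every subdirectory
        if not path.is_file():
            continue
        # e.g. ('a', 'Anne rice', 'title', 'track1.mp3')
        parts = path.relative_to(root).parts
        if len(parts) < 4:
            continue  # too shallow to contain letter/author/title/file
        author, title = parts[1], parts[-2]
        # setdefault creates the set the first time an author is seen;
        # set.add ignores a title that is already stored
        books.setdefault(author, set()).add(title)
    return books


def to_rows(books):
    """Flatten {author: {titles}} into sorted [author, title] rows plus a header."""
    rows = [["Author", "Title"]]
    for author in sorted(books):
        for title in sorted(books[author]):
            rows.append([author, title])
    return rows


# Demo on a throwaway tree rather than the real e:\authors folder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    folder = root / "a" / "Anne rice" / "title"
    folder.mkdir(parents=True)
    for name in ("track1.mp3", "track2.mp3", "track3.mp3"):
        (folder / name).touch()
    rows = to_rows(collect_books(root))

print(rows)  # [['Author', 'Title'], ['Anne rice', 'title']]
```

The rows could then be passed to csv_export(rows); one row per author/title pair is usually easier to sort and filter in Excel than a nested structure.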
I suspect I might need a recursive function, but I am not sure how to write
one. I always have trouble with recursive logic. I hope someone can help and
that the above makes sense.
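Recursion is probably not needed here: rglob('*') already visits every level of the tree, so each file's directory parts can be handled with an ordinary loop. If a nested dictionary mirroring the folders is still wanted, setdefault can create each level on demand. A minimal sketch, assuming the directory parts are already available as tuples; add_path is a hypothetical helper and the sample paths are made up:

```python
def add_path(tree, directories):
    """Insert a sequence of directory names into a nested dict, one level per name."""
    node = tree
    for name in directories:
        # setdefault returns the existing child dict, or creates an empty one
        node = node.setdefault(name, {})
    return tree


tree = {}
sample = [
    ("a", "Anne rice", "Title A"),
    ("a", "Anne rice", "Title A"),  # a second file in the same folder adds nothing new
    ("a", "Anne rice", "Title B"),
]
for directories in sample:
    add_path(tree, directories)

print(tree)
# {'a': {'Anne rice': {'Title A': {}, 'Title B': {}}}}
```

Because dict keys are unique, repeated folders collapse automatically, which is the same duplicate removal the question is after.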
Sean