[Tutor] storing and saving file tree structure

Mon Jan 25 04:20:40 EST 2021

Alan,

Indents are a visual formatting structure, so it is difficult for a screen
reader user like myself to keep the blocks of code correctly indented. That
is why I put the comments at the end of each block of code.

I will have to recheck my logic for the dicts, as I got errors in the past.
I will check out sets, as that sounds useful.

I have found some other code using os.walk which I am checking out.
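
For reference, a minimal sketch of the os.walk approach, assuming the goal is
to list only the directories that directly contain files. The function name
dirs_with_files and the sample tree are hypothetical, not taken from the code
mentioned above:

```python
import os
import tempfile


def dirs_with_files(root):
    """Return sorted relative paths of directories that directly contain files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if filenames:  # skip directories that only hold subdirectories
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)


# Throwaway tree with a hypothetical layout: letter/author/title/files.
with tempfile.TemporaryDirectory() as tmp:
    title_dir = os.path.join(tmp, "a", "Anne rice", "Title A")
    os.makedirs(title_dir)
    for name in ("track1.mp3", "track2.mp3"):
        open(os.path.join(title_dir, name), "w").close()
    result = dirs_with_files(tmp)

print(result)
```

Because os.walk yields each directory once, a title directory with several
tracks appears only once in the result.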

Yes, I could use a database to store all this information, which I might do
later. For now I am learning the language and how to build data sets, with
the goal of a short-term solution for what I need.

Thanks for the tips. I will go and investigate.

Sean 

-----Original Message-----
From: Tutor <tutor-bounces+mhysnm1964=gmail.com at python.org> On Behalf Of
Alan Gauld via Tutor
Sent: Sunday, 24 January 2021 8:28 PM
To: tutor at python.org
Subject: Re: [Tutor] storing and saving file tree structure

On 24/01/2021 06:45, mhysnm1964 at gmail.com wrote:

> D:\authors\a\Anne rice\title\filename.pdf
> 
> Title - is the title of the book and the filename could be a PDF, Mob, 
> MP3, ETC. depending where I have purchased the book. What I am trying 
> to do is bring all the information into a spreadsheet. Hence why I am 
> trying to bring in the whole directory structure into a data struct like
> a dict.

You could use a database instead.
SQLite comes with Python and can be run in memory rather than on disk.
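
A minimal sketch of the in-memory option, assuming a hypothetical single
table named books with author and title columns (the table layout and sample
rows are illustrations only). A UNIQUE constraint plus INSERT OR IGNORE lets
the database discard the duplicates:

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (author TEXT, title TEXT, UNIQUE (author, title))"
)

# Hypothetical sample data: one row per file found, so titles repeat.
rows = [
    ("Anne Rice", "Title A"),
    ("Anne Rice", "Title A"),
    ("Anne Rice", "Title B"),
]
# INSERT OR IGNORE silently skips rows that violate the UNIQUE constraint.
conn.executemany("INSERT OR IGNORE INTO books VALUES (?, ?)", rows)

result = conn.execute(
    "SELECT author, title FROM books ORDER BY author, title"
).fetchall()
print(result)
```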

> *	Some directories have multiple files. Thus when you use pathlib you
> will get multiple path objects. For example:
> D:\authors\a\Anne rice\title\track1.mp3
> D:\authors\a\Anne rice\title\track2.mp3
> D:\authors\a\Anne rice\title\track3.mp3
> 
> *	I do not want the filenames to be included, only the directory names
> which I have worked out.
> *	The last directory is normally the title of the book. This is how I
> have structured the directory.
> *	I want to remove duplicate entries of author names and titles.
> *	Want to import into Excel - I have information on this part. Either
> directly into a spreadsheet or use CSV.

Rewording the requirement.

You have a set of authors and each author has a set of books associated.
Is that it?

> how to write the code to remove duplicates. Dictionaries are really 
> great to identify duplicate keys because you can use a simple if test. 
> Finding duplicates in a list is more challenging.

So don't use a list. Use a set. Sets remove duplicates automatically (i.e.
they don't allow them to exist!)

> Books = {"Anne Rice": []} # dict with a list.

Books = {"Anne Rice": set()} # dict with a set
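
A small sketch of what that buys you, using made-up titles. Adding the same
title repeatedly (once per track, say) leaves a single entry:

```python
books = {"Anne Rice": set()}

# The same title arrives several times, once for each file in its directory.
for title in ["Title A", "Title A", "Title A", "Title B"]:
    books["Anne Rice"].add(title)  # add() is a no-op if the title is present

# Sets are unordered, so sort before displaying or exporting.
titles = sorted(books["Anne Rice"])
print(titles)
```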

> Only methods I have found to identify duplicates within lists is using 
> for loops. Thus I was trying to work out how to use dictionaries 
> instead and could not. Creating nested dictionaries dynamically is beyond
my ability.

It's really not difficult. Let's pretend you wanted all the files associated
with each book:

Books = {"Anne rice": {"Book Title": [list,of,files]}}

Personally I use formatting to show the layout better if I'm building it
statically, but in your case you are loading it dynamically from your files.

Books = {
         "Anne rice": {
                       "Book1": [
                                list,
                                of,
                                files
                                ],
                       "Book2": [
                                More,
                                Files
                                ]
                       },
         "Next author": {
                         etc...
                         }
        }
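
The same nested layout can be built dynamically. Here is a minimal sketch
using dict.setdefault, assuming the data arrives as hypothetical
(author, title, filename) triples; the sample values are made up:

```python
# Hypothetical triples, as they might come out of a directory scan.
records = [
    ("Anne rice", "Book1", "track1.mp3"),
    ("Anne rice", "Book1", "track2.mp3"),
    ("Anne rice", "Book2", "book.pdf"),
    ("Next author", "Book3", "book.mobi"),
]

books = {}
for author, title, filename in records:
    # setdefault returns the existing value for the key, or stores and
    # returns the given default when the key is missing.
    files = books.setdefault(author, {}).setdefault(title, [])
    files.append(filename)

print(books)
```

Each setdefault call creates one level of nesting on first use, so no
explicit "is this key already there?" test is needed.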

But since you don't need that, just use a set instead of a list.

> import  re, csv
> 
> from pathlib import Path

> def csv_export (data):
>     # dumps the data list to a csv and has to be an list

You can write dicts to a CSV, one dict per row, and the dict keys become the
column headings. Look at csv.DictWriter.
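
A minimal sketch of csv.DictWriter, assuming hypothetical column names
"author" and "title" and made-up rows. An in-memory io.StringIO stands in for
the real file so the example has no side effects:

```python
import csv
import io

# Hypothetical rows: one dict per (author, title) pair.
rows = [
    {"author": "Anne Rice", "title": "Title A"},
    {"author": "Anne Rice", "title": "Title B"},
]

# With a real file, use: open("my-books.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["author", "title"])
writer.writeheader()    # writes the column headings from fieldnames
writer.writerows(rows)  # writes one CSV line per dict

output = buffer.getvalue()
print(output)
```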

>     with open ('my-books.csv', 'w', newline="") as fp:
>         writer = csv.writer(fp)
>         writer.writerows(data)
> # end def
> 
> books = {}
> bookPath = []
> dirList = Path(r"e:\authors") # starting directory
> 
> for path in dirList.rglob('*'): # loading the whole directory structure.
> 
>     if not path.is_dir(): # Checks to see if is a file.
> 
>         # extracts the file path as a tuple without "e:\authors".
>         bookPath = list(path.relative_to(dirList).parts)
> 
>         # removes the file from the path parts, as we only want
>         # directory names.
>         bookPath.pop()
> 
>         author = bookPath[1] # author name is always the 2nd element.
> 
>         if author in books: # check for existing keys

It's a dictionary, so why do you care? Just add the book to the author's
entry using setdefault() or a defaultdict: if the entry exists it will work,
and if it doesn't the entry will be created.
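
One caveat: on a plain dict, books[author].add(title) raises KeyError for an
author that is not there yet. A minimal sketch of two standard ways to get
the create-on-first-use behaviour, with made-up names and titles:

```python
from collections import defaultdict

# Option 1: dict.setdefault creates the empty set only when the key is new.
books = {}
books.setdefault("Anne Rice", set()).add("Title A")
books.setdefault("Anne Rice", set()).add("Title A")  # duplicate, ignored

# Option 2: defaultdict(set) creates the empty set on first access.
books_dd = defaultdict(set)
books_dd["Anne Rice"].add("Title A")
books_dd["New Author"].add("Title B")  # no KeyError for a new author

print(books)
print(dict(books_dd))
```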

> 
>             # trying to find duplicate titles but fails.
>             if  bookPath[-1] not in books[author]:

If you use a set you don't need to check. But...
In what way does it fail? It should succeed with an "in" test even if it's
not very efficient.
>                 books[author].append(bookPath)
>             # end if 
>         else: # creates new entries for dict.
>             books[author] = bookPath
>         # end if 
>     # end if
> # end for

One of the reasons Python uses indentation is to avoid all these end markers
and their misleading, and thus bug-forming, implications.
It's rather ironic that you are putting them back in as comments! :)

> I suspect I might have to do recursive functions but not sure how to 
> do this. I always have challenges with recursive logic. I hope someone 
> can help and the above makes sense.

It looks like it should work, although you only check the books if the
author is already there. Where is the code to handle the case where it's a
new author?

But if you use a dict of sets you avoid all of that checking business.
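
Putting that together, here is a minimal sketch of the dict-of-sets approach
applied to a directory scan. It assumes the letter/author/title/file layout
described in the question; the function name collect_titles and the sample
tree are hypothetical:

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_titles(root):
    """Map each author directory to the set of title directories holding files."""
    books = defaultdict(set)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 4:
            continue  # not in the assumed letter/author/title/file layout
        # Author is the 2nd element; title is the directory holding the file.
        author, title = parts[1], parts[-2]
        books[author].add(title)  # the set drops repeated titles
    return books


# Throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for rel in (
        "a/Anne rice/Title A/track1.mp3",
        "a/Anne rice/Title A/track2.mp3",
        "a/Anne rice/Title B/book.pdf",
    ):
        f = Path(tmp, rel)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    found = collect_titles(tmp)

print(dict(found))
```

No recursion and no duplicate checks are needed: rglob walks the tree and
the sets take care of repeated authors and titles.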

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
