[Tutor] script guidelines
Alan Gauld
alan.gauld at yahoo.co.uk
Tue Oct 3 14:17:56 EDT 2017
On 03/10/17 09:48, renukesh nk wrote:
> requirement:
> i have a directory , that contains multiple sub directories, each sub
> directory has multiple text and log files, my script fetches the required
> lines from all the sub directories and stores it in one text file.
>
I'm not too sure what you mean by writing the "required lines"
but I've added some general comments below
> but i want it to store separate text file for each sub directory ,after
> fetching the contents. can anyone please help me where to edit my script.
It is better if you copy the script into the body of the email,
attachments often get stripped by the server so we can't see
them. In this case you got lucky...
Comments below:
> import zipfile,fnmatch,os
> import os, shutil, re, glob
you import os twice, its not a bug but it is a
potential future maintenance gotcha.
> from os.path import isfile, join
you use os.path.xxx so don't need this import
> from os import walk
You use os.walk so don't need this import.
> root_src_dir = r'E:\\New folder'
> root_dst_dir = "E:\\destination"
dst_dir should be a raw string too.
And why double quotes here and single quotes
above? It's good to be consistent.
> for src_dir, dirs, files in os.walk(root_src_dir):
> dst_dir = src_dir.replace(root_src_dir, root_dst_dir, 1)
> if not os.path.exists(dst_dir):
> os.makedirs(dst_dir)
> for file_ in files:
> src_file = os.path.join(src_dir, file_)
> dst_file = os.path.join(dst_dir, file_)
> if os.path.exists(dst_file):
> os.remove(dst_file)
> shutil.copy(src_file, dst_dir)
This just copies everything so why not use
the shutil.copytree() function instead?
> rootPath = r"E:\\destination"
Rather than duplicate the string use root_dst_dir
> pattern = '*.zip'
> for root, dirs, files in os.walk(rootPath):
This is where you do stuff at a per directory level so
this is probably where you want to create your logfile
and write to it. But i'm still not sure what you are
looking to write...
> for filename in fnmatch.filter(files, pattern):
> print(os.path.join(root, filename))
> zipfile.ZipFile(os.path.join(root,
> filename)).extractall(os.path.join(root,
> os.path.splitext(filename)[0]))
This chunk looks at every zip file and extracts the contents into the
same folder(what if there are duplicate filenames?) But you don't do
anything with those files?
> os.chdir(rootPath)
> for file in glob.glob(pattern):
> f = open((file.rsplit(".", 1)[0]) + "ANA.txt", "w")
>
> f.close()
Then you change to the destination root and iterate over any zip files
at the root level, opening a new file and immediately closing it again?
At this point I'm confused?
It would be good to put a comment before each block
explaining the intended purpose of the code - what
it is expected to achieve and why you are doing it.
> ##here folder output
> mypath =r"E:\\destination"
Again this could use root_dst_dir to avoid string
duplication (and potential error)
> newpath = os.path.expanduser("E:\\destination")
expanduser() replaces an initial tilde(~) with the homedir but
since you don't have an initial tilde its completely
redundant here.
> filenam = "1.txt"
> f = []
> path = ''
> path1 = []
These look suspiciously like bad variable names.
What is their purpose? Is 'path1' really a list?
Usually lists suggest a plural name, such as paths.
> for (dirpath, dirnames, filenames) in walk(mypath):
> if isfile(join(mypath, dirpath)):
This should never be true since mypath is a directory
and dirpath is, by definition, the folder that walk()
is currently reading. But joining them makes no sense
since dirpath should already include mypath.
> path1.extend(join(mypath, dirpath))
> if os.path.isdir(join(mypath, dirpath)):
Likewise because you join() them the result
will probably not exist so the tst will fail.
> for f in filenames:
> path1.append(str(join(mypath, dirpath, f)))
If this does get called it will probably have
broken paths since you use join again.
If you just want the files use the list of files that
os.walk gave you - it has already separated the contents
into dirs and files as part of its return value.
> print(path1)
>
> newf = open(os.path.join(newpath, filenam ), "w+")
Why use 'w+'? You only need that if you are going to be
reading and writing the file - usually involving use
of tell()/seek() etc. You don't do any of that so
keep things simple and just use 'w'.
> myarray = {"ERROR", "error"}
Its not an array its a set
> for element in myarray:
> elementstring = ''.join(element)
This is a very complex way of writing
elementstring = 'Error'
Although it does have a frisson of interest because the
actual set element chosen is theoretically arbitrary
since sets are unordered. It's not clear what you are
trying to achieve but I suspect this is not it.
> for f in path1:
> openfile = open(os.path.join(path, f), "r")
Since path is an empty string this is effectively saying
openfile = open(f, "r")
> for line in openfile:
> if elementstring in line:
> newf.write(line)
> newf.close()
> openfile.close()
You should be closing openfile in the loop above where you open it.
But better still you should be using the
with open((f,'r') as openfile:
idiom which will auto close the file for you.
I'm still not clear on what you are trying to do(*)
but it looks like you are making it much more
difficult than I suspect it needs to be.
It will also be pretty slow since you walk the directory
tree multiple times when one copytree() followed by one
walk() should, I think, suffice.
(*)I think it's to copy the initial file structure
then search all files in the copy for lines including
the strings 'Error' or 'error' and write those lines to
a logfile, one per folder. But I'm not really certain.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
More information about the Tutor
mailing list