Strategy/ Advice for How to Best Attack this Problem?

Dave Angel d at
Fri Apr 3 13:59:28 CEST 2015

On 04/02/2015 07:43 PM, Saran A wrote:
> I debugged and rewrote everything. Here is the full version. Feel free to tear this apart. The homework assignment is not due until tomorrow, so I am currently also experimenting with pyinotify as well. I do have questions regarding how to make this function compatible with the ProcessEvent Class. I will create another post for this.
> What would you advise in regards to renaming the inaptly named dirlist?

Asked and answered.  You called it path at one point.  dir_name would 
also be good.  The point is that if you have a good name, you're less 
likely to have unreasonable code processing that name.  For example,

> # # # Without data to examine here, I can only guess based on this requirement's language that
> # # fixed records are in the input.  If so, here's a slight revision to the helper functions that I wrote earlier which
> # # takes the function fileinfo as a starting point and demonstrates calling a function from within a function.
> # I tested this little sample on a small set of files created with MD5 checksums.  I wrote the Python in such a way as it
> # would work with Python 2.x or 3.x (note the __future__ at the top).
> # # # There are so many wonderful ways of failure, so, from a development standpoint, I would probably spend a bit
> # # more time trying to determine which failure(s) I would want to report to the user, and how (perhaps creating my own Exceptions)
> # # # The only other comments I would make are about safe-file handling.
> # # #   #1:  Question: After a user has created a file that has failed (in
> # # #        processing),can the user create a file with the same name?
> # # #        If so, then you will probably want to look at some sort
> # # #        of file-naming strategy to avoid overwriting evidence of
> # # #        earlier failures.
> # # # File naming is a tricky thing.  I referenced the tempfile module [1] and the Maildir naming scheme to see two different
> # # types of solutions to the problem of choosing a unique filename.
>> ## I am assuming that all of my files are going to be specified in unicode
> ## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite
> from __future__ import print_function
> import os.path
> import time
> import logging
> def initialize_logger(output_dir):

I didn't ever bother to read the body of this function, since a simple

       print(mydata, file=mylog_file)

will suffice to add data to the chosen text file.  Why is it constantly 
getting more complex, to solve a problem that was simple in the beginning?

> #Returns filename, rootdir and filesize
> def fileinfo(f):
>      filename = os.path.basename(f)
>      rootdir = os.path.dirname(f)
>      filesize = os.path.getsize(f)
>      return filename, rootdir, filesize
> #returns length of file
> def file_len(f):
>      with open(f) as f:
>          for i, l in enumerate(f):
>              pass
>              return i + 1

Always returns 1 or None.  Check the indentation, and test to see what 
it does for empty file, for a file with one line, and for a file with 
more than one line.

> #attempts to copy file and move file to it's directory
> def copy_and_move_file(src, dest):

Which is it, are you trying to copy it, or move it?  Pick one and make a 
good function name that shows your choice.

>      try:
>          os.rename(src, dest)

Why are you using rename, when you're trying to move the file?  Take a 
closer look at shutil, and see if it has a function that does it safer 
than rename.  The function you need uses rename, when it'll work, and 
does it other ways when rename will not.

>          # eg. src and dest are the same file
>      except IOError as e:
>          print('Error: %s' % e.strerror)

A try/except that doesn't try to correct the problem is not generally 
useful.  Figure out what could be triggering the exception, and how 
you're going to handle it.  If it cannot be handled, terminate the program.

For example, what if you don't have permissions to modify one of the 
specified directories?  You can't get any useful work done, so you 
should notify the user and exit.  An alternative is to produce 50 
thousand reports of each file you've got, telling how it succeeded or 
failed, over and over.

> path = "."
> dirlist = os.listdir(path)
> def main(dirlist):
>      before = dict([(f, 0) for f in dirlist])

Since dirlist is a path, it's a string.  So you're looping through the 
characters of the name of the path.  I still don't have a clue what the 
dict is supposed to mean here.

>      while True:
>          time.sleep(1) #time between update check

That loop goes forever, so the following code will never run.

>      after = dict([(f, None) for f in dirlist])

Once again, you're looping through the letters of the directory name. 
Or if dirlist is really a list, and you're deciding that's what it 
should be, then of course after will be identical to before.

>      added = [f for f in after if not f in before]
>      if added:
>          f = ''.join(added)
>          print('Sucessfully added %s file - ready to validate') %()
>          return validate_files(f)
>      else:
>          return move_to_failure_folder_and_return_error_file(f)
> def validate_files(f):

You completely changed the scope of this function.  Why does it still 
have an s in its name?

>      creation = time.ctime(os.path.getctime(f))
>      lastmod = time.ctime(os.path.getmtime(f))

What do timestamps have to do with anything?

>      if creation == lastmod and file_len(f) > 0:
>          return move_to_success_folder_and_read(f)

How have you decided that the records are all the same length?

>      if file_len < 0 and creation != lastmod:

file_len cannot be safely compared to 0, it's a function.  If this were 
Python 3, it would give you a runtime error for trying.  Even if you had 
successfully called the function here, how is it possible for it to 
return a negative value?

>          return move_to_success_folder_and_read(f)
>      else:
>          return move_to_failure_folder_and_return_error_file(f)
> #Potential Additions/Substitutions
> def move_to_failure_folder_and_return_error_file():
>      filename, rootdir, lastmod, creation, filesize = fileinfo(file)

fileinfo() returns 3 things, and you're stuffing them in 5 variables.

>      os.mkdir('Failure')

What do you do the second time through this function, when that 
directory is already existing?

>      copy_and_move_file( 'Failure')

The function takes two arguments, neither of which is likely to be that 

>      initialize_logger('rootdir/Failure')
>      logging.error("Either this file is empty or there are no lines")

> def move_to_success_folder_and_read():
>      filename, rootdir, lastmod, creation, filesize = fileinfo(file)
>      os.mkdir('Success')
>      copy_and_move_file(rootdir, 'Success') #file name
>      print("Success", file)
>      return file_len(file)
> if __name__ == '__main__':
>     main(dirlist)

So now you're passing a list of filenames to main()?  How then is it 
going to know when new files arrive?  You were originally going to tell 
it a directory name.  That's why I suggested that you call the parameter 
'path' or 'dirname'.


More information about the Python-list mailing list