[Tutor] Advice on Strategy for Attacking IO Program

Mon Mar 30 04:25:30 CEST 2015

Greetings,

> Here is what I am trying have my program do:
>
> • Monitor a folder for files that are dropped throughout the day
>
> • When a file is dropped in the folder the program should scan the file
>
> o IF all the contents in the file have the same length
>
> o THEN the file should be moved to a "success" folder and a text file
> written indicating the total number of records processed
>
> o IF the file is empty OR the contents are not all of the same length
>
> o THEN the file should be moved to a "failure" folder and a text file
> written indicating the cause for failure (for example: Empty file or line
> 100 was not the same length as the rest).

This is a very good description of the intent of the programmer. 
Thanks for including this.

> Below are the functions that I have been experimenting with. I am 
> not sure how to most efficiently create a functional program

Do you mean 'functional programming' [0] style as in Scheme, Haskell 
and/or ML?  Or do you simply mean, 'functioning program'?

> from each of these constituent parts. I could use decorators 
> (sacrificing speed) or simply pass a function within another 
> function.

Given what you describe, I think your concern about sacrificing 
speed is worth forgetting (for now).  Unless we are talking 
thousands of files per minute, I think you can sleep comfortably 
that decorators (which are, operationally, nothing more than another 
function call) are not going to slow down your program terribly.

If you are (ever) worried about speed, then profile and understand 
where to place your effort.  (I belong to the camp that searches for 
correctness first and then performance later.)

Are you aware that you can define functions yourself?  And, that you 
could have your function call one of the below functions?  I will 
demonstrate by changing your 'fileinfo' function slightly.

I snipped the rest of your code.

> def fileinfo(file):
>    filename = os.path.basename(file)
>    rootdir = os.path.dirname(file)
>    lastmod = time.ctime(os.path.getmtime(file))
>    creation = time.ctime(os.path.getctime(file))
>    filesize = os.path.getsize(file)
>
>    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation,
> filesize)

Do you realize you can actually return the values to the caller 
rather than print them?  This is an excelent way to pass information 
around a program.

Without data to examine here, I can only guess based on your 
language that fixed records are in the input.  If so, here's a 
slight revision to your program, which takes the function fileinfo 
as a starting point and demonstrates calling a function from within 
a function.  I tested this little sample on a small set of files 
created with MD5 checksums.  I wrote the Python in such a way as it 
would work with Python 2.x or 3.x (note the __future__ at the top).

There are so many wonderful ways of failure, so you will probably 
spend a bit more time trying to determine which failure(s) you want 
to report to your user, and how.

The only other comments I would make are about safe-file handling.

   #1:  After a user has created a file that has failed (in
        processing),can the user create a file with the same name?
        If so, then you will probably want to look at some sort
        of file-naming strategy to avoid overwriting evidence of
        earlier failures.

File naming is a tricky thing.  See the tempfile module [1] and the 
Maildir naming scheme to see two different types of solutions to the 
problem of choosing a unique filename.

Good luck with your foray,

-Martin

Code snippet:

#! /usr/bin/python

from __future__ import print_function

import os
import time

RECORD_LENGTH = 32

def process_data(f, filesize):
     f.seek(0, os.SEEK_SET)
     counter = 0

     # -- are the records text?  I assume yes, though your data may differ.
     #    if your definition of a record includes the newline, then
     #    you will want to use len(record) ...
     #
     for record in f:
         print("record: %s" % ( record.strip()), file=sys.stderr)
         if RECORD_LENGTH == len(record.strip()):  # -- len(record)
             counter += 1
         else:
             return False, counter
     return True, counter

def validate_and_process_data(f, filesize):
     f.seek(0, os.SEEK_END)
     lastbyte = f.tell()
     if lastbyte != filesize:
         return False, 0
     else:
         return process_data(f, filesize)

def validate_files(files):
     for file in files:
         filename, rootdir, lastmod, creation, filesize = fileinfo(file)
         f = open(filename)
         result, record_count = validate_and_process_data(f, filesize)
         if result:
             # -- Success!  Write text file with count of records
             #    recognized in record_count
             print('success with %s on %d records' % (filename, record_count,), file=sys.stderr)
             pass
         else:
             # -- Failure; need additional logic here to determine
             #    the cause for failure.
             print('failure with %s after %d records' % (filename, record_count,), file=sys.stderr)
             pass

def fileinfo(file):
     filename = os.path.basename(file)
     rootdir = os.path.dirname(file)
     lastmod = time.ctime(os.path.getmtime(file))
     creation = time.ctime(os.path.getctime(file))
     filesize = os.path.getsize(file)
     return filename, rootdir, lastmod, creation, filesize

if __name__ == '__main__':
    import sys
    validate_files(sys.argv[1:])

# -- end of file

  [0] http://en.wikipedia.org/wiki/Functional_programming
  [1] https://docs.python.org/2/library/tempfile.html#module-tempfile

-- 
Martin A. Brown
http://linux-ip.net/