[Tutor] Advice on Strategy for Attacking IO Program
Martin A. Brown
martin at linux-ip.net
Mon Mar 30 04:25:30 CEST 2015
Greetings,
> Here is what I am trying to have my program do:
>
> • Monitor a folder for files that are dropped throughout the day
>
> • When a file is dropped in the folder the program should scan the file
>
> o IF all the contents in the file have the same length
>
> o THEN the file should be moved to a "success" folder and a text file
> written indicating the total number of records processed
>
> o IF the file is empty OR the contents are not all of the same length
>
> o THEN the file should be moved to a "failure" folder and a text file
> written indicating the cause for failure (for example: Empty file or line
> 100 was not the same length as the rest).
This is a very good description of the intent of the programmer.
Thanks for including this.
> Below are the functions that I have been experimenting with. I am
> not sure how to most efficiently create a functional program
Do you mean 'functional programming' [0] style as in Scheme, Haskell
and/or ML? Or do you simply mean, 'functioning program'?
> from each of these constituent parts. I could use decorators
> (sacrificing speed) or simply pass a function within another
> function.
Given what you describe, I think your concern about sacrificing
speed is worth forgetting (for now). Unless we are talking
thousands of files per minute, I think you can sleep comfortably
that decorators (which are, operationally, nothing more than another
function call) are not going to slow down your program terribly.
If you are (ever) worried about speed, then profile and understand
where to place your effort. (I belong to the camp that searches for
correctness first and then performance later.)
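If you want a number instead of my opinion, here is a minimal sketch that times a plain function against the same function wrapped in a do-nothing decorator. The names passthrough, record_length and compare are made up for illustration and are not from your program; the sample line is just a 32-character string.

```python
import functools
import timeit


def passthrough(func):
    """A do-nothing decorator: it only adds one extra function call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def record_length(line):
    """Length of a record with surrounding whitespace removed."""
    return len(line.strip())


# Same function, but every call now goes through wrapper() first.
decorated_record_length = passthrough(record_length)


def compare(calls=100000):
    """Return (plain_seconds, decorated_seconds) for `calls` invocations."""
    line = "d41d8cd98f00b204e9800998ecf8427e\n"
    plain = timeit.timeit(lambda: record_length(line), number=calls)
    decorated = timeit.timeit(lambda: decorated_record_length(line),
                              number=calls)
    return plain, decorated


if __name__ == "__main__":
    plain, decorated = compare()
    print("plain: %.4fs  decorated: %.4fs" % (plain, decorated))
```

The absolute times depend on your machine, so compare the two figures with each other. For a whole program, the cProfile module in the standard library will show where the time actually goes.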
Are you aware that you can define functions yourself? And, that you
could have your function call one of the below functions? I will
demonstrate by changing your 'fileinfo' function slightly.
I snipped the rest of your code.
> def fileinfo(file):
>     filename = os.path.basename(file)
>     rootdir = os.path.dirname(file)
>     lastmod = time.ctime(os.path.getmtime(file))
>     creation = time.ctime(os.path.getctime(file))
>     filesize = os.path.getsize(file)
>
>     print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation,
>         filesize)
Do you realize you can actually return the values to the caller
rather than print them? This is an excellent way to pass information
around a program.
Without data to examine here, I can only guess based on your
language that fixed records are in the input. If so, here's a
slight revision to your program, which takes the function fileinfo
as a starting point and demonstrates calling a function from within
a function. I tested this little sample on a small set of files
created with MD5 checksums. I wrote the Python so that it works
with Python 2.x or 3.x (note the __future__ import at the top).
There are so many wonderful ways of failure, so you will probably
spend a bit more time trying to determine which failure(s) you want
to report to your user, and how.
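One way to carry the cause of a failure back to the caller is to return it alongside the result. This is only a sketch under my own assumptions: the name check_records is hypothetical, a record is one line with whitespace stripped, and the first record sets the expected length (rather than a fixed constant).

```python
import io


def check_records(lines):
    """Return (ok, record_count, reason) for an iterable of text lines.

    Assumption: a record is one line with surrounding whitespace
    stripped, and the first record sets the expected length.
    """
    expected = None
    count = 0
    for lineno, line in enumerate(lines, start=1):
        length = len(line.strip())
        if expected is None:
            expected = length
        elif length != expected:
            reason = "line %d has length %d, expected %d" % (
                lineno, length, expected)
            return False, count, reason
        count += 1
    if count == 0:
        return False, 0, "empty file"
    return True, count, None


if __name__ == "__main__":
    sample = io.StringIO(u"aaaa\nbbbb\ncc\n")
    print(check_records(sample))
```

The reason string is what you would write into the text file in the "failure" folder; on success the third value is None and the count goes into the "success" report.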
The only other comments I would make are about safe-file handling.
#1: After a user has created a file that has failed (in
processing), can the user create a file with the same name?
If so, then you will probably want to look at some sort
of file-naming strategy to avoid overwriting evidence of
earlier failures.
File naming is a tricky thing. See the tempfile module [1] and the
Maildir naming scheme to see two different types of solutions to the
problem of choosing a unique filename.
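As a rough illustration of the tempfile approach, the sketch below moves a file into a destination folder under a name that is guaranteed not to exist yet. The function name move_without_overwrite is hypothetical, and the naming pattern (original stem, a random part, original extension) is just one possible choice.

```python
import os
import shutil
import tempfile


def move_without_overwrite(src, dest_dir):
    """Move src into dest_dir under a name that does not already exist.

    tempfile.mkstemp() creates a new, uniquely named file in dest_dir,
    so two files with the same original name end up side by side
    instead of one replacing the other.
    """
    base = os.path.basename(src)
    stem, ext = os.path.splitext(base)
    fd, dest = tempfile.mkstemp(prefix=stem + ".", suffix=ext, dir=dest_dir)
    os.close(fd)            # we only needed the reserved name
    shutil.move(src, dest)  # replaces the empty placeholder file
    return dest
```

A file dropped twice as data.txt would then land in the "failure" folder as two differently named files that both start with "data." and end with ".txt", so the earlier evidence is kept.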
Good luck with your foray,
-Martin
Code snippet:
#! /usr/bin/python

from __future__ import print_function

import os
import sys
import time

RECORD_LENGTH = 32


def process_data(f, filesize):
    f.seek(0, os.SEEK_SET)
    counter = 0

    # -- are the records text?  I assume yes, though your data may differ.
    #    if your definition of a record includes the newline, then
    #    you will want to use len(record) ...
    #
    for record in f:
        print("record: %s" % (record.strip()), file=sys.stderr)
        if RECORD_LENGTH == len(record.strip()):  # -- len(record)
            counter += 1
        else:
            return False, counter
    return True, counter


def validate_and_process_data(f, filesize):
    # -- confirm that the open file is as long as the size we were given
    f.seek(0, os.SEEK_END)
    lastbyte = f.tell()
    if lastbyte != filesize:
        return False, 0
    else:
        return process_data(f, filesize)


def validate_files(files):
    for file in files:
        filename, rootdir, lastmod, creation, filesize = fileinfo(file)
        # -- open by the full path; filename is only the basename
        with open(file) as f:
            result, record_count = validate_and_process_data(f, filesize)
        if result:
            # -- Success!  Write text file with count of records
            #    recognized in record_count
            print('success with %s on %d records' % (filename, record_count,), file=sys.stderr)
            pass
        else:
            # -- Failure; need additional logic here to determine
            #    the cause for failure.
            print('failure with %s after %d records' % (filename, record_count,), file=sys.stderr)
            pass


def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)
    return filename, rootdir, lastmod, creation, filesize


if __name__ == '__main__':
    validate_files(sys.argv[1:])

# -- end of file
[0] http://en.wikipedia.org/wiki/Functional_programming
[1] https://docs.python.org/2/library/tempfile.html#module-tempfile
--
Martin A. Brown
http://linux-ip.net/