Proposed new module for the Python library

There's a kind of boilerplate code I've written way too often. The most recent time, I got fed up and wrote a candidate library module so I'd never have to do it again. Here's the header comment:

---------------------------------------------------------------------------
ccframe -- framework code for building compiler-like programs.

There is a common `compiler-like' pattern in Unix scripts which is useful for translation utilities of all sorts. A program following this pattern behaves as a filter when no argument files are specified on the command line, but otherwise transforms each file individually into a corresponding output file.

This module provides framework and glue code to make such programs easy to write. You supply a function to massage the file data; depending on which entry point you use, it can take input and output file pointers, or it can take a string consisting of the entire file's data and return a replacement, or it can take in succession strings consisting of each of the file's lines and return a translated line for each.

Argument files are transformed in left to right order on the command line. A filename consisting of a dash is interpreted as a directive to read from standard input (this can be useful in pipelines).

Replacement of each file is atomic and doesn't occur until the translation of that file has completed. Any tempfiles are removed automatically on any exception thrown by the translation function, and the exception is then passed upwards.

The entry points return 0 on success, 1 to signal a failed file open, and 2 to signal a failed tempfile open or rename. Error messages are emitted to stderr.
---------------------------------------------------------------------------

Design comments? Critiques? Code on request.

I'm already considering throwing exceptions on open and rename errors instead of complaining to stderr and returning an error status. That would be more Pythonic, though slightly less convenient in the commonest cases.
-- 
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The two pillars of `political correctness' are, a) willful ignorance, and b) a steadfast refusal to face the truth -- George MacDonald Fraser
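
For illustration only: the ccframe code itself was offered on request and does not appear in this thread, so the following is a minimal sketch of the whole-file-string entry point as the header comment describes it, not Eric's implementation. The name transform_files and its stdin/stdout parameters are hypothetical, it replaces each file in place, and it is written in present-day Python rather than the 2001 dialect.

```python
import os
import sys
import tempfile


def transform_files(args, transform, stdin=None, stdout=None):
    """Hypothetical entry point: apply transform(text) -> text to each file.

    Returns 0 on success, 1 on a failed open, 2 on a failed tempfile
    open or rename, following the header comment above.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout

    # No arguments: behave as a filter. A "-" argument also means stdin.
    for name in args or ["-"]:
        if name == "-":
            stdout.write(transform(stdin.read()))
            continue
        try:
            with open(name) as src:
                data = src.read()
        except OSError as err:
            sys.stderr.write("%s: %s\n" % (name, err.strerror))
            return 1
        # Create the tempfile next to the target so the rename stays on
        # one filesystem and can be atomic.
        directory = os.path.dirname(os.path.abspath(name))
        try:
            fd, tmpname = tempfile.mkstemp(dir=directory)
        except OSError as err:
            sys.stderr.write("%s: %s\n" % (name, err.strerror))
            return 2
        try:
            with os.fdopen(fd, "w") as dst:
                dst.write(transform(data))
        except BaseException:
            # Translation failed: remove the tempfile, pass the exception up.
            os.unlink(tmpname)
            raise
        try:
            # The original is replaced only after translation has completed.
            os.replace(tmpname, name)
        except OSError as err:
            os.unlink(tmpname)
            sys.stderr.write("%s: %s\n" % (name, err.strerror))
            return 2
    return 0
```

A script built on this sketch would end with something like sys.exit(transform_files(sys.argv[1:], str.upper)). Note that a file replaced this way takes on the tempfile's permissions, which a real module would probably want to copy from the original.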

Sounds "import fileinput-ish" to me. Are there fixes to fileinput that will make it applicable to your problems? -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook

Paul Prescod <paulp@ActiveState.com>:
Sounds "import fileinput-ish" to me. Are there fixes to fileinput that will make it applicable to your problems?
No. The existing code in the fileinput module is intended to treat a sequence of files as one continuous input source. My module is designed to iterate over files, transforming each one to a corresponding named output file. They're both input frameworks, but otherwise quite different.

The fileinput model is appropriate for filters like cat(1) and tr(1). The ccframe model, on the other hand, is appropriate for compilers and file-format converters.
-- 
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Our society won't be truly free until "None of the Above" is always an option.
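
To make the distinction concrete, here is a small hedged sketch contrasting the two models. fileinput.input() is the real standard-library call; per_file_outputs and its ".out" suffix rule are hypothetical stand-ins for whatever naming scheme ccframe actually used, which the thread does not specify.

```python
import fileinput


def concatenated_lines(paths):
    """fileinput model: every file looks like one continuous stream."""
    with fileinput.input(files=paths) as stream:
        return list(stream)


def per_file_outputs(paths, transform, suffix=".out"):
    """ccframe-like model: each input gets its own named output file.

    The suffix rule is an assumption made for this example only.
    """
    outputs = []
    for path in paths:
        with open(path) as src:
            data = src.read()
        out_path = path + suffix
        with open(out_path, "w") as dst:
            dst.write(transform(data))
        outputs.append(out_path)
    return outputs
```

The first function loses the file boundaries, which is what a cat-style filter wants; the second keeps them, which is what a converter wants.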

"Eric S. Raymond" <esr@thyrsus.com>
Still, they sound similar enough that the functionality should perhaps be merged into one module, with options for getting "concatenated" or "separate" input files.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

Greg Ewing <greg@cosc.canterbury.ac.nz>:
I'd be open to that -- it's why I said "existing" code :-). It doesn't matter to me whether my functions live in the fileinput module or in a separate ccframe module. I wrote the documentation page this morning.
-- 
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Faith may be defined briefly as an illogical belief in the occurrence of the improbable...A man full of faith is simply one who has lost (or never had) the capacity for clear and realistic thought. He is not a mere ass: he is actually ill. -- H. L. Mencken

On Thu, 2 Aug 2001, Eric S. Raymond wrote:
I did a FileUtil module for MacPython -- perhaps it was more complicated because it did all of the GUI also -- progress dialog with info lines and a cancel button, message boxes on errors, etc. You could drop a folder full of files on it and it would iterate thru them.

But I found it difficult to generalize sufficiently -- there were not only different file processing methods to be defined, there were also different input/output filename transformation rules, different message templates, different file selection rules, etc. Still -- it was useful, but I think I kept fiddling with it almost every time I used it.

I think I started with a function that was passed a process callback function. It ended up a class that could be extended for each particular file processing job. But I think if I did it again now, I'd use some sort of generator.

There was some discussion of this: that the os.path.walk way of iterating with callbacks is really inside-out. Nested generators might be a handier building block for this sort of job -- kind of the Python equivalent of a Unix pipeline command.

-- Steve
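
A rough sketch of what "nested generators as a pipeline" could look like, under some assumptions: it uses os.walk, the generator-based directory walker that current Python provides in place of os.path.walk, and the stage names walk_files, select, and read_lines are invented for this example. It is not code from Steve's FileUtil module.

```python
import os


def walk_files(root):
    """Yield every file path under root, in a stable sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the walk order is deterministic
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def select(paths, suffix):
    """File selection rule: pass through only paths ending in suffix."""
    for path in paths:
        if path.endswith(suffix):
            yield path


def read_lines(paths):
    """Yield (path, line) pairs with the trailing newline removed."""
    for path in paths:
        with open(path) as handle:
            for line in handle:
                yield path, line.rstrip("\n")
```

The stages compose the way shell commands do: read_lines(select(walk_files("."), ".txt")) does no work until something iterates over it, and each selection or transformation rule is a separate, replaceable stage.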

On 02 August 2001, Eric S. Raymond said:
Sounds like a useful tool to have around. I'm not sure the name 'ccframe' conjures up images of the above paragraph, but I'm not sure what name *does*. This is the hardest sort of software to name!
How to handle failure to open an input file depends on circumstances:

  * sometimes, it is utter calamity, and raising an exception is the right thing to do

  * sometimes, it's a user-level error, and

        try:
            file = open(filename)
        except IOError, err:
            sys.exit("%s: %s" % (filename, err.strerror))

    is the right thing to do

  * sometimes, it's just something the user ought to know about, and the "except" clause above can be turned into

        sys.stderr.write("warning: %s: %s" % (filename, err.strerror))
        continue     # next file, please

  * conceivably, it's something you can just ignore

IOW, I think this should be an option for users of the framework. The default should be to throw an exception -- kaBOOM! For me, the middle two options would be most commonly used, but modules should not sys.exit() or sys.stderr.write() unless they are explicitly told to do so.

Greg
-- 
Greg Ward - nerd    gward@python.net
http://starship.python.net/~gward/
My opinions may have changed, but not the fact that I am right.
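
One way the "option for users of the framework" might be expressed, as a sketch only: the function open_inputs, its on_error parameter, and the four policy names are all hypothetical, chosen to mirror the four cases listed above, with raising as the default. It uses present-day exception syntax (except OSError as err) rather than the 2001 form quoted in the message.

```python
import sys

POLICIES = ("raise", "exit", "warn", "ignore")


def open_inputs(paths, on_error="raise", stderr=None):
    """Yield (path, open file) pairs, handling open failures per on_error."""
    if on_error not in POLICIES:
        raise ValueError("unknown on_error policy: %r" % (on_error,))
    err_stream = sys.stderr if stderr is None else stderr
    for path in paths:
        try:
            handle = open(path)
        except OSError as err:
            if on_error == "raise":
                raise  # utter calamity: let the caller see the exception
            if on_error == "exit":
                sys.exit("%s: %s" % (path, err.strerror))
            if on_error == "warn":
                err_stream.write("warning: %s: %s\n" % (path, err.strerror))
            continue  # "warn" and "ignore": next file, please
        with handle:
            yield path, handle
```

The module writes to stderr or exits only when the caller has asked for that explicitly, which is the constraint Greg argues for.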

On 02 August 2001, Eric S. Raymond said:
There is a common `compiler-like' pattern in Unix scripts which is useful for translation utilities of all sorts.
On Sat, 4 Aug 2001, Greg Ward wrote:
I too find the name fairly cryptic. "fileargs" perhaps? But it would be better to think about merging this and fileinput. -- ?!ng

participants (6): Eric S. Raymond, Greg Ewing, Greg Ward, Ka-Ping Yee, Paul Prescod, Steven D. Majewski