OO Approach to file parsing with Python?

Frank V. Castellucci frankc at colconsulting.com
Tue Jun 27 06:56:43 EDT 2000


I saw somewhere (excuse me for not remembering where) that someone had written a
Python package that lets you express the file layout as a modified EBNF grammar.
It took care of the rest, except, of course, for walking the file via the
resulting Python objects. I think its name was Parsley.

Jon McLin wrote:
> 
> Our company has decided to try an initial deployment of Python in a file-parsing
> role.  We need to translate some customer-provided structured ASCII files into
> our standard schema for use by legacy applications.    A procedural
> implementation is obvious, but completely non-reusable, and I'd like to be able
> to leverage this in the future.  (different inputs, the same outputs.)  Can
> someone point me to examples of OO approaches to this type of problem,
> preferably implemented in Python?
> 
> Some data, and thoughts:
> 
> The basic file structure is
> 
> Multi-Line Header
> One or More DataSets
>     where each Data Set consists of
>     a single line header, and
>     one or more lines of tab-delimited data (fixed structure across all sets).
> 
> The data sets are identified by the structure of their header lines.
> 
> A nested loop processor is the obvious procedural approach, assigning each input
> field to a corresponding field in an output object.
> 
> One seemingly-OO approach is to create classes for each construct on the input
> side.  The constructors would actually parse the file, and call subordinate
> constructors (member classes) as appropriate.   (This seems awkward).
> The output classes would match our schema, and of course these would be
> reusable.
> 
> Since there is not a one-for-one correspondence between input and output in
> either classes or records, the next step would be to derive the set of outputs
> from the inputs (i.e., how many output instances are needed).    These would be constructed,
> and the input structure would be walked, telling each instance to populate its
> output analogue.  Finally, the output hierarchy would be walked, telling each
> instance to "Publish" itself.
> 
> With this basic model, the overall structure and the output classes would be
> reusable, while the input classes and the translators (if not part of the input
> classes) would be application specific.
> 
> Conceptual Code Snippet:
> #  Parse and Create an object hierarchy
> #    MyInputObject becomes an object containing member objects, dictionaries, etc.
> #    SourceData is a class in an ApplicationSpecific module
> MyInputObject=SourceData("MyFile")
> 
> # Translate and create output hierarchy.
> #  The custom classes will need to import the standard OutputObject module,
> #    which contains constructors for all output classes.
> MyOutputObject=MyInputObject.Translate()
> 
> #  Create the desired (standard) output
> MyOutputObject.Publish()
> 
> But as I noted, parsing with this model seems awkward.
> 
> Any pointers/constructive criticisms/suggestions are much appreciated.
> Non-constructive criticism will be deflected without ego damage.   After all, I
> know that I know far less than I know not, and that furthermore far more is
> unknown than is known to be unknown.   Sigh.   ;>
> 
> Thanks,
> Jon
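
Jon's three-step model (parse into an input hierarchy, Translate into the standard output hierarchy, Publish) can be sketched as below. One way to address the "constructors parse the file" awkwardness, assumed here rather than taken from the thread, is to keep `__init__` trivial (dataclasses) and put the parsing in a `parse` class method. The "DATASET <name>" header convention, the pipe-delimited output format, the one-record-per-row mapping, and all class and field names other than SourceData/Translate/Publish are hypothetical. `Publish` returns a string here so the result is easy to inspect.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


# ---- Reusable output side: stands in for the standard schema ----
@dataclass
class OutputRecord:
    source_set: str      # hypothetical field: name of the input data set
    fields: List[str]

    def Publish(self) -> str:
        # Hypothetical output format: one pipe-delimited line per record.
        return "|".join([self.source_set, *self.fields])


@dataclass
class OutputDocument:
    records: List[OutputRecord] = field(default_factory=list)

    def Publish(self) -> str:
        # Walk the output hierarchy, telling each record to publish itself.
        return "\n".join(record.Publish() for record in self.records)


# ---- Application-specific input side ----
@dataclass
class DataSet:
    name: str
    rows: List[List[str]] = field(default_factory=list)


@dataclass
class SourceData:
    header: List[str]
    datasets: List[DataSet]

    @classmethod
    def parse(cls, lines: Iterable[str]) -> "SourceData":
        """Build the input hierarchy; parsing lives here, not in __init__."""
        header, datasets = [], []
        for raw in lines:
            line = raw.rstrip("\n")
            if line.startswith("DATASET "):   # hypothetical data set header
                datasets.append(DataSet(line.split(None, 1)[1]))
            elif datasets:                     # tab-delimited data row
                datasets[-1].rows.append(line.split("\t"))
            else:                              # still inside the multi-line file header
                header.append(line)
        return cls(header, datasets)

    def Translate(self) -> OutputDocument:
        # Application-specific mapping: here, one output record per input row.
        doc = OutputDocument()
        for ds in self.datasets:
            for row in ds.rows:
                doc.records.append(OutputRecord(ds.name, row))
        return doc
```

Usage would look like `with open("MyFile") as f: MyInputObject = SourceData.parse(f)`, followed by `print(MyInputObject.Translate().Publish())`. Only `DataSet`, `SourceData.parse`, and `Translate` would change for a new customer format; the output classes stay as they are.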

-- 
Frank V. Castellucci
http://corelinux.sourceforge.net
OOA/OOD/C++ Standards and Guidelines for Linux
http://PythPat.sourceforge.net
Pythons Pattern Package


