OO Approach to file parsing with Python?
Frank V. Castellucci
frankc at colconsulting.com
Tue Jun 27 06:56:43 EDT 2000
I saw somewhere, and excuse me for not remembering where, that someone wrote
a Python package that lets you express the file layout using a modified EBNF
grammar. It took care of the rest, except of course for walking the file via
the Python objects. I think its name was Parsley.
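
A minimal sketch of the declarative idea follows. It is not the API of the
package mentioned above; it only illustrates describing the layout as data and
letting a generic routine do the walking. The "SET <name>" header format and
the names LAYOUT, classify and parse_lines are assumptions made up for the
example.

```python
import re

# Hypothetical layout table: each entry maps a line kind to a regex.
# The set-header format "SET <name>" is an assumption for this example.
LAYOUT = [
    ("set_header", re.compile(r"^SET\s+(?P<name>\S+)$")),
    ("data_row", re.compile(r"^[^\t]+(\t[^\t]*)+$")),  # at least one tab
]


def classify(line):
    """Return (kind, match) for the first layout rule that matches."""
    for kind, pattern in LAYOUT:
        match = pattern.match(line)
        if match:
            return kind, match
    # Anything that matches no rule is treated as file header text.
    return "file_header", None


def parse_lines(lines):
    """Group lines into a multi-line file header and a list of data sets."""
    result = {"header": [], "sets": []}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        kind, match = classify(line)
        if kind == "set_header":
            result["sets"].append({"name": match.group("name"), "rows": []})
        elif kind == "data_row" and result["sets"]:
            result["sets"][-1]["rows"].append(line.split("\t"))
        else:
            # Unmatched lines (and tabbed lines before any set) go here.
            result["header"].append(line)
    return result
```

Supporting a different customer format would then mostly mean swapping the
LAYOUT table, while parse_lines stays the same.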
Jon McLin wrote:
>
> Our company has decided to try an initial deployment of Python in a file-parsing
> role. We need to translate some customer-provided structured ASCII files into
> our standard schema for use by legacy applications. A procedural
> implementation is obvious, but completely non-reusable, and I'd like to be able
> to leverage this in the future. (different inputs, the same outputs.) Can
> someone point me to examples of OO approaches to this type of problem,
> preferably implemented in Python?
>
> Some data, and thoughts:
>
> The basic file structure is
>
> Multi-Line Header
> One or More DataSets
> where each Data Set consists of
> a single line header, and
> one or more lines of tab-delimited data (fixed structure across all sets).
>
> The data sets are identified by the structure of their header lines.
>
> A nested loop processor is the obvious procedural approach, assigning each input
> field to a corresponding field in an output object.
>
> One seemingly-OO approach is to create classes for each construct on the input
> side. The constructors would actually parse the file, and call subordinate
> constructors (member classes) as appropriate. (This seems awkward).
> The output classes would match our schema, and of course these would be
> reusable.
>
> Since there is not a one-for-one correspondence between input and output in
> either classes or records, the next step would be to derive the set of outputs
> from the inputs (quantity of output instances). These would be constructed,
> and the input structure would be walked, telling each instance to populate its
> output analogue. Finally, the output hierarchy would be walked, telling each
> instance to "Publish" itself.
>
> With this basic model, the overall structure and the output classes would be
> reusable, while the input classes and the translators (if not part of the input
> classes) would be application specific.
>
> Conceptual Code Snippet:
> # Parse and Create an object hierarchy
> # MyInput becomes an object containing member objects, dictionaries, etc.
> # SourceData is a class in an ApplicationSpecific
> MyInputObject = SourceData("MyFile")
>
> # Translate and create output hierarchy.
> # The custom classes will need to import the standard OutputObject module,
> # which contains constructors for all output classes.
> MyOutputObject = MyInputObject.Translate()
>
> # Create the desired (standard) output
> MyOutputObject.Publish()
>
> But as I noted, parsing with this model seems awkward.
>
> Any pointers/constructive criticisms/suggestions are much appreciated.
> Non-constructive criticism will be deflected without ego damage. After all, I
> know that I know far less than I know not, and that furthermore far more is
> unknown than is known to be unknown. Sigh. ;>
>
> Thanks,
> Jon
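
One possible way around the awkwardness of parsing inside constructors is to
keep the constructors trivial and move the parsing into a separate factory
method. The sketch below follows the parse / translate / publish pipeline
described in the quoted message. The "SET <name>" header format, the
OutputRecord class and the from_lines method are hypothetical names invented
for the example, and translate and publish are lower-cased versions of the
Translate and Publish calls in the snippet.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """One input data set: a header name plus its tab-delimited rows."""
    name: str
    rows: list = field(default_factory=list)


@dataclass
class OutputRecord:
    """Hypothetical stand-in for one record of the standard schema."""
    source_set: str
    values: tuple

    def publish(self):
        # Assumed output form: set name, a pipe, comma-joined values.
        return "%s|%s" % (self.source_set, ",".join(self.values))


@dataclass
class SourceData:
    """Application-specific input side; plain data, no parsing in __init__."""
    header: list = field(default_factory=list)
    sets: list = field(default_factory=list)

    @classmethod
    def from_lines(cls, lines):
        """Factory method: all parsing lives here."""
        source = cls()
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue                      # skip blank lines
            if line.startswith("SET "):       # assumed set-header format
                source.sets.append(DataSet(line[4:].strip()))
            elif source.sets:                 # data row of the current set
                source.sets[-1].rows.append(tuple(line.split("\t")))
            else:                             # still in the multi-line header
                source.header.append(line)
        return source

    def translate(self):
        """Application-specific mapping; here simply one record per row."""
        return [OutputRecord(s.name, row) for s in self.sets for row in s.rows]
```

Under these assumptions, SourceData.from_lines(open("MyFile")).translate()
would return a list of output records, and calling publish() on each one
would produce the standard output. Only SourceData would need to change for a
new customer format.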
--
Frank V. Castellucci
http://corelinux.sourceforge.net
OOA/OOD/C++ Standards and Guidelines for Linux
http://PythPat.sourceforge.net
Pythons Pattern Package