Browsing text ; Python the right tool?

Wed Jan 26 19:59:01 EST 2005

John Machin wrote:

> Jeff Shannon wrote:
> 
>>[...]  For ~10 or fewer types whose spec
>>doesn't change, hand-coding the conversion would probably be quicker
>>and/or more straightforward than writing a spec-parser as you
>>suggest.
> 
> I didn't suggest writing a "spec-parser". No (mechanical) parsing is
> involved. The specs that I'm used to dealing with set out the record
> layouts in a tabular fashion. The only hassle is extracting that from an
> MSWord document or a PDF.

The "specs" I'm used to dealing with are inconsistent enough that it's 
more work to "massage" them into strict tabular format than it is to 
retype and verify them.  Typically it's one or two file types, with 
one or two record types each, from each vendor -- and of course no 
vendor uses anything similar to any other, nor is there a standardized 
way for them to specify what they *do* use.  Everything is almost 
completely ad-hoc.

>>If, on the other hand, there are many record types, and/or those
>>record types are subject to changes in specification, then yes, it'd
>>be better to parse the specs from some sort of data file.
> 
> "Parse"? No parsing, and not much code at all: The routine to "load"
> (not "parse") the layout from the layout.csv file into dicts of dicts
> is only 35 lines of Python code. The routine to take an input line and
> serve up an object instance is about the same length, and it already
> does more than the OP's browsing requirement calls for. The routine to
> take an object and serve up a correctly formatted output line is only
> 50 lines, of which a quarter are comments or blank lines.
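To make the table-driven approach concrete, here is a minimal sketch in Python. It assumes a hypothetical layout file with the columns record_type, field, start, length and kind, plus a two-character record-type code at the start of each line; none of these names come from the thread or from any real spec. It loads the table into a dict of dicts, turns a fixed-width line into an object, and formats the object back into a line.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical layout table, one row per field. The column names, the
# 1-based start positions and the two "kind" values are assumptions made
# for illustration; they are not taken from any real vendor spec.
LAYOUT_CSV = """\
record_type,field,start,length,kind
HD,rec_type,1,2,str
HD,batch_no,3,6,int
HD,run_date,9,8,str
DT,rec_type,1,2,str
DT,account,3,10,str
DT,amount,13,9,int
"""

# How each "kind" is converted when reading a field.
CONVERTERS = {"str": str.rstrip, "int": int}


def load_layout(fileobj):
    """Load the layout into a dict of dicts: {record_type: {field: (start, end, kind)}}."""
    layout = {}
    for row in csv.DictReader(fileobj):
        start = int(row["start"]) - 1  # 1-based column -> 0-based slice index
        end = start + int(row["length"])
        layout.setdefault(row["record_type"], {})[row["field"]] = (start, end, row["kind"])
    return layout


def parse_line(line, layout):
    """Turn one fixed-width line into an object with one attribute per field."""
    fields = layout[line[:2]]  # assumption: record-type code sits in columns 1-2
    return SimpleNamespace(**{
        name: CONVERTERS[kind](line[start:end])
        for name, (start, end, kind) in fields.items()
    })


def format_record(rec, layout):
    """Inverse of parse_line: write each field back into its columns."""
    fields = layout[rec.rec_type]
    out = [" "] * max(end for _, end, _ in fields.values())
    for name, (start, end, kind) in fields.items():
        size = end - start
        value = str(getattr(rec, name))
        # Numbers are zero-filled; text is left-justified and space-padded.
        text = value.zfill(size) if kind == "int" else value.ljust(size)
        if len(text) != size:
            raise ValueError(f"{name}={value!r} does not fit in {size} columns")
        out[start:end] = text
    return "".join(out)


layout = load_layout(io.StringIO(LAYOUT_CSV))
line = "DT" + "ACCT000042" + "000012345"
rec = parse_line(line, layout)
print(rec.account, rec.amount)  # ACCT000042 12345
assert format_record(rec, layout) == line
```

With this design, supporting a new record type means adding rows to the table rather than writing a new class, which is exactly the trade-off being debated in this thread.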

There's a tradeoff between the effort involved in writing multiple 
custom record-type classes, and the effort necessary to write the 
generic loading routines plus the effort to massage or coerce the 
specifications into a regular, machine-readable format.  I suppose 
that "parsing" may not precisely be the correct term here, but I was 
using it by analogy with, say, ConfigParser and optparse.  Either 
you're writing code to translate some sort of received specification 
into a usable format, or you're manually pushing bytes around to get 
them into a format that your code *can* translate.  I'd say that my 
creation of custom classes is just a bit further along a continuum 
than your massaging of specification data -- I'm just massaging it 
into Python code instead of CSV tables.
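For comparison, the hand-coded alternative might look like the following sketch: one class per record type, with the spec written directly as Python code. The pipe-delimited format, the field names (account, amount, posted) and the MM/DD/YYYY date format are all assumptions for illustration, not any real vendor's layout.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class PaymentRecord:
    """Hand-coded record type for one hypothetical vendor's pipe-delimited file.

    Field order, names and formats are illustrative assumptions; the point is
    that the spec lives in Python code rather than in a CSV table.
    """
    account: str
    amount: Decimal
    posted: date

    @classmethod
    def from_fields(cls, fields):
        account, amount, posted = fields  # exactly three fields expected
        return cls(
            account=account.strip(),
            amount=Decimal(amount),
            posted=datetime.strptime(posted, "%m/%d/%Y").date(),
        )

    def to_fields(self):
        # Inverse of from_fields: back to the vendor's string formats.
        return [self.account, f"{self.amount:.2f}", self.posted.strftime("%m/%d/%Y")]


data = io.StringIO("A123|150.00|01/26/2005\nB456|-20.50|01/27/2005\n")
records = [PaymentRecord.from_fields(row) for row in csv.reader(data, delimiter="|")]
print(records[0].amount + records[1].amount)  # 129.50
```

Each new vendor format means another class like this, but the odd cases (an unusual date format, a signed amount) are handled in ordinary code instead of being squeezed into a table.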

>>I suspect
>>that we're both assuming a case similar to our own personal
>>experiences, which are different enough to lead to different
>>preferred solutions. ;)
> 
> Indeed. You seem to have led a charmed life; may the wizards and the
> rangers ever continue to protect you from the dark riders! :-)

Hardly charmed -- more that there's so little regularity in what I'm 
given that massaging it to a standard format is almost as much work as 
just buckling down and retyping it.  My one saving grace is that I'm 
usually able to work with delimited files, rather than 
column-width-specified files.  I'll spare you the rant about my many 
job-related frustrations, but trust me, there ain't no picnics here!

Jeff Shannon
Technician/Programmer
Credit International