Browsing text ; Python the right tool?

John Machin sjmachin at lexicon.net
Wed Jan 26 03:42:04 CET 2005


Jeff Shannon wrote:
> Paul Kooistra wrote:
>
> > 1. Does anybody know of a generic tool (not necessarily Python based)
> > that does the job I've outlined?
> > 2. If not, is there some framework or widget in Python I can adapt to
> > do what I want?
>
> Not that I know of, but...
>
> > 3. If not, should I consider building all this just from scratch in
> > Python - which would probably mean not only learning Python, but some
> > other GUI related modules?
>
> This should be pretty easy.  If each record is CRLF terminated, then
> you can get one record at a time simply by iterating over the file
> ("for line in open('myfile.dat'): ...").  You can have a dictionary of
> classes or factory functions, one for each record type, keyed off of
> the 2-character identifier.  Each class/factory would know the layout
> of that record type,

This is plausible only under the condition that Santa Claus is paying
you $X per class/factory or per line of code, or you are so speed-crazy
that you are machine-generating C code for the factories.

I'd suggest "data driven" -- you grab the .doc or .pdf that describes
your layouts, Ctrl-A, Ctrl-C, fire up Excel, paste special, massage it,
so you get one row per field, with start & end positions, type, decimal
places, optional/mandatory, field name, whatever else you need. Insert a
column with the record name. Save it as a CSV file.

Then you need a function to load this layout file into dictionaries,
and build cross-references field_name -> field_number (0,1,2,...) and
vice versa.
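
A minimal sketch of such a loader might look like the following. The
column names (record_name, field_name, start, end, type), the sample
rows, and the function name load_layout are all assumptions made for
illustration; your real layout file will have whatever columns you
saved from the spreadsheet.

```python
import csv
import io

# Hypothetical layout file contents; the column names are assumptions.
LAYOUT_CSV = """\
record_name,field_name,start,end,type
A0,rec_id,1,2,str
A0,num,3,6,int
A0,phone,7,26,str
A0,zipcode,27,31,str
B1,rec_id,1,2,str
B1,amount,3,10,int
"""

def load_layout(csv_file):
    """Return (fields, name_to_number, number_to_name), each keyed by record name."""
    fields = {}
    name_to_number = {}
    number_to_name = {}
    for row in csv.DictReader(csv_file):
        rec = row["record_name"]
        spec = (row["field_name"], int(row["start"]), int(row["end"]), row["type"])
        # The field number is simply its position within its record: 0, 1, 2, ...
        number = len(fields.setdefault(rec, []))
        fields[rec].append(spec)
        name_to_number.setdefault(rec, {})[row["field_name"]] = number
        number_to_name.setdefault(rec, {})[number] = row["field_name"]
    return fields, name_to_number, number_to_name

# In real use you would pass an open file instead of a StringIO.
fields, name_to_number, number_to_name = load_layout(io.StringIO(LAYOUT_CSV))
print(name_to_number["A0"]["zipcode"])   # 3
print(number_to_name["B1"][1])           # amount
```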

As your record name is not in a fixed position in the record, you will
also need to supply a function (file_type, record_string) ->
record_name.
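
As a rough sketch only: if the identifier's position depends on nothing
but the file type, a small lookup table is enough. The file type names
and the slice positions below are invented for illustration; if the
position varies in some other way, the body of the function changes but
its signature stays the same.

```python
# Hypothetical: where the 2-character identifier sits for each file type.
ID_SLICES = {
    "orders": slice(0, 2),     # identifier in columns 1-2
    "payments": slice(8, 10),  # identifier in columns 9-10, after a date
}

def record_name(file_type, record_string):
    """Map (file_type, record_string) -> record_name."""
    return record_string[ID_SLICES[file_type]]

print(record_name("orders", "A0123"))
print(record_name("payments", "20050126B1xyz"))
```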

Then you have *ONE* function that takes a file_type, a record_name, and
a record_string, and gives you a list of the values. That is all you
need for a generic browser application.
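
Here is one way that single function could be sketched. The LAYOUT
dictionary stands in for whatever your loader produces; its shape, the
"demo" file type, the field names, and the assumption that column
positions are 1-based and inclusive are all mine, not part of any spec.

```python
# Hypothetical layout, as it might look after loading the CSV:
# (file_type, record_name) -> list of (field_name, start, end, type),
# with 1-based inclusive column positions.
LAYOUT = {
    ("demo", "A0"): [
        ("rec_id", 1, 2, "str"),
        ("num", 3, 6, "int"),
        ("phone", 7, 26, "str"),
        ("zipcode", 27, 31, "str"),
    ],
}

# One converter per field type; add more types as the layout needs them.
CONVERTERS = {"str": str.rstrip, "int": int}

def parse_record(file_type, record_name, record_string):
    """Return the list of field values for one record."""
    values = []
    for _name, start, end, ftype in LAYOUT[(file_type, record_name)]:
        raw = record_string[start - 1:end]  # 1-based inclusive -> Python slice
        values.append(CONVERTERS[ftype](raw))
    return values

line = "A0" + "0042" + "555-1234".ljust(20) + "90210"
print(parse_record("demo", "A0", line))
# ['A0', 42, '555-1234', '90210']
```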

For working on a _specific_ known file_type, you can _then_ augment
that to give you record objects that you use like a0.zipcode or record
dictionaries that you use like a0['zipcode'].
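
For instance, given the field names from the layout and the list of
values for one record, both forms are a line each. The field names and
values here are made up, and types.SimpleNamespace is just one
convenient way to get attribute access in current Python; any small
generic class would do.

```python
from types import SimpleNamespace

# Hypothetical field names (from the layout) and values (from the parser).
field_names = ["rec_id", "num", "phone", "zipcode"]
values = ["A0", 42, "555-1234", "90210"]

a0_dict = dict(zip(field_names, values))  # use like a0_dict['zipcode']
a0 = SimpleNamespace(**a0_dict)           # use like a0.zipcode

print(a0_dict["zipcode"], a0.zipcode)
```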

You *don't* have to hand-craft a class for each record type. And you
wouldn't want to, if you were dealing with files whose spec keeps on
having fields added and fields obsoleted.

Notice: in none of the above do you ever have to type in a column
position, except if you manually add updates to your layout file.

Then contemplate how productive you will be when/if you need to
_create_ such files -- you will push everything through one function
which will format each field correctly in the correct column positions
(and chuck an exception if it won't fit). Slightly better than an
approach that uses something like
nbytes = sprintf(buffer, "%04d%-20s%-5s", a0_num, a0_phone, a0_zip);
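
A sketch of what that one output function might look like, under some
assumptions of mine: the layout has the shape shown in LAYOUT below,
fields are contiguous with no gaps, integers are zero-padded on the
left, and strings are space-padded on the right. The names are
illustrative only.

```python
# Hypothetical layout: (file_type, record_name) -> list of
# (field_name, start, end, type), 1-based inclusive, fields contiguous.
LAYOUT = {
    ("demo", "A0"): [
        ("rec_id", 1, 2, "str"),
        ("num", 3, 6, "int"),
        ("phone", 7, 26, "str"),
        ("zipcode", 27, 31, "str"),
    ],
}

def format_record(file_type, record_name, values):
    """Build one fixed-width record string; raise ValueError if a value won't fit."""
    spec = LAYOUT[(file_type, record_name)]
    if len(values) != len(spec):
        raise ValueError("expected %d values, got %d" % (len(spec), len(values)))
    parts = []
    for (name, start, end, ftype), value in zip(spec, values):
        width = end - start + 1
        if ftype == "int":
            text = "%0*d" % (width, value)   # zero-padded, right-aligned
        else:
            text = "%-*s" % (width, value)   # space-padded, left-aligned
        if len(text) != width:
            raise ValueError("%s: %r does not fit in %d columns" % (name, value, width))
        parts.append(text)
    return "".join(parts)

print(repr(format_record("demo", "A0", ["A0", 42, "555-1234", "90210"])))
```

The column widths come from the layout, so nobody has to keep a format
string in step with the spec by hand.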

HTH,
John



