Working with fixed format text db's

Neil Cerutti horpner at yahoo.com
Fri Jun 8 14:51:16 EDT 2007


On 2007-06-08, Jeremy C B Nicoll <jeremy at omba.demon.co.uk> wrote:
> Neil Cerutti <horpner at yahoo.com> wrote:
>> Luckily, the output format has not changed yet, so issues with
>> maintaining the above haven't arisen.
>
> The problem surely is that when you want to change the format
> you have to do so in all files (and what about the backups
> then?) and all programs simultaneously.

I don't have control of the format, unfortunately. It's an import
file format for a commercial database application.

> Maintaining the code is the least of your problems, I'd
> say.
>
> You could change the data layout so that eg each field was
> terminated by a marker character, then read/write delimited
> values.  But unless you also review all the other parts of your
> programs, you need to be sure that you don't have any other
> code anywhere that implicitly relies on a particular field
> being a known fixed length.
>
>> However, I'd like something better.
>
> What precisely do you want to achieve?

I was hoping for a module that provides a way for me to specify a
fixed file format, along with some sort of interface for writing
and reading files that are in said format.
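What such a module might look like is sketched below. This is a minimal illustration under stated assumptions, not an existing published package: the names `Field`, `FixedFormat`, `to_line` and `from_line` are hypothetical, and the three-field layout is invented. The idea is that the field list *is* the spec, and every offset and width is derived from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field descriptor: name, width in characters, alignment."""
    name: str
    width: int
    align: str = "<"  # "<" pads on the right, ">" pads on the left


class FixedFormat:
    """Hypothetical reader/writer for records of consecutive fixed-width fields."""

    def __init__(self, fields):
        self.fields = list(fields)
        # Total line length is derived from the spec, never hard-coded.
        self.record_width = sum(f.width for f in self.fields)

    def to_line(self, record):
        """Turn a dict of field values into one fixed-width line."""
        parts = []
        for f in self.fields:
            value = str(record.get(f.name, ""))  # missing fields become blanks
            if len(value) > f.width:
                # Refuse to truncate silently; the caller must fix the data.
                raise ValueError(f"{f.name}: {value!r} is wider than {f.width}")
            # format() with e.g. ">5" or "<10" pads to the field width.
            parts.append(format(value, f"{f.align}{f.width}"))
        return "".join(parts)

    def from_line(self, line):
        """Split one line back into a dict of stripped string values."""
        line = line.rstrip("\r\n")
        if len(line) != self.record_width:
            raise ValueError(f"expected {self.record_width} chars, got {len(line)}")
        record, pos = {}, 0
        for f in self.fields:
            record[f.name] = line[pos:pos + f.width].strip()
            pos += f.width
        return record


# Invented three-field layout, for illustration only.
spec = FixedFormat([Field("id", 5, ">"), Field("name", 10), Field("grade", 2, ">")])
line = spec.to_line({"id": 17, "name": "Smith", "grade": 9})
print(repr(line))
print(spec.from_line(line))
```

Values come back as strings; converting them to numbers or dates is left to the caller, which keeps the sketch small but could also be folded into `Field` as a converter function.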

It is not actually *hard* to do this with ad-hoc code, but then
the program is indecipherable without a hardcopy of the spec in
hand. And also, as you say, if the spec ever does change, the
hand-written batch of ljust, rjust and slice will be somewhat of
a pain to reconfigure.
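For contrast, the ad-hoc style described above might look roughly like this. The layout (a 10-character left-justified name followed by a 5-character right-justified quantity) is a hypothetical example, not the actual import format; the point is that the widths and offsets appear only as bare numbers scattered through the code.

```python
# Hypothetical layout: a 10-character left-justified name followed by a
# 5-character right-justified quantity. The numbers below are magic values
# that only make sense with the printed spec in hand.

def write_record(name, qty):
    # Slicing to [:10] silently truncates long names.
    return name[:10].ljust(10) + str(qty).rjust(5)


def read_record(line):
    # These offsets must be kept in sync with write_record by hand.
    return line[0:10].rstrip(), int(line[10:15])


rec = write_record("widget", 42)
print(repr(rec))
print(read_record(rec))
```

If a field's width ever changed, both functions (and any other code slicing the same lines) would need matching edits.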

But the biggest weakness, to me, is that the specification is not in
the code, or read and used by the code, and I think it should be.
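One way to have the spec "read and used by the code" is to keep it as a small data file and derive the parser from it. The sketch below assumes a hypothetical CSV spec with `field,width` columns (inlined so the example runs as-is) and swaps in the standard library's `struct` module for the slicing: each field becomes an `Ns` code, so `"=5s10s2s"` unpacks three fixed-length byte strings. Widths are in bytes, so this assumes a single-byte encoding such as ASCII.

```python
import csv
import io
import struct

# Hypothetical spec table, as it might be transcribed from the vendor's
# documentation into a CSV file (inlined here so the example runs as-is).
SPEC_CSV = """field,width
id,5
name,10
grade,2
"""


def load_spec(text):
    """Return a list of (field_name, width) pairs from CSV spec text."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["field"], int(row["width"])) for row in reader]


def make_parser(spec, encoding="ascii"):
    """Build a line parser whose offsets come entirely from the spec."""
    names = [name for name, _ in spec]
    # e.g. "=5s10s2s": one fixed-length byte string per field; "=" means
    # standard sizes with no alignment padding between fields.
    layout = struct.Struct("=" + "".join(f"{width}s" for _, width in spec))

    def parse(line):
        # unpack() raises struct.error unless the encoded line is exactly
        # layout.size bytes, so strip the newline before calling this.
        raw = layout.unpack(line.encode(encoding))
        return {name: value.decode(encoding).strip() for name, value in zip(names, raw)}

    return parse, layout.size


parse, width = make_parser(load_spec(SPEC_CSV))
print(width)
print(parse("   17Smith      9"))
```

Changing the format would then mean editing the spec file rather than hunting down slice offsets in the program.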

If nothing exists already I guess I'll roll my own. But I'd like
to be lazier, and virtually all published modules are better than
what I'll write for myself. ;)

The underlying problem, of course, is the archaic flat-file
format with fixed-width data fields. Even the Department of
Education has moved on to XML for most of its data files, which
are much simpler for me to parse.

-- 
Neil Cerutti



More information about the Python-list mailing list