How to read files written with COBOL

Thu May 13 23:16:48 EDT 2004

Steve Williams <stevewilliams at wwc.com> wrote in message news:<nJhoc.186646$L31.171932 at nwrddc01.gnilink.net>...
> asdf sdf wrote:
> > Is it feasible to go directly to MVS/DB2/Adabas from Python on Unix 
> > or Win?

At least for DB2 this shouldn't be a problem, but it would typically
involve a separate product called "DB2 Connect".  It probably won't be
cheap, but it shouldn't require any MVS components:
http://www-306.ibm.com/software/data/db2/db2connect/

> > Is it more realistic to hit DB2 on AIX or Linux and use some kind of DB2 
> >  linking or replication to reach DB2/MVS?

No, DB2 Connect should give you ODBC, JDBC, CLI, etc. access directly
to MVS.  You can go through another DB2 database, but that's probably
extra work and complexity.

> Other than the overall success of the project (I've been told successful
> data warehouse projects are rare) the major benefit of using Python was
> the ability to try new concepts quickly.  With python you have
> enormous flexibility, as opposed to compiled languages (COBOL, C, etc)
> or third party ETL utilities.

Nice case study.  I've been building ETL systems for twelve years and
am on my second Python ETL project right now.  Python has proved
itself the best option - there's nothing like adaptability when you've
got a dozen system interfaces to maintain! And its quick learning
curve has meant that bringing others up to speed has been a snap.

Most of my communication with DB2 is just over the command line (via
popen2.Popen3), which is the only way to issue commands such as LOAD,
EXPORT, FORCE APPLICATION, LIST APPLICATIONS, etc.  However, quite a
few of my summaries are run this way as well (typically mass inserts),
and aside from the primitive error codes, it works fine.  There's also
at least one DB2 Python package (PyDB2).  Here's a link to the
package:
http://sourceforge.net/projects/pydb2/
and here's a link to a tutorial for it:
https://www6.software.ibm.com/reg/devworks/dw-db2pylnx-i?S_TACT=102B7W91&S_CMP=DB2DD
I'm not using it yet, though a coworker just installed and started
using a Python DB2 module; I assume it is this one.
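The command-line approach can be sketched with subprocess, the standard-library replacement for popen2 (popen2 was deprecated in Python 2.6 and removed in Python 3).  This is a minimal sketch, not the setup described above: run_db2, ClpResult and the clp argv-prefix parameter are hypothetical names, and it assumes the db2 command line processor is on PATH and that any CONNECT the statement needs has already been taken care of.  All it does with the result is map the CLP's documented return codes (0 success, 1 no rows, 2 warning, 4 DB2/SQL error, 8 CLP system error) to a readable message, which is about as much as those primitive codes tell you.

```python
import subprocess
from collections import namedtuple

# Return codes documented for the DB2 command line processor (CLP).
CLP_RETURN_CODES = {
    0: "success",
    1: "SELECT or FETCH returned no rows",
    2: "warning",
    4: "DB2 or SQL error",
    8: "command line processor system error",
}

# Hypothetical result type for this sketch.
ClpResult = namedtuple("ClpResult", "rc meaning output failed")


def run_db2(statement, clp=("db2",)):
    """Run one CLP command or SQL statement and classify its return code.

    `clp` is the argv prefix (hypothetical parameter); by default it is the
    `db2` executable, assumed to be on PATH.  stderr is merged into stdout
    so that CLP messages end up in `output`.
    """
    proc = subprocess.run(
        [*clp, statement],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    rc = proc.returncode
    meaning = CLP_RETURN_CODES.get(rc, "unknown return code")
    # 0, 1 and 2 are non-fatal; anything else is treated as a failure.
    return ClpResult(rc, meaning, proc.stdout, failed=rc not in (0, 1, 2))
```

For example, run_db2("export to staff.del of del select * from staff") would run an EXPORT (the table and file names are only illustrative) and come back with failed=True if the CLP returned 4 or 8.  The clp prefix exists only so the function can be exercised without DB2 installed, e.g. by passing (sys.executable, "-c").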

And as far as reading files written in COBOL, here are a few thoughts:
1.  Don't make Python read all the COBOL data types; instead, make the
COBOL program write out a plain ASCII record.  Writing a fixed-length
ASCII record is very simple (if a little tedious to parse on the other
side).
2.  If you can't modify the COBOL output, then you could consider a
commercial product (perhaps one with a free trial license) that already
provides COBOL 'copybook' interpretation.  There are quite a few of
these; the least expensive ones I'm aware of are SyncSort, Data
Junction, and perhaps Compuware's File-AID.  I don't think any of them
has a regular license for less than $1500.
3.  If you have to read non-character COBOL files, then I'd try to
just keep the number of options down to a reasonable number: you may
only need to support a few formats, such as zoned and packed decimal
(COMP-3).  Variable-length files, floating point, COMP-4, ISAM, etc.
aren't that common.  REDEFINES are often used in conjunction with
record types, and these can sometimes be simplified by just splitting
the file into multiple separate files by record type.  And all the
formatting in the PICTURE clause (implied decimal places, signs, etc.)
is very simple to handle in the program that reads the files.
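Points 1 and 3 can be illustrated together with a short sketch of a fixed-length record reader.  It is an illustration under stated assumptions, not a copybook interpreter: the layout tuples, the field kinds and all function names are hypothetical; zoned decimal is assumed to be in EBCDIC form (a digit in the low nibble of each byte, the sign in the zone nibble of the last byte); and packed (COMP-3) fields are assumed to use the usual sign nibbles (C, A, E, F positive; D, B negative).  Implied decimal places from the PICTURE clause (the V in something like PIC S9(3)V99) are applied as a decimal scale.  Note that a file containing packed fields has to be transferred in binary mode, because an EBCDIC-to-ASCII translation would corrupt the packed bytes.

```python
from decimal import Decimal


def decode_packed(raw, scale=0):
    """Decode a COMP-3 (packed decimal) field.

    Each byte holds two BCD digits; the low nibble of the last byte is the
    sign.  `scale` is the number of implied decimal places.
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    if any(d > 9 for d in nibbles):
        raise ValueError("invalid packed digit in " + raw.hex())
    if sign in (0x0B, 0x0D):
        negative = True
    elif sign in (0x0A, 0x0C, 0x0E, 0x0F):
        negative = False
    else:
        raise ValueError("invalid packed sign nibble %#x" % sign)
    return Decimal((int(negative), tuple(nibbles), -scale))


def decode_zoned(raw, scale=0):
    """Decode a signed EBCDIC zoned-decimal field (assumed format).

    The low nibble of every byte is a digit; the zone nibble of the last
    byte carries the sign: D or B means negative, anything else positive.
    """
    digits = tuple(b & 0x0F for b in raw)
    if any(d > 9 for d in digits):
        raise ValueError("invalid zoned digit in " + raw.hex())
    negative = (raw[-1] >> 4) in (0x0B, 0x0D)
    return Decimal((int(negative), digits, -scale))


def decode_display(text, scale=0):
    """Decode a number written as plain text, applying implied decimals."""
    return Decimal(text.strip()).scaleb(-scale)


def parse_record(record, layout, encoding="ascii"):
    """Slice one fixed-length record into fields.

    Hypothetical layout entries: (name, offset, length, kind, implied_decimals)
    where kind is "text", "display", "zoned" or "packed".  Use
    encoding="cp037" for text and display fields in EBCDIC files.
    """
    fields = {}
    for name, offset, length, kind, scale in layout:
        raw = record[offset:offset + length]
        if kind == "text":
            fields[name] = raw.decode(encoding).rstrip()
        elif kind == "display":
            fields[name] = decode_display(raw.decode(encoding), scale)
        elif kind == "zoned":
            fields[name] = decode_zoned(raw, scale)
        elif kind == "packed":
            fields[name] = decode_packed(raw, scale)
        else:
            raise ValueError("unknown field kind %r" % kind)
    return fields


def iter_records(stream, record_length):
    """Yield fixed-length records from a file opened in binary mode."""
    while True:
        record = stream.read(record_length)
        if not record:
            return
        if len(record) < record_length:
            raise ValueError("file ends with a partial record")
        yield record


def split_by_record_type(records, offset, length):
    """Group records by the type code at record[offset:offset + length].

    One way to sidestep REDEFINES: parse each group with its own layout.
    """
    groups = {}
    for record in records:
        groups.setdefault(record[offset:offset + length], []).append(record)
    return groups
```

For instance, with a layout of [("id", 0, 5, "text", 0), ("amount", 5, 7, "display", 2), ("balance", 12, 3, "packed", 2)], the 15-byte record b"AB   0012345\x12\x34\x5d" parses to id "AB", amount Decimal("123.45") and balance Decimal("-123.45").  Decimal is used rather than float so that implied decimal places come through exactly.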

buck