Newbie code review of parsing program Please

len lsumnler at gmail.com
Sun Nov 16 18:34:06 CET 2008


I have created the following program to read a text file which happens
to be a COBOL file definition.  The program then writes a file that
contains a list definition which I can later copy and paste into a
Python program.  I will eventually expand the program to also output
an SQL script to create the corresponding table in MySQL.

The program still needs a little work; it does not handle the
following items yet:

1.  It does not handle OCCURS yet.
2.  It does not handle REDEFINES yet.
3.  GROUP structures will need work.
4.  Does not create SQL script yet.
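
For item 4, here is a minimal sketch of how the SQL script might be
generated from the same (name, size, type, decimals) tuples that the
program below writes out.  The function name create_table_sql and the
mapping of 'Xstr', 'Xint' and 'Pdec' to CHAR and DECIMAL columns are
assumptions made for illustration, not a settled design; the digit
count for 'Pdec' is derived from the byte size using the same rule the
program uses to compute it.

```python
def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from (name, size, type, decimals) tuples."""
    columns = []
    for name, size, kind, dec in fields:
        if kind == "Xstr":
            sql_type = "CHAR(%d)" % size
        elif kind == "Xint":
            # assumption: unsigned display numerics become whole-number DECIMALs
            sql_type = "DECIMAL(%d, 0)" % size
        elif kind == "Pdec":
            # assumption: a field of `size` bytes holds up to size * 2 - 1 digits
            sql_type = "DECIMAL(%d, %d)" % (size * 2 - 1, dec)
        else:
            raise ValueError("unknown field type: %r" % (kind,))
        columns.append("    %s %s" % (name, sql_type))
    return "CREATE TABLE %s (\n%s\n);\n" % (table, ",\n".join(columns))
```

Calling create_table_sql('salesmen', salesmen) with the list shown in
the resulting output would yield one column per field.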

I anticipate that any files created by this program may need manual
tweaking, but I have a large number of COBOL file definitions which I
may need to work with, and this seemed like a better solution than
typing each list definition and SQL create script by hand.

What I would like is for some kind soul to review my code and give me
some suggestions on how I might improve it.  I think the use of regular
expressions might cut the code down, or at least simplify the parsing,
but I'm just starting to read those chapters in the book ;)
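
As a rough idea of what the regular-expression route could look like,
here is a sketch that parses one field line in a single match.  The
names PIC_RE and parse_pic_line are hypothetical, and the pattern only
covers the picture forms that appear in the sample input (X(n), 9(n),
S9(n), an optional V99 part, and an optional COMP); OCCURS, REDEFINES
and group items are not handled.  For a non-COMP numeric field it
counts the digits after the V toward the size, which the sample input
does not exercise.

```python
import re

# Assumed layout: sequence number, level, data name, PIC clause, period.
PIC_RE = re.compile(
    r"^\d{6}\s+"                      # sequence number in columns 1-6
    r"(?P<level>\d{2})\s+"            # level number, e.g. 05
    r"(?P<name>[A-Z0-9-]+)\s+"        # data name
    r"PIC\s+(?P<sign>S?)(?P<kind>[X9])\((?P<digits>\d+)\)"
    r"(?:V(?P<dec>9+))?"              # implied decimals written as V99
    r"(?P<comp>\s+COMP)?\.\s*$"       # optional COMP, then the closing period
)

def parse_pic_line(line):
    """Return (name, size, type, decimals), or None if the line is not a PIC field."""
    m = PIC_RE.match(line)
    if m is None:
        return None
    digits = int(m.group("digits"))
    dec = len(m.group("dec") or "")
    name = m.group("name").replace("-", "_").lower()
    if m.group("comp"):
        # same size rule as the program: (digits / 2) + 1, integer division
        return (name, (digits + dec) // 2 + 1, "Pdec", dec)
    kind = "Xstr" if m.group("kind") == "X" else "Xint"
    return (name, digits + dec, kind, dec)
```

Lines that are not elementary PIC items, such as the FD and 01 lines,
simply return None and can be skipped by the caller.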

*** SAMPLE INPUT FILE ***

000100 FD  SALESMEN-FILE
000200     LABEL RECORDS ARE STANDARD
000300     VALUE OF FILENAME IS "SALESMEN".
000400
000500 01  SALESMEN-RECORD.
000600     05  SALESMEN-NO                PIC 9(3).
000700     05  SALESMEN-NAME              PIC X(30).
000800     05  SALESMEN-TERRITORY         PIC X(30).
000900     05  SALESMEN-QUOTA             PIC S9(7) COMP.
001000     05  SALESMEN-1ST-BONUS         PIC S9(5)V99 COMP.
001100     05  SALESMEN-2ND-BONUS         PIC S9(5)V99 COMP.
001200     05  SALESMEN-3RD-BONUS         PIC S9(5)V99 COMP.
001300     05  SALESMEN-4TH-BONUS         PIC S9(5)V99 COMP.

*** PROGRAM CODE ***

#!/usr/bin/python

import sys

f_path = '/home/lenyel/Bruske/MCBA/Internet/'
f_name = sys.argv[1]

fd = open(f_path + f_name, 'r')

def fmtline(fieldline):
    """Convert a split PIC line into [name, size, type, decimals]."""
    picture = fieldline[3]              # e.g. 'S9(5)V99' or 'X(30).'
    left = picture.find('(') + 1
    right = picture.find(')')
    num = int(picture[left:right])      # repeat count inside the parentheses
    dec = 0
    if 'COMP.' in fieldline:
        # 'Pdec' size is (digits / 2) + 1; // keeps this an integer
        if 'V' in picture:
            dec = len(picture[picture.find('V') + 1:])
        size = (num + dec) // 2 + 1
        ftype = 'Pdec'                  # renamed from 'type' to avoid shadowing the builtin
    else:
        size = num
        if picture[0] == 'X':
            ftype = 'Xstr'
        else:
            # '9', or a signed 'S9' picture; the original test for 'X'
            # in this branch could never be true and left the type empty
            ftype = 'Xint'
    name = fieldline[1].replace('-', '_').replace('.', '').lower()
    return [name, size, ftype, dec]

wrkfd = []
rec_len = 0

for line in fd:
    if len(line) > 6 and line[6] == '*':    # drop comment lines
        continue
    newline = line.split()
    if len(newline) <= 1:   # drop blank lines (empty, or sequence number only)
        continue
    newline = newline[1:]
    if 'FILENAME' in newline:
        filename = newline[-1].replace('"', '').replace('.', '').lower()
        output = open(f_path + filename + '.fd', 'w')
        output.write(filename + ' = [\n')
    elif newline[0].isdigit() and 'PIC' in newline:
        wrkfd.append(fmtline(newline))
        rec_len += wrkfd[-1][1]

fd.close()

# write one tuple per line; every line but the last ends with a comma
for wrkline in wrkfd[:-1]:
    output.write(str(tuple(wrkline)) + ',\n')
output.write(str(tuple(wrkfd[-1])) + '\n')

output.write(']\n')

output.write(filename + '_len = ' + str(rec_len) + '\n')

output.close()

*** RESULTING OUTPUT ***

salesmen = [
('salesmen_no', 3, 'Xint', 0),
('salesmen_name', 30, 'Xstr', 0),
('salesmen_territory', 30, 'Xstr', 0),
('salesmen_quota', 4, 'Pdec', 0),
('salesmen_1st_bonus', 4, 'Pdec', 2),
('salesmen_2nd_bonus', 4, 'Pdec', 2),
('salesmen_3rd_bonus', 4, 'Pdec', 2),
('salesmen_4th_bonus', 4, 'Pdec', 2)
]
salesmen_len = 83
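
To show how a list definition like this might be used once it is pasted
into another program, here is a small sketch that turns the sizes into
byte offsets within a record.  The helper name field_slices is
hypothetical; it only adds up the sizes in order and does not decode
any field values.

```python
def field_slices(fields):
    """Map each field name to its (start, end) byte offsets within a record."""
    slices = {}
    start = 0
    for name, size, kind, dec in fields:
        slices[name] = (start, start + size)
        start += size
    return slices, start    # start now equals the total record length

salesmen = [
    ('salesmen_no', 3, 'Xint', 0),
    ('salesmen_name', 30, 'Xstr', 0),
    ('salesmen_territory', 30, 'Xstr', 0),
    ('salesmen_quota', 4, 'Pdec', 0),
    ('salesmen_1st_bonus', 4, 'Pdec', 2),
    ('salesmen_2nd_bonus', 4, 'Pdec', 2),
    ('salesmen_3rd_bonus', 4, 'Pdec', 2),
    ('salesmen_4th_bonus', 4, 'Pdec', 2)
]

slices, total = field_slices(salesmen)
```

The total returned here should agree with salesmen_len, which gives a
quick consistency check on a generated definition.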

If you find this code useful please feel free to use any or all of it
at your own risk.

Thanks
Len S


