Line Text Parsing
Larry Bates
lbates at swamisoft.com
Thu Feb 5 10:36:27 EST 2004
I think one of the easiest ways to do this is to
write a class that knows how to parse each of the
unique lines. As you are reading through the file/table
and encounter a line like the first, create a new
class instance and pass it the line's contents. The
__init__ method of the class can parse the line and
place each of the field values in an attribute of the
class.
Something like (this is pseudocode):
class linetype01:
    #
    # Define a list that contains information about how to
    # parse a single linetype. Each entry is (fieldname,
    # beginning column, fieldlength).
    #
    _parsinginfo = [('recnum', 0, 8),
                    ('linetype', 8, 3),
                    ('dataitem2', 11, 3),
                    # ... one tuple for each remaining field
                    ]

    def __init__(self, linetext):
        self.linetext = linetext
        # Slice each field out of the line and store it as an attribute.
        for fieldname, begincol, fieldlength in self._parsinginfo:
            setattr(self, fieldname,
                    linetext[begincol:begincol + fieldlength])
You would define a class like this for each unique linetype.
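Since every linetype class would otherwise repeat the same __init__, one way to cut the duplication is to put the slicing loop in a shared base class and let each subclass supply only its field table. This is a minimal sketch, not a finished design: the names FixedWidthLine and fields are hypothetical, and the column layout for linetype55 is invented for illustration (the real positions have to come from the ISAM record definitions).

```python
class FixedWidthLine(object):
    # Hypothetical base class: subclasses override `fields` with
    # (fieldname, begincol, fieldlength) tuples.
    fields = []

    def __init__(self, linetext):
        self.linetext = linetext
        for fieldname, begincol, fieldlength in self.fields:
            # Slice ends are exclusive, so no +1 is needed here.
            setattr(self, fieldname,
                    linetext[begincol:begincol + fieldlength])


class linetype01(FixedWidthLine):
    # Column positions copied from the pseudocode above.
    fields = [('recnum', 0, 8), ('linetype', 8, 3), ('dataitem2', 11, 3)]


class linetype55(FixedWidthLine):
    # Assumed layout, for illustration only.
    fields = [('recnum', 0, 8), ('linetype', 8, 2), ('text', 10, 30)]


# Two of the sample lines from the original question.
header = linetype01("01508390019002 11284361000002SUGARPLUM")
note = linetype55("015083915549 SHORT ON LAST ORDER")
```

With this arrangement, adding a new linetype means writing only a new field table.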
In the main program:

import sys
#
# Insert code to open file/table here
#
for line in table:
    #
    # See which linetype it is
    #
    linetype = line[8:10]
    if linetype == "01":
        pline = linetype01(line)
        #
        # Now you can extract the values by accessing attributes of
        # the class.
        #
        recordnum = pline.recnum
        tlinetype = pline.linetype
        #
        # Do something with the values
        #
    elif linetype == "55":
        pline = linetype55(line)
    elif linetype == "20":
        pline = linetype20(line)
    else:
        print("ERROR - illegal linetype encountered")
        sys.exit(2)
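As the number of linetypes grows, the if/elif chain can be replaced by a dictionary that maps the two-character type code to its class. The following is a rough sketch under stated assumptions: ParsedLine and its subclasses are stand-ins for the real per-linetype classes, and PARSERS and parse_line are hypothetical names. The type codes and the line[8:10] slice are the ones used in the loop above.

```python
class ParsedLine(object):
    # Stand-in for the real per-linetype classes; it only keeps the text.
    def __init__(self, linetext):
        self.linetext = linetext


class linetype01(ParsedLine):
    pass


class linetype55(ParsedLine):
    pass


class linetype20(ParsedLine):
    pass


# Map the type code found in line[8:10] to the class that parses it.
PARSERS = {"01": linetype01, "55": linetype55, "20": linetype20}


def parse_line(line):
    # Hypothetical helper: look up the class for this line and build it.
    code = line[8:10]
    try:
        parser = PARSERS[code]
    except KeyError:
        raise ValueError("illegal linetype %r" % code)
    return parser(line)


# The three sample lines from the original question.
sample = [
    "01508390019002 11284361000002SUGARPLUM",
    "015083915549 SHORT ON LAST ORDER",
    "0150839220692 000002EA BMC 15 KG 001400",
]
parsed = [parse_line(line) for line in sample]
```

Raising ValueError instead of calling sys.exit leaves it to the caller to decide whether an unknown linetype should abort the run or just be logged and skipped.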
Just one of many ways to solve this problem.
-Larry
"allanc" <kawNOSPAMenks at nospamyahoo.ca> wrote in message
news:Xns948575A2C930Aacuencacanadacom at 198.161.157.145...
> I'm new with python so bear with me.
>
> I'm looking for a way to elegantly parse fixed-width text data (as opposed
> to CSV) and saving the parsed data unto a database. The text data comes
> from an old ISAM-format table and each line may be a different record
> structure depending on key fields in the line.
>
> RegExp with match and split are of interest but it's been too long since
> I've dabbled with RE to be able to judge whether its use will make the
> problem more complex.
>
> Here's a sample of the records I need to parse:
>
> 01508390019002 11284361000002SUGARPLUM
> 015083915549 SHORT ON LAST ORDER
> 0150839220692 000002EA BMC 15 KG 001400
>
> 1st Line is a (portion of) header record.
> 2nd Line is an text instruction record.
> 3rd Line is a Transaction Line Item record.
>
> Each type of record has a different structure. But these set of lines
> appear in the one table.
>
>
> Any ideas would be greatly appreciated.
>
> Allan