next line (data parsing)

Paul McGuire ptmcg at austin.rr.com
Wed Jan 16 20:29:37 EST 2008


On Jan 16, 6:54 pm, robleac... at gmail.com wrote:
> Hi there,
> I'm struggling to find a sensible way to process a large chunk of
> data--line by line, but also having the ability to move to subsequent
> 'next' lines within a for loop. I was hoping someone would be willing
> to share some insights to help point me in the right direction. This
> is not a file, so any file modules or methods available for file
> parsing wouldn't apply.
>
> I run a command on a remote host by using the pexpect (pxssh) module.
> I get back the result, which is pages and pages of pre-formatted text.
> This is a pared down example (some will notice it's tivoli schedule
> output).
>

Pyparsing works on a string or a file, and does the line-by-line
iteration for you.  You just have to define the expected format of
the data.  The sample code below parses the data that you posted.
From there, you can refine the code by assigning results names to the
different parsed fields, and then use those names to access the parsed
values.
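For comparison, stepping to 'next' lines inside a for loop, as the question describes, needs no file methods either: split the string into lines and share one iterator between the for loop and explicit next() calls. A minimal stdlib sketch, assuming the command output is already a single string and that each block runs from a line starting with "Schedule" to a line starting with "Total"; group_schedules and the sample text are hypothetical:

```python
def group_schedules(text):
    """Split text into blocks running from a 'Schedule' line to its 'Total' line."""
    blocks = []
    lines = iter(text.splitlines())  # one iterator, shared by the for loop and next()
    for line in lines:
        if not line.startswith("Schedule"):
            continue  # skip lines outside any schedule block
        block = [line]
        # next() pulls the following lines from the same iterator, so the
        # outer for loop resumes after the 'Total' line instead of re-reading them
        while True:
            following = next(lines, None)
            if following is None:
                break  # input ended before a 'Total' line
            block.append(following)
            if following.lstrip().startswith("Total"):
                break
        blocks.append(block)
    return blocks


# Hypothetical sample text standing in for the remote command's output
sample = """Schedule HOST#A
  job1
Total 00:01
noise
Schedule HOST#B
  job2
  job3
Total 00:02"""

for block in group_schedules(sample):
    print(block)
```

Because the inner next() calls consume lines from the same iterator, the for loop never sees them twice; this is the hand-written version of the iteration that pyparsing otherwise does for you.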

More info about pyparsing at http://pyparsing.wikispaces.com.

-- Paul



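from pprint import pprint
from pyparsing import *

# "data" is assumed to be a string holding the schedule text from the
# question (the sample itself was trimmed from the quote above).

integer = Word(nums)
timestamp = Combine(Word(nums,exact=2)+":"+Word(nums,exact=2))
dateString = Combine(Word(nums,exact=2)+"/"+
                     Word(nums,exact=2)+"/"+
                     Word(nums,exact=2))

# header line: "Schedule HOST", #NAME, "( )", timestamp, integer,
# timestamp, "(" date ")", then any trailing text on the same line
schedHeader = (Literal("Schedule HOST") + Word("#",alphas+"_") + "(" + ")" +
               timestamp + integer + timestamp + "(" + dateString + ")" +
               Optional(~LineEnd() + empty + restOfLine))

# one job line; LineEnd() requires the line to end here, and suppress()
# keeps the newline out of the parsed results
schedLine = Group(Word("(",alphanums) + Word(alphanums+"_") + timestamp +
                  integer + Optional(~LineEnd() + empty + restOfLine)
                  ) + LineEnd().suppress()
schedTotal = Literal("Total") + timestamp

sched = schedHeader + Group(OneOrMore(schedLine)) + schedTotal

# searchString() scans the whole string and returns every match of sched
for s in sched.searchString(data):
    pprint(s.asList())
    print()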


Prints:

['Schedule HOST',
 '#ALL_LETTERS',
 '(',
 ')',
 '00:01',
 '10',
 '22:00',
 '(',
 '01/16/08',
 ')',
 'LTR_CLEANUP ',
 [['(SITE1', 'LTR_DB_LETTER', '00:01', '10']],
 'Total',
 '00:01']

['Schedule HOST',
 '#DAILY',
 '(',
 ')',
 '00:44',
 '10',
 '18:00',
 '(',
 '01/16/08',
 ')',
 'DAILY_LTR ',
 [['(SITE3', 'RUN_LTR14_PROC', '00:20', '10'],
  ['(SITE1', 'LTR14A_WRAPPER', '00:06', '10', 'SITE3#RUN_LTR14_PROC '],
  ['(SITE1', 'LTR14B_WRAPPER', '00:04', '10', 'SITE1#LTR14A_WRAPPER '],
  ['(SITE1', 'LTR14C_WRAPPER', '00:03', '10', 'SITE1#LTR14B_WRAPPER '],
  ['(SITE1', 'LTR14D_WRAPPER', '00:02', '10', 'SITE1#LTR14C_WRAPPER '],
  ['(SITE1', 'LTR14E_WRAPPER', '00:01', '10', 'SITE1#LTR14D_WRAPPER '],
  ['(SITE1', 'LTR14F_WRAPPER', '00:03', '10', 'SITE1#LTR14E_WRAPPER '],
  ['(SITE1', 'LTR14G_WRAPPER', '00:03', '10', 'SITE1#LTR14F_WRAPPER '],
  ['(SITE1', 'LTR14H_WRAPPER', '00:02', '10', 'SITE1#LTR14G_WRAPPER ']],
 'Total',
 '00:44']

['Schedule HOST',
 '#CARDS',
 '(',
 ')',
 '00:02',
 '10',
 '20:30',
 '(',
 '01/16/08',
 ')',
 'STR2_D ',
 [['(SITE7', 'DAILY_MEETING_FILE', '00:01', '10'],
  ['(SITE3', 'BEHAVE_HALT_FILE', '00:01', '10', 'SITE7#DAILY_HOME_FILE ']],
 'Total',
 '00:02']
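The reply suggests refining the grammar by attaching names to the parsed fields. A minimal sketch of that for a single job line, using pyparsing's expr("name") shorthand for setResultsName(); the field names (site, jobname, elapsed, priority, follows) are illustrative guesses at what each column means, and the two sample lines are rebuilt from the printed output above with assumed spacing:

```python
from pyparsing import Combine, LineEnd, Optional, Word, alphanums, empty, nums, restOfLine

# Building blocks copied from the grammar above
integer = Word(nums)
timestamp = Combine(Word(nums, exact=2) + ":" + Word(nums, exact=2))

# One job line, with a results name attached to each field via expr("name").
# The names are hypothetical labels, not Tivoli's own column names.
sched_line = (Word("(", alphanums)("site")
              + Word(alphanums + "_")("jobname")
              + timestamp("elapsed")
              + integer("priority")  # guessed meaning of the '10' column
              # optional trailing text, e.g. the job this one follows
              + Optional(~LineEnd() + empty + restOfLine("follows")))

# Sample lines rebuilt from the printed output (exact spacing assumed)
lines = [
    "(SITE3 RUN_LTR14_PROC 00:20 10",
    "(SITE1 LTR14A_WRAPPER 00:06 10 SITE3#RUN_LTR14_PROC",
]

for text in lines:
    res = sched_line.parseString(text)
    # named fields read like attributes; a missing optional name reads as ""
    print(res.site, res.jobname, res.elapsed, res.priority, res.follows or "-")
```

With names attached, code that consumes the results can ask for res.elapsed instead of counting list positions, and an absent optional field simply reads as an empty string.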


