Improving my text processing script
pruebauno at latinmail.com
Thu Sep 1 16:21:24 EDT 2005
Paul McGuire wrote:
> match...), this program has quite a few holes.
>
> What if the word "Identifier" is inside one of the quoted strings?
> What if the actual value is "tablename10"? This will match your
> "tablename1" string search, but it is certainly not what you want.
> Did you know there are trailing blanks on your table names, which could
> prevent any program name from matching?
Good points. I did not think about those. I got lucky: none of the table
names had trailing blanks (Google Groups seems to add those), the word
"Identifier" does not appear inside any quoted string, and I do not have
a tablename10. I do, however, have "dba.tablename1", which has to match
tablename1 (and magically did).
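The holes listed above all come from testing whether a table name appears *somewhere* in a line. Below is a minimal sketch of a stricter comparison. It assumes, hypothetically, that a schema prefix is always separated from the table name by a dot, as in "dba.tablename1". `table_matches` is an illustrative name, not part of the original script.

```python
def table_matches(wanted: str, value: str) -> bool:
    """Return True if `value` names the table `wanted`.

    Assumptions (hypothetical, not from the original script):
    - surrounding whitespace, such as trailing blanks, is ignored;
    - an optional schema prefix such as "dba." is dropped;
    - the remaining name must be equal to `wanted`, not merely contain it.
    """
    name = value.strip()
    # Keep only the part after the last dot: "dba.tablename1" -> "tablename1".
    name = name.rsplit(".", 1)[-1]
    return name == wanted.strip()


# The naive substring test has exactly the hole described above:
print("tablename1" in "tablename10")                 # True (false positive)
print(table_matches("tablename1", "tablename10"))     # False
print(table_matches("tablename1", "dba.tablename1"))  # True
print(table_matches("tablename1 ", "tablename1"))     # True (trailing blank ignored)
```

Dropping everything before the last dot is a design choice. It would also accept "other.tablename1", so if only the dba schema should count, the prefix needs to be compared explicitly.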
>
> So here is an alternative approach using, as many have probably
> predicted by now if they've spent any time on this list, the pyparsing
> module. You may ask, "isn't a parser overkill for this problem?" and
You had to plug pyparsing! :-) Thanks for the info; I did not know
something like pyparsing existed. Thanks for the code too, because from
looking at the module alone it was not obvious to me how to use it. I
tried to run it, though, and it is not working for me. The following
code runs but prints nothing at all:
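import pyparsing as prs

# Read the table names (one per line, blank lines skipped) and the
# program text to scan.
with open('tlst') as f:
    tlst = [ln.strip() for ln in f if ln.strip()]
with open('plst') as f:
    plst = f.read()

# Strip the surrounding quotes from every quotedString match.
prs.quotedString.setParseAction(prs.removeQuotes)

# A line of the form:  Identifier "progname"
identLine = (prs.LineStart()
             + 'Identifier'
             + prs.quotedString
             + prs.LineEnd()
             ).setResultsName('prog')
# A line of the form:  Value "tablename"
tableLine = (prs.LineStart()
             + 'Value'
             + prs.quotedString
             + prs.LineEnd()
             ).setResultsName('table')
interestingLines = (identLine | tableLine)

for toks, start, end in interestingLines.scanString(plst):
    print(toks, start, end)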
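When it is unclear whether the grammar or the input is at fault, one way to narrow it down is to try the same two line patterns with the standard `re` module, which needs no third-party install. This is a swapped-in technique, not pyparsing and not a fix for the code above. It assumes, hypothetically (the layout of `plst` is not shown), that each interesting line holds the keyword `Identifier` or `Value` followed by one double-quoted string, possibly with leading or trailing blanks. `SAMPLE`, `LINE_RE` and `scan` are illustrative names.

```python
import re

# Hypothetical input: the real 'plst' layout is not shown, so this assumes
# one keyword and one double-quoted value per line.
SAMPLE = "\n".join([
    'Identifier "prog1"',
    '  Value "dba.tablename1"',
    'Comment "the word Identifier inside a string"',
    'Value "tablename2"   ',  # trailing blanks, as noted in the thread
]) + "\n"

# With re.MULTILINE, ^ and $ match at every line boundary, so a keyword
# inside a quoted string further along a line is never matched. Leading and
# trailing blanks are allowed, and the quotes are not captured (much like
# removeQuotes).
LINE_RE = re.compile(r'^[ \t]*(Identifier|Value)[ \t]+"([^"]*)"[ \t]*$',
                     re.MULTILINE)


def scan(text):
    """Yield (keyword, value, start, end) for each matching line."""
    for m in LINE_RE.finditer(text):
        yield m.group(1), m.group(2), m.start(), m.end()


for keyword, value, start, end in scan(SAMPLE):
    print(keyword, value, start, end)
```

If this finds lines in the real `plst` contents while the pyparsing version finds none, the difference probably lies in the grammar. If it finds nothing either, the file layout likely differs from what both patterns assume.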