What is Python?
Donn Cave
donn at u.washington.edu
Wed Sep 20 15:54:29 EDT 2000
Quoth Andrew Kuchling <akuchlin at mems-exchange.org>:
[ ... re regular expressions considered harmful ]
| Because people too often apply them to inappropriate tasks; the last
| example I can recall was someone in c.l.p who was trying to use
| regexes to filter out files with 'SCCS' in the path and .java at the
| end. The regex to do this is not easy to write and not clear. Part
| of the problem is that, like Prolog, you really need to understand the
| underlying implementation to write regexes properly. Making regexes
| purely declarative might fix this, but even there .* behaves
| counterintuitively. An easy way of parsing text has not yet been
| found, I think.
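To make the quoted example concrete, here is a rough sketch. It assumes the goal was to keep paths that end in .java and do not contain SCCS anywhere; the original request is not spelled out, so that reading is a guess, and both function names are made up for illustration.

```python
import re

# Assumed task: keep paths ending in ".java" that have no "SCCS" in them.
# keep_regex and keep_plain are hypothetical names, not from the thread.

# Regex version: "no SCCS anywhere" needs a negative lookahead.
JAVA_NOT_SCCS = re.compile(r'^(?!.*SCCS).*\.java$')

def keep_regex(path):
    return JAVA_NOT_SCCS.match(path) is not None

# Plain string version: two tests that say what they mean.
def keep_plain(path):
    return path.endswith('.java') and 'SCCS' not in path

paths = ['src/Foo.java', 'src/SCCS/s.Foo.java', 'src/Foo.class']
kept = [p for p in paths if keep_plain(p)]
```

For ordinary paths like these the two functions agree, but the string version is the one you can read at a glance.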
Amen, sort of! I have said it before: the easiest text parsing I ever
saw was the "PARSE" statement in REXX. That is a language I don't know
well, but that's the point: you don't have to know anything to use this
"parse by example" system.
Now it would probably only make a REXX programmer cry, but Aaron Watters
wrote a tparsing module that works kind of the same way and has some extra
features. Cf. ftp://ftp.python.org/pub/www.python.org/ftp/python/contrib-09-Dec-1999/DataStructures/tparsing.py
For an example, I wrote the following up in a few minutes. It analyzes
a syslog log file, and reports successful and unsuccessful Kerberos
authentication attempts and reasons for failure. I probably made it
more complicated than illustrative by wrapping the parse templates in
my own class. tparsing raises a ValueError on no match, which is the
right thing for it to do, but it made for an awkward loop full of
try/except blocks, hence the wrapper. The clearer way to call the PARSE
method is like ((x1, x2, value, x4), chars) = template.PARSE(data)
(if the template is like */*<*>* and you want the <*> part).
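To show the idea without the module at hand, here is a minimal sketch of wildcard template matching in plain Python. It is a stand-in written for illustration, not the tparsing implementation: parse_template is a hypothetical name, and returning the number of characters consumed as the second value is an assumption based on the call shape above.

```python
def parse_template(template, data, wildcard='*'):
    """Match data against template; return (fields, chars_consumed).

    Hypothetical stand-in for illustration, not the tparsing code.
    Raises ValueError when the data does not fit the template.
    """
    # The literal text between wildcards, e.g. '*/*<*>*' gives
    # ['', '/', '<', '>', ''].
    literals = template.split(wildcard)
    if not data.startswith(literals[0]):
        raise ValueError('no match')
    pos = len(literals[0])
    fields = []
    last = len(literals) - 1
    for n, lit in enumerate(literals[1:], 1):
        if lit == '' and n == last:
            # Trailing wildcard: take everything that is left.
            end = len(data)
        else:
            # Each field runs up to the next occurrence of the literal.
            end = data.find(lit, pos)
            if end < 0:
                raise ValueError('no match')
        fields.append(data[pos:end])
        pos = end + len(lit)
    return tuple(fields), pos

(x1, x2, value, x4), chars = parse_template('*/*<*>*', 'a/b<c>d')
```

Here value ends up as 'c', the part between < and >. Each wildcard takes the shortest text up to the next literal, which is a design choice of this sketch.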
Donn Cave, donn at u.washington.edu
----------------------------------------
import sys
from tparsing import Template

class T:
    # Wraps a tparsing Template so that a failed match returns None
    # instead of raising ValueError.
    def __init__(self, template, av = None):
        self.template = Template(template, '*')
        self.av = av

    def parse(self, data):
        try:
            result, chars = self.template.PARSE(data)
            i = 0
            if self.av:
                # Store the i-th matched field under the i-th name in av;
                # a None entry means "skip this field".
                for a in self.av:
                    if a is None:
                        pass
                    else:
                        setattr(self, a, result[i])
                    i = i + 1
            return result
        except ValueError:
            return None

success = T('* authtime *, *@*', (None, None, 'user'))
fail = T('* PREAUTH_FAILED: *@*', (None, 'user'))
nopa = T('* NEEDED_PREAUTH: *@*', (None, 'user'))
paverify = T('* preauth (*) verify failure: *\n', (None, 'patype', 'error'))

# Last preauth verify failure seen, reported with the next PREAUTH_FAILED line.
palog = None
while 1:
    s = sys.stdin.readline()
    if not s:
        break
    if success.parse(s):
        print repr(success.user), 'OK'
    elif fail.parse(s):
        if palog:
            print repr(fail.user), palog
            palog = None
        else:
            print repr(fail.user), '?'
    elif nopa.parse(s):
        print repr(nopa.user), '(needed preauth)'
    elif paverify.parse(s):
        palog = '%s (%s)' % (paverify.error, paverify.patype)
    else:
        print "here's an odd one:", s,