What is Python?

Donn Cave donn at u.washington.edu
Wed Sep 20 15:54:29 EDT 2000


Quoth Andrew Kuchling <akuchlin at mems-exchange.org>:
[ ... re regular expressions considered harmful ]

| Because people too often apply them to inappropriate tasks; the last
| example I can recall was someone in c.l.p who was trying to use
| regexes to filter out files with 'SCCS' in the path and .java at the
| end.  The regex to do this is not easy to write and not clear.  Part
| of the problem is that, like Prolog, you really need to understand the
| underlying implementation to write regexes properly.  Making regexes
| purely declarative might fix this, but even there .* behaves
| counterintuitively.  An easy way of parsing text has not yet been
| found, I think.
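
To make the quoted complaint concrete, here is a small sketch of that filtering task done both ways. It reads the task as "keep the .java files whose path does not mention SCCS", which is an assumption about what the c.l.p poster wanted. The function names and sample paths are made up for illustration, and the syntax is that of a current Python 3 rather than the Python of this post.

```python
import re

# Assumed reading of the task: keep *.java files, but not those under SCCS.
def wanted_plain(path):
    # Two plain string tests, no regex needed.
    return path.endswith(".java") and "SCCS" not in path

# One regex that does the same job; it needs a negative lookahead.
JAVA_NOT_SCCS = re.compile(r"^(?!.*SCCS).*\.java$")

def wanted_regex(path):
    return JAVA_NOT_SCCS.search(path) is not None

# Hypothetical sample paths.
paths = ["src/Foo.java", "src/SCCS/s.Foo.java", "src/Foo.class"]
kept = [p for p in paths if wanted_plain(p)]
```

Both functions agree on these paths, but only the first can be read at a glance.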

Amen, sort of!  As I have said before, the easiest text parsing I ever
saw was the "PARSE" statement in REXX.  It is a language I don't know
well, but that's the point: you don't have to know anything to use this
"parse by example" system.

It would probably only make a REXX programmer cry, but Aaron Watters
wrote a tparsing module that works in much the same way and has some
extra features.  See ftp://ftp.python.org/pub/www.python.org/ftp/python/contrib-09-Dec-1999/DataStructures/tparsing.py

As an example, I wrote the following in a few minutes.  It analyzes
a syslog file and reports successful and unsuccessful Kerberos
authentication attempts, with the reasons for failure.  I probably made
it more complicated than illustrative by wrapping the parse templates in
my own class.  tparsing raises a ValueError on no match, which is the
right thing for it to do, but that made for an awkward loop full of
try/except blocks, hence the wrapper.  The clearer way to call the PARSE
function is like

    ((x1, x2, value, x4), chars) = template.PARSE(data)

(if the template is like */*<*>* and you want the <*> part).
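
For readers without tparsing at hand, the calling convention above can be sketched with a stand-in. This is not Aaron Watters' code: parse_template and try_parse are hypothetical names, the matching rule (each literal is found at its first occurrence) is a guess at the simplest behaviour, and treating "chars" as the number of characters consumed is an assumption. Only the ValueError on no match and the shape of the return value come from the description above. It is written for a current Python 3.

```python
def parse_template(template, data, wildcard="*"):
    """Stand-in for template.PARSE(data): return (fields, chars)."""
    literals = template.split(wildcard)
    # The text before the first wildcard must open the data.
    if not data.startswith(literals[0]):
        raise ValueError("no match")
    pos = len(literals[0])
    fields = []
    last = len(literals) - 1
    for n, lit in enumerate(literals[1:], start=1):
        if n == last and lit == "":
            # Template ends in a wildcard: the last field takes the rest.
            end = len(data)
        else:
            end = data.find(lit, pos)
            if end < 0:
                raise ValueError("no match")
        fields.append(data[pos:end])
        pos = end + len(lit)
    # Assumption: "chars" is how many characters were consumed.
    return tuple(fields), pos

def try_parse(template, data):
    """Same idea as a wrapper class: None instead of ValueError."""
    try:
        return parse_template(template, data)[0]
    except ValueError:
        return None

# The call from the text, with made-up data.
(x1, x2, value, x4), chars = parse_template("*/*<*>*", "usr/lib<python>1.5")
# value == "python"
```

With try_parse, a loop over several templates can be a plain if/elif chain instead of nested try/except blocks.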

	Donn Cave, donn at u.washington.edu
----------------------------------------
import sys
from tparsing import Template

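# Wrap a tparsing Template so that a failed match returns None
# instead of raising ValueError.
class T:
	# template: pattern with '*' marking each field to extract
	# av: optional attribute names, one per field; None skips a field
	def __init__(self, template, av = None):
		self.template = Template(template, '*')
		self.av = av
	def parse(self, data):
		try:
			result, chars = self.template.PARSE(data)
			i = 0
			if self.av:
				for a in self.av:
					# store the named fields as attributes of self
					if a is not None:
						setattr(self, a, result[i])
					i = i + 1
			return result
		except ValueError:
			return None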

success = T('* authtime *, *@*', (None, None, 'user'))
fail = T('* PREAUTH_FAILED: *@*', (None, 'user'))
nopa = T('* NEEDED_PREAUTH: *@*', (None, 'user'))
paverify = T('* preauth (*) verify failure: *\n', (None, 'patype', 'error'))

palog = None

while 1:
	s = sys.stdin.readline()
	if not s:
		break
	if success.parse(s):
		print repr(success.user), 'OK'
	elif fail.parse(s):
		if palog:
			print repr(fail.user), palog
			palog = None
		else:
			print repr(fail.user), '?'
	elif nopa.parse(s):
		print repr(nopa.user), '(needed preauth)'
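	elif paverify.parse(s):
		# remember the reason; a later PREAUTH_FAILED line reports it
		palog = '%s (%s)' % (paverify.error, paverify.patype)
	else:
		# s still ends with its newline, hence the trailing comma
		print "here's an odd one:", s,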


