more than 100 capturing groups in a regex

Joerg Schuster joerg.schuster at gmail.com
Tue Oct 25 08:50:49 EDT 2005


You did not quite understand me. I will give you some details:

My program is a compiler for a certain type of linguistic grammar.
That is, the user gives *grammar files* to my program. Once the grammar
files have been compiled, they can be applied to strings (of a certain
language, e.g. English).

In the grammar files, the user does not have to deal with "capturing
groups". He does not even have to deal with regular expressions. He
just writes a grammar of a certain type. My program then compiles the
grammar into a cascade of transducers. Each of the transducers is
internally represented as a pair (REGEX, ACTION), where REGEX is a
Python regular expression and ACTION is a Python function. That is, the
meaning of each pair is: for each line of the input string, if REGEX
matches the line, then apply ACTION to it.
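
To make the (REGEX, ACTION) representation concrete, here is a minimal
sketch of such a cascade. The patterns, the action functions and the name
apply_cascade are hypothetical, chosen only for illustration; the sketch
also assumes that "matches" means re.search and that an ACTION takes a
line and returns the rewritten line.

```python
import re

# Hypothetical actions: each takes a line and returns a rewritten line.
def mark_number(line):
    return re.sub(r"\d+", "NUM", line)

def mark_det(line):
    return re.sub(r"\bthe\b", "DET", line)

# A cascade is an ordered list of (REGEX, ACTION) pairs.
CASCADE = [
    (re.compile(r"\d"), mark_number),
    (re.compile(r"\bthe\b"), mark_det),
]

def apply_cascade(text, cascade=CASCADE):
    """Run every transducer in order over every line of the input."""
    lines = text.splitlines()
    for regex, action in cascade:
        # Apply ACTION only to the lines that REGEX matches.
        lines = [action(line) if regex.search(line) else line
                 for line in lines]
    return "\n".join(lines)
```

Each transducer sees the output of the previous one, which is what makes
the list a cascade and not a set of independent rules.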

On various levels, the user may produce *disjunctions*. At present,
these disjunctions are internally represented by something like:

if regex1: action1()
elif regex2: action2()
elif ...
elif regexn: actionn()

It would be nicer (and faster) to have just one regex and then run the
action A such that the *capturing group* named A ("(?P<A>...)")
matched.
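
A minimal sketch of that idea: join the alternatives into one regex with
one named group per alternative, then look up the action by the name of
the group that matched (the match object's lastgroup attribute). The
group names, patterns and action functions are hypothetical. The sketch
assumes the alternatives contain no named groups of their own, and with
only three groups it does not touch the question of how many groups a
single regex may hold.

```python
import re

# Hypothetical actions, one per alternative of the disjunction.
def action_num(line):
    return "number: " + line

def action_word(line):
    return "word: " + line

def action_punct(line):
    return "punct: " + line

# (group name, pattern, action); order matters, as in an if/elif chain.
ALTERNATIVES = [
    ("NUM", r"\d+", action_num),
    ("WORD", r"[A-Za-z]+", action_word),
    ("PUNCT", r"[.!?]", action_punct),
]

# One combined regex: (?P<NUM>\d+)|(?P<WORD>[A-Za-z]+)|(?P<PUNCT>[.!?])
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern)
             for name, pattern, _ in ALTERNATIVES)
)
ACTIONS = {name: action for name, _, action in ALTERNATIVES}

def dispatch(line):
    """Run the action whose named group matched; else return the line."""
    m = COMBINED.match(line)
    if m is None:
        return line
    # lastgroup is the name of the alternative that matched.
    return ACTIONS[m.lastgroup](line)
```

Because alternation is tried left to right, the first alternative that
matches wins, which mirrors the order of the if/elif chain.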

Now, I could of course build my very own transducers internally. But
the re module generates FSAs (finite-state automata), and FSAs do part
of the work that a transducer does. So why reinvent the wheel?


Jörg



