Is there a re guru out there?

Daniel Yoo dyoo at hkn.eecs.berkeley.edu
Mon Jul 8 04:47:34 EDT 2002


Krzysiek Czarnowski <krzysiek at dgt-lab.com.pl> wrote:
: I try to convert some LaTeX constructs to "plain" equivalents like \(,\) -->
: $, \[,\] --> $$
: and \frac{...}{...} --> {...\over ...}. The \frac bit appeared not to be
: trivial since ... should be balanced with respect to { }.

Hello!

Ah, a LaTeX to TeX converter.  Cool!


If you can guarantee that there aren't any nested braces within the
"frac" groups, then you can probably do this by making your regular
expression "non-greedy".

For example, let's say that we'd like to transform:

    {a}{b} --> (a b)


Here's an initial shot at it:

###
>>> import re
>>> re.sub(r'\{(.*)\}\{(.*)\}', r'(\1 \2)', '{foo}{bar} {baz}{boo}')
'(foo}{bar} {baz boo)'
###

The problem here is that the regular expression '.*' is too greedy: it
tries to eat as much as it can, and doesn't stop at the first closing brace.


To fix this, we can tell the regular expression to not be so greedy
about things.  Putting a '?' right after the '*' makes it non-greedy,
so that it matches as little as it can get away with:


###
>>> re.sub(r'\{(.*?)\}\{(.*?)\}', r'(\1 \2)', '{foo}{bar} {baz}{boo}')
'(foo bar) (baz boo)'
###

You can probably write a similar regular expression for \frac, as long
as there aren't groups nested within groups.
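
Here's a rough sketch of how that might look for \frac itself.  The
names FLAT_FRAC and convert_flat_frac are made up for illustration,
and the whole thing assumes that neither argument contains braces of
its own and that each \frac sits on a single line (since '.' doesn't
match a newline by default):

```python
import re

# Assumption: neither argument of \frac contains nested braces.
FLAT_FRAC = re.compile(r'\\frac\{(.*?)\}\{(.*?)\}')

def _to_over(match):
    # Build the replacement by hand so we don't have to worry about
    # backslash escaping rules in a replacement template.
    return '{' + match.group(1) + r'\over ' + match.group(2) + '}'

def convert_flat_frac(text):
    """Rewrite every flat \\frac{a}{b} in text as {a\\over b}."""
    return FLAT_FRAC.sub(_to_over, text)

print(convert_flat_frac(r'\frac{a}{b} + \frac{x}{y}'))
```

Passing a function to sub() instead of a template string is just a
matter of taste here; it keeps the backslash in \over easy to read.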



However, if there are groups within the frac groups, like:

    \frac{e^{42}}{\pi}

then that can cause problems: a non-greedy pattern pairs braces by
position, not by nesting depth, so it may stop at the wrong closing
brace.  This kind of pattern is something that regular expressions
alone won't be able to cope with, because the grammar is recursively
defined.  You'll need something more powerful than a regular
expression: you'll need a parser.
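
To give a flavor of what "more powerful" could mean, here's a minimal
sketch that counts brace depth by hand instead of using a regular
expression.  It's not a real TeX parser.  The helper names read_group
and convert_frac are hypothetical, and it assumes that every \frac is
followed immediately by two brace groups with no space in between, and
that there are no escaped braces like \{ in the input:

```python
def read_group(text, start):
    """Return (contents, index just past the closing brace) for the
    brace group that opens at text[start]."""
    if start >= len(text) or text[start] != '{':
        raise ValueError('expected "{" at position %d' % start)
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                # Found the brace that balances the opening one.
                return text[start + 1:i], i + 1
    raise ValueError('unbalanced braces')

def convert_frac(text):
    """Rewrite \\frac{a}{b} as {a\\over b}, recursing into a and b."""
    out = []
    i = 0
    while True:
        j = text.find('\\frac{', i)
        if j == -1:
            out.append(text[i:])      # no more fractions; copy the rest
            break
        out.append(text[i:j])         # copy everything before \frac
        num, k = read_group(text, j + len('\\frac'))
        den, k = read_group(text, k)
        # The arguments may contain fractions of their own.
        out.append('{' + convert_frac(num) + '\\over '
                   + convert_frac(den) + '}')
        i = k
    return ''.join(out)

print(convert_frac(r'\frac{e^{42}}{\pi}'))
print(convert_frac(r'\frac{\frac{a}{b}}{c}'))
```

The recursion in convert_frac mirrors the recursive grammar: a
fraction's arguments are themselves text that may contain fractions.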



Parsing is a large topic, so someone more competent than me should
handle it.  *grin*.  But here's an article that talks about parsing
with Python:

    http://www-106.ibm.com/developerworks/linux/library/l-simple.html?dwzone=linux



Best of wishes!


