problem negative lookahead assertion

Geoff Gerrietts geoff at gerrietts.net
Mon Apr 15 16:19:44 EDT 2002


Quoting sjoerd siebinga (ssiebinga at fa.knaw.nl):
> 
> When this was applied to the text below, all the \emph{} phrases were
> replaced instead of only those not followed by the \index{} command.

Okay. I put together the script I've appended below my .signature and
ran it under Python 2.2, Python 1.5.2, and Python 2.1. The script was
basically a lot of cut and paste from your posting.

When I run it, it brings up a prompt (which is what it's supposed to
do):

>>> data == mystr
1

In other words, no change at all has been made.

In all three interpreters, that's what I get. I assume that's a
problem, but a different problem than what you're actually complaining
about.
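
For what it's worth, here is a minimal sketch of one reason the posted
pattern might never fire on this text. It is a guess, not a diagnosis:
the `sample` string is a shortened copy of the data, and the `relaxed`
pattern is a hypothetical variant of mine, not something from the
original posting. The posted pattern begins with \s. and so wants
whitespace, then one arbitrary character, then \emph; in the data the
backslash comes right after the space.

```python
import re

# Pattern copied from the appended script.
emph = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')

# Shortened sample of the data (an assumption made for illustration).
sample = r"MDu. \emph{waden}, \index{mdu~waden} \emph{waeyen} `wade, go'"

# Nothing sits between the space and \emph, so '\s.' can never line up.
no_match = emph.search(sample)

# Hypothetical variant: drop the stray '.' after '\s'.
relaxed = re.compile(r'\s\\emph\{(.*?)\}([\s,\W])(?!\\index)')
hits = [m.group(1) for m in relaxed.finditer(sample)]
```

With the variant, the lookahead is tested after the comma that group 2
has consumed, where the next character is a space rather than \index,
so even the \emph{waden} that already has an \index entry matches.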

Others may be more familiar with your problem domain or more willing
to guess at it, but I'm gonna request that you provide, in addition to
your pattern, an example of what you're trying to transform, an
example of what you want to get, and an example of what you actually
get.

If I were to supply some general advice, based on what I've seen in
your patterns, it would be to avoid using .* in your patterns if at
all possible -- prefer a negated character class, [^...], listing any
"stop characters" (for example, [^}]* to stay inside the braces). Even
when the .* is nongreedy, it will swallow as much as it needs to in
order to make a match. That could be half your document if that's what
it takes to get to a comma, whitespace character, or non-alphanumeric
character that isn't followed by \index.
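
To make that concrete, here is a small sketch comparing a nongreedy .*?
with a negated character class. The `text` string and the simplified
lookahead (?!,? \\index) are assumptions chosen for illustration; they
are not the pattern from the original posting.

```python
import re

text = r"\emph{wadan}, \index{oe~wadan} and \emph{waeyen} `wade, go'"

# Nongreedy .*? still grows until the whole pattern can succeed,
# so it can run past the first closing brace.
lazy = re.compile(r"\\emph\{(.*?)\}(?!,? \\index)")

# [^}]* cannot cross a closing brace, so the match stays inside
# a single \emph{...} group.
bounded = re.compile(r"\\emph\{([^}]*)\}(?!,? \\index)")

lazy_groups = lazy.findall(text)
bounded_groups = bounded.findall(text)

print(lazy_groups)
print(bounded_groups)
```

The nongreedy version captures everything from "wadan" through the end
of the following \index argument as a single group, because that is
the first closing brace where the lookahead is satisfied. The negated
class only reports "waeyen", the one \emph that has no \index after it.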

Luck,
--G.

-- 
Geoff Gerrietts             <geoff at gerrietts dot net>
  I AM YOUR KING! BOW BEFORE ME, PEASANT!   -- Dogbert

#!/usr/bin/python -i

import re

emph = re.compile(r'\s.\\emph\{(.*?)\}([\s,\W])(?!\\index)')


mystr = r"""
\begin{germdata} ON \emph{va\th a} \index{on~va\th a}`wade, rush, walk
through', OE \emph{wadan}, \index{oe~wadan} OHG, MHG \emph{watan}
\index{mhg~watan}`wade, stride', MLG \emph{w\=aden},
\index{mlg~w\=aden} MDu. \emph{waden}, \index{mdu~waden} \emph{waeyen}
`wade, go' \end{germdata}
"""


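# Raw string with doubled backslashes, so the replacement template is
# also accepted by newer interpreters; \1 and \2 are the captured groups.
data = emph.sub(r'\\emph{\1}\2 \\index{unl~\1}', mystr)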





