[Tutor] Regex

Matt Williams matthew.williams at cancer.org.uk
Mon Aug 14 16:10:19 CEST 2006


Dear All,

I know this has come up loads of times before, but I'm stuck on what
should be a simple regex problem. I'm trying to pull all the definitions
from a LaTeX document. These are marked like this:

\begin{defn}
<TEXT>
\end{defn}

So I thought I'd write something like this:

import re

filename = '/home/acl_home/PhD/CurrentPhD/extensions1_14.8.6.tex'

infile = open(filename,'r')

def_start = "\\begin\{defn\}"
def_end = "\end{defn}"

def_start_reg = re.compile(def_start)

l = 0
while l < 500:
     line = infile.readline()
     #print l, line
     res = re.search(def_start_reg,line)
     print l, res
     l = l+1

But it doesn't return any matches (BTW, I know there's a defn tag in
that section). I thought the problem was my regex, but I checked it with
an online checker, and also against a small bit of text:


def_start = "\\begin\{defn\}"

def_start_reg = re.compile(def_start)

text = """atom that is grounded. These formulae are useful not only for the
work on valuation but are also used in later chapters.

\begin{defn}
A Patient-ground formula is a formula which contains a grounding of
$Patient(x)$. The other atoms in the formula may be either ground
or non-ground.
\end{defn}
Having defined our patient ground formulae, we can now use formulae
of this form to define our patient values."""

res = re.search(def_start_reg, text)
print res



And this returns a MatchObject. I'm not sure why there should be any
difference between the two, but I'm sure it's something very simple.
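
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: in a regex, \b is a word-boundary assertion, so the pattern as
posted never looks for a literal backslash. The in-script test text is a
normal (non-raw) string, where "\b" becomes a backspace character, and that
happens to create a word boundary just before "egin". The sketch below
reproduces both cases. The names file_line, pasted_text and
extract_definitions are hypothetical, and file_line only stands in for what
readline() would return from the .tex file.

```python
import re

# Same characters as the posted pattern "\\begin\{defn\}": the regex engine
# receives \begin\{defn\}, where \b is a word-boundary assertion.
posted = re.compile("\\begin\\{defn\\}")

# Raw string: the regex engine receives \\begin\{defn\}, i.e. a literal backslash.
fixed = re.compile(r"\\begin\{defn\}")

# Hypothetical stand-in for a line read from the .tex file (real backslash + "b").
file_line = "\\begin{defn}\n"

# What a non-raw literal holds: "\b" has become a backspace character (\x08).
pasted_text = "\begin{defn}"

print(bool(posted.search(file_line)))    # False: no word boundary between "b" and "e"
print(bool(posted.search(pasted_text)))  # True: boundary between backspace and "e"
print(bool(fixed.search(file_line)))     # True: literal backslash is matched


def extract_definitions(tex):
    """Return the stripped body of each defn environment found in tex."""
    # DOTALL lets "." cross newlines; ".*?" stops at the first closing tag.
    body_re = re.compile(r"\\begin\{defn\}(.*?)\\end\{defn\}", re.DOTALL)
    return [body.strip() for body in body_re.findall(tex)]
```

Since a definition spans several lines, reading the whole file with
infile.read() and passing the result to extract_definitions may be simpler
than searching line by line.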

Thanks for any tips,

Matt

