[Tutor] Regex
Matt Williams
matthew.williams at cancer.org.uk
Mon Aug 14 16:10:19 CEST 2006
Dear All,
I know this has come up loads of times before, but I'm stuck with what
should be a simple regex problem. I'm trying to pull all the definitions
from a LaTeX document. These are marked
\begin{defn}
<TEXT>
\end{defn}
so I thought I'd write something like this:
import re
filename = '/home/acl_home/PhD/CurrentPhD/extensions1_14.8.6.tex'
infile = open(filename,'r')
def_start = "\\begin\{defn\}"
def_end = "\end{defn}"
def_start_reg = re.compile(def_start)
l = 0
# check the first 500 lines of the file for the opening tag
while l < 500:
    line = infile.readline()
    #print l, line
    res = re.search(def_start_reg,line)
    print l, res
    l = l+1
but it doesn't return any matches (BTW, I know there's a defn tag in
that section). I thought the problem was my regex, but I checked it with
an online checker, and also against a small bit of text:
def_start = "\\begin\{defn\}"
def_start_reg = re.compile(def_start)
text = """atom that is grounded. These formulae are useful not only for the
work on valuation but are also used in later chapters.
\begin{defn}
A Patient-ground formula is a formula which contains a grounding of
$Patient(x)$. The other atoms in the formula may be either ground
or non-ground.
\end{defn}
Having defined our patient ground formulae, we can now use formulae
of this form to define our patient values."""
res = re.search(def_start_reg, text)
print res
and this returns a MatchObject. I'm not sure why there should be any
difference between the two - but I'm sure it's very simple.
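One possible explanation, offered as a sketch rather than a confirmed diagnosis: the string "\\begin\{defn\}" reaches the regex engine as \begin\{defn\}, and in a regex \b means a word boundary, not the letter b. Meanwhile "\b" inside the non-raw test string becomes a backspace character, so the two inputs differ. The snippet below assumes the file contains a literal backslash before "begin"; the names file_line, literal_text, fixed_pattern and extract_definitions are made up for illustration.

```python
import re

# What re.compile() receives for "\\begin\{defn\}": the string escape \\
# collapses to one backslash, leaving the regex \begin\{defn\}.
# In a regex, \b is a word boundary, so this pattern looks for "egin{defn}"
# at the start of a word, not for a backslash followed by "begin".
original_pattern = re.compile(r"\begin\{defn\}")

# A line as read from the .tex file (assumed): real backslash, then "begin".
file_line = "\\begin{defn}\n"

# The same text typed into a non-raw string literal, as in the small test:
# Python turns "\b" into a backspace character (\x08).
literal_text = "\begin{defn}\n"

in_file = original_pattern.search(file_line)        # None: "egin" follows "b"
in_literal = original_pattern.search(literal_text)  # matches "egin{defn}"

# Hypothetical fix: a raw string, with the backslash escaped for the regex.
fixed_pattern = re.compile(r"\\begin\{defn\}")


def extract_definitions(tex_source):
    """Return the stripped body of every defn environment in tex_source."""
    # re.DOTALL lets "." cross newlines; ".*?" stops at the first closing tag.
    body = re.compile(r"\\begin\{defn\}(.*?)\\end\{defn\}", re.DOTALL)
    return [match.strip() for match in body.findall(tex_source)]
```

With this pattern the whole file could be read in one go (for example with infile.read()) and passed to extract_definitions, instead of searching line by line. That matters because a definition body spans several lines.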
Thanks for any tips,
Matt