SimpleParse and LookAhead
Mike C. Fletcher
mcfletch at rogers.com
Tue Apr 1 00:49:46 EST 2003
Hi Yvan,
Here's the code I assume you intended to send as the problem-description:
8<______________
declaration = r'''line := ?([a-z], root)
root := char*
<char> := [a-zA-Z0-9-]
'''
testData = '''root - Dir
'''
from simpleparse import generator
from mx.TextTools import TextTools
parser = generator.buildParser(declaration).parserbyname('line')
taglist = TextTools.tag(testData, parser)
print taglist
for tag, beg, end, parts in taglist[1]:
    print testData[beg:end]
8<______________
Now, the problem you are running into is that you are declaring the
entire group "([a-z], root)" to be a look ahead production. A look
ahead production returns exactly the results the regular group would
return; only the parser position is reset afterwards. Inside the group,
[a-z] consumes the first character, so the production "root" reports a
match running from just after that first character to the end of the
word. The look ahead as a whole consumes no characters (i.e. the next
position is position 0).
Here are the alternatives (you haven't really explained what you're
trying to do with the look ahead, so I can't come up with anything more
helpful):
declaration = r'''line := ?([a-z], root)
root := char*
<char> := [a-zA-Z0-9-]
'''
gives:
(1, [('root', 1, 4, [])], 0)
That is, it matches while consuming zero characters. The look ahead has
ignored the results of its first sub-element ([a-z]) and added the
results of its second sub-element ("root") to the result list.
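To see where the 'oot' comes from, here is a small sketch that only
unpacks the result tuple quoted above with plain Python slicing. It does
not call SimpleParse or mx.TextTools, and the variable names are just
illustrative:

```python
# Test data and result tuple copied from the post; nothing is parsed here.
test_data = 'root - Dir\n'
result = (1, [('root', 1, 4, [])], 0)

# The tuple is (success flag, list of tags, next position).
success, tags, next_pos = result

for name, beg, end, parts in tags:
    # Each tag carries the slice boundaries of what the production matched.
    print("%s matched %r" % (name, test_data[beg:end]))

print("next position: %d" % next_pos)
```

The slice 1:4 starts after the character that [a-z] consumed inside the
group, which is why the script prints 'oot' even though the next
position is still 0.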
declaration = r'''line := ?[a-z], root
root := char*
<char> := [a-zA-Z0-9-]
'''
gives:
(1, [('root', 0, 4, [])], 4)
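This version simply says that the first character of the production
"root" must be a lowercase letter. Only the [a-z] test is a look ahead
here, so it consumes nothing and "root" starts matching at position 0.
It will match the first four characters of the test data, declaring
that to be a regular match of "root".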
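If it helps to compare the two declarations side by side, here is a
rough analogy using the look ahead syntax of Python's standard re
module. This is only an analogy under the assumption that (?=...)
behaves like the '?' prefix for this input; it is not SimpleParse, and
the group name "root" and the labels are chosen purely for illustration:

```python
import re

test_data = 'root - Dir\n'

# Analogue of  line := ?([a-z], root)  : the whole group is a look ahead,
# so "root" is matched inside it, after [a-z] has taken the first character.
grouped = re.match(r'(?=[a-z](?P<root>[a-zA-Z0-9-]*))', test_data)

# Analogue of  line := ?[a-z], root  : only the first-character test is a
# look ahead, so "root" starts matching at position 0.
prefixed = re.match(r'(?=[a-z])(?P<root>[a-zA-Z0-9-]*)', test_data)

for label, match in (('grouped', grouped), ('prefixed', prefixed)):
    print("%s: root=%r span=%r next=%d" % (
        label, match.group('root'), match.span('root'), match.end()))
```

The first pattern reports 'oot' at (1, 4) with a next position of 0, and
the second reports 'root' at (0, 4) with a next position of 4, which
lines up with the two result tuples above.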
Hope that helps,
Mike
yvan wrote:
>Hi,
>
>I must be missing something here. I was expecting the following script
>to get me 'root', but it outputs 'oot'.
>Reading the docs, my understanding was that if '?' is a prefix, then
>the parser object returns to the previous location. Is that right? If
>not, how do I get that behaviour?
>
>
>
>declaration = r'''line := ?([a-z], root)
>root:= char
>'''
>
>testdata = '''root - Dir
>'''
>from simpleparse import generator
>from mx.TextTools import TextTools
>
>parser = generator.buildParser(declaration).parserbyname('line')
>taglist = TextTools.tag(testdata, parser)
>for tag, beg, end, parts in taglist[1]:
>    print testdata[beg:end]
>
>Any help appreciated,
>
>-Yvan
>
>
--
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/