SimpleParse and LookAhead

Mike C. Fletcher mcfletch at rogers.com
Tue Apr 1 00:49:46 EST 2003


Hi Yvan,

Here's the code I assume you intended to send as the problem-description:
8<______________
declaration = r'''line := ?([a-z], root)
root := char*
<char> := [a-zA-Z0-9-]
'''

testData = '''root - Dir
'''

from simpleparse import generator

from mx.TextTools import TextTools

parser = generator.buildParser(declaration).parserbyname('line')

taglist = TextTools.tag(testData, parser)
print taglist

for tag, beg, end, parts in taglist[1]:
   print testData[beg:end]
8<______________

Now, the problem you are running into is that you are declaring the
entire group "([a-z], root)" to be a look ahead production.  A look
ahead production returns exactly the results that the regular group
would return; the only difference is that it consumes no characters
(i.e. the next position reported is still position 0).  Inside the
group, the [a-z] consumes the first character, so the production "root"
reports a match running from just after that first character to the end
of the match, which is why you see 'oot' rather than 'root'.

Here are the alternatives (you haven't really explained what you're
trying to do with the look ahead, so I can't come up with anything more
helpful):

   declaration = r'''line := ?([a-z], root)
   root := char*
   <char> := [a-zA-Z0-9-]
   '''
gives:
   (1, [('root', 1, 4, [])], 0)

that is, it matches, consuming zero characters.  The look ahead reports
no result for its first sub-element and adds the results of its second
sub-element ("root") to the result list.
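
To make that result tuple concrete, here is a small sketch that unpacks
it and slices the test data the same way the script does.  It uses
current Python 3 syntax rather than the Python 2 of the script, the
tuple is copied by hand from the output quoted above (nothing is parsed
here), and the variable names are only illustrative.

```python
test_data = "root - Dir\n"

# Result quoted above for:  line := ?([a-z], root)
# Layout: (success flag, list of result tuples, next position)
success, results, next_pos = (1, [("root", 1, 4, [])], 0)

# Each result tuple is (tag, begin, end, children); slice the input with it.
matched = [test_data[beg:end] for tag, beg, end, parts in results]

print(matched)    # ['oot']  -- "root" was matched starting at position 1
print(next_pos)   # 0        -- the look ahead consumed nothing
```

The slice starts at 1 because the [a-z] inside the look ahead group had
already stepped past the "r" before "root" was tried.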

   declaration = r'''line := ?[a-z], root
   root := char*
   <char> := [a-zA-Z0-9-]
   '''
gives:
   (1, [('root', 0, 4, [])], 4)

This version simply says that the first character of the production
"root" must be a lowercase letter. It will match the first four
characters of the test data, declaring that to be a regular match of "root".
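
If it helps to see why the two declarations differ, here is a toy model
of the matching in plain Python 3.  This is only a sketch of the
behaviour described above, not how SimpleParse or mx.TextTools actually
implement it; every function name in it (match_root, match_lower,
sequence, lookahead) is made up for the illustration.

```python
import re

CHAR_RUN = re.compile(r"[a-zA-Z0-9-]*")   # stands in for: root := char*
LOWER = re.compile(r"[a-z]")              # stands in for: [a-z]

def match_root(text, pos):
    """Match char* at pos and report one ('root', begin, end, []) result."""
    end = CHAR_RUN.match(text, pos).end()
    return True, [("root", pos, end, [])], end

def match_lower(text, pos):
    """Consume one lowercase letter; a bare range reports no result here."""
    m = LOWER.match(text, pos)
    if m is None:
        return False, [], pos
    return True, [], m.end()

def sequence(*matchers):
    """Run matchers one after another, each starting where the last ended."""
    def run(text, pos):
        results, current = [], pos
        for matcher in matchers:
            ok, sub_results, current = matcher(text, current)
            if not ok:
                return False, [], pos
            results.extend(sub_results)
        return True, results, current
    return run

def lookahead(matcher):
    """Keep the wrapped matcher's results, but reset the position."""
    def run(text, pos):
        ok, results, _ = matcher(text, pos)
        return ok, results, pos
    return run

text = "root - Dir\n"

# line := ?([a-z], root)   -- the whole group is a look ahead
result_a = lookahead(sequence(match_lower, match_root))(text, 0)
# line := ?[a-z], root     -- only the [a-z] is a look ahead
result_b = sequence(lookahead(match_lower), match_root)(text, 0)

print(result_a)   # (True, [('root', 1, 4, [])], 0)
print(result_b)   # (True, [('root', 0, 4, [])], 4)
```

In the first form "root" starts after the letter that [a-z] consumed and
the position is only reset once the whole group is done; in the second
form the position is reset before "root" is tried, so "root" starts at 0.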

Hope that helps,
Mike

yvan wrote:

>Hi, 
>
>I must be missing something here. I was expecting the following script
>to get me 'root', but it outputs 'oot'.
>Reading the docs, my understanding was that if '?' is a prefix, then
>the parser object returns to the previous location. Is that right? If
>not, how do I get that behaviour?
>
>
>
>declaration = r'''line := ?([a-z], root)
>root:= char
>'''
>
>testdata = '''root - Dir
>'''
>from simpleparse import generator
>
>from mx.TextTools import TextTools
>
>parser = generator.buildParser(declaration).parserbyname('line')
>
>taglist = TextTools.tag(testdata, parser)
>
>for tag, beg, end, parts in taglist[1]:
>
>    print testdata[beg:end]
>
>Any help appreciated,
>
>-Yvan
>  
>

-- 
_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/







