[Tutor] extracting phrases and their memberships from syntax
Emad Nawfal (عماد نوفل)
emadnawfal at gmail.com
Sat Feb 14 15:59:08 CET 2009
On Fri, Feb 13, 2009 at 10:20 AM, Paul McGuire <ptmcg at austin.rr.com> wrote:
> Pyparsing has a built-in helper called nestedExpr that fits neatly in with
> this data. Here is the whole script:
> from pyparsing import nestedExpr
> syntax_tree = nestedExpr()
> results = syntax_tree.parseString(st_data)
> from pprint import pprint
> pprint(results.asList())
> Prints:
> [[['S',
>    ['NP-SBJ-1',
>     ['NP', ['NNP', 'Rudolph'], ['NNP', 'Agnew']],
>     [',', ','],
>     ['UCP',
>      ['ADJP', ['NP', ['CD', '55'], ['NNS', 'years']], ['JJ', 'old']],
>      ['CC', 'and'],
>      ['NP',
>       ['NP', ['JJ', 'former'], ['NN', 'chairman']],
>       ['PP',
>        ['IN', 'of'],
>        ['NP',
>         ['NNP', 'Consolidated'],
>         ['NNP', 'Gold'],
>         ['NNP', 'Fields'],
>         ['NNP', 'PLC']]]]],
>     [',', ',']],
>    ['VP',
>     ['VBD', 'was'],
>     ['VP',
>      ['VBN', 'named'],
>      ['S',
>       ['NP-SBJ', ['-NONE-', '*-1']],
>       ['NP-PRD',
>        ['NP', ['DT', 'a'], ['JJ', 'nonexecutive'], ['NN', 'director']],
>        ['PP',
>         ['IN', 'of'],
>         ['NP',
>          ['DT', 'this'],
>          ['JJ', 'British'],
>          ['JJ', 'industrial'],
>          ['NN', 'conglomerate']]]]]]],
>    ['.', '.']]]]
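Once the tree is a plain nested list like the one printed above, each phrase and the words that belong to it can be collected with a short recursive walk. This is only a minimal sketch: it assumes every node has the shape [label, child, ...], where a child is either a word (a string) or another node (a list). The function names leaves and phrases are hypothetical, and the sample is the "former chairman" NP subtree copied from the output. For the full result, the two unlabeled outer lists would presumably have to be stripped first, e.g. results.asList()[0][0].

```python
def leaves(node):
    """Return the words under a [label, child, ...] node, in order."""
    _label, *children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)          # a plain word
        else:
            words.extend(leaves(child))  # a nested node
    return words


def phrases(node):
    """Yield (label, words) for every node that contains other nodes."""
    label, *children = node
    if any(isinstance(child, list) for child in children):
        yield label, leaves(node)
        for child in children:
            if isinstance(child, list):
                yield from phrases(child)


# Subtree taken from the pprint output above.
tree = ['NP',
        ['NP', ['JJ', 'former'], ['NN', 'chairman']],
        ['PP',
         ['IN', 'of'],
         ['NP',
          ['NNP', 'Consolidated'],
          ['NNP', 'Gold'],
          ['NNP', 'Fields'],
          ['NNP', 'PLC']]]]

for label, words in phrases(tree):
    print(label, "->", " ".join(words))
# NP -> former chairman of Consolidated Gold Fields PLC
# NP -> former chairman
# PP -> of Consolidated Gold Fields PLC
# NP -> Consolidated Gold Fields PLC
```

Word-level nodes such as ['JJ', 'former'] are skipped on purpose, so only multi-word constituents are reported as phrases.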
> If you want to delve deeper into this, you can, since the content of the
> () groups is so regular. In essence you reconstruct nestedExpr in your own
> code, but you gain more control over, and visibility into, the parsed
> content.
> Since this is a recursive syntax, you will need to use pyparsing's mechanism
> for recursion, which is the Forward class. Forward is a sort of "I can't
> define the whole thing yet, so just create a placeholder" object.
> syntax_element = Forward()
> LPAR,RPAR = map(Suppress,"()")
> syntax_tree = LPAR + syntax_element + RPAR
> Now in your example, a syntax_element can be one of 4 things:
> - a punctuation mark, twice
> - a syntax marker followed by one or more syntax_trees
> - a syntax marker followed by a word
> - a syntax tree
> Here is how I define those:
> marker = oneOf("VBD ADJP VBN JJ DT PP NN UCP NP-PRD "
> "NP NNS NNP CC NP-SBJ-1 CD VP -NONE- "
> "IN NP-SBJ S")
> punc = oneOf(", . ! ?")
> wordchars = printables.replace("(","").replace(")","")
> syntax_element << (
> punc + punc |
> marker + OneOrMore(Group(syntax_tree)) |
> marker + Word(wordchars) |
> syntax_tree )
> Note that we use the '<<' operator to "inject" the definition into the
> existing syntax_element placeholder. We can't use '=', because that would
> bind the name to a new expression and leave the Forward that syntax_tree
> already refers to undefined.
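The Forward pattern can also be tried on a smaller grammar. The following is only a sketch, assuming pyparsing is installed; the names token and node and the sample string are made up for illustration and are not part of Paul's script. Any run of printable characters other than parentheses is accepted as a label, which avoids listing the markers one by one.

```python
from pyparsing import Forward, Group, OneOrMore, Suppress, Word, printables

LPAR, RPAR = map(Suppress, "()")
# Hypothetical helper: a label or a word, i.e. anything without parentheses.
token = Word(printables.replace("(", "").replace(")", ""))

# Placeholder first, because a node contains other nodes.
node = Forward()
# "(" label, then either one or more child nodes or a single word, then ")".
node << Group(LPAR + token + (OneOrMore(node) | token) + RPAR)

sample = "(NP (DT this) (JJ British) (NN conglomerate))"
result = node.parseString(sample).asList()
print(result)
# [['NP', ['DT', 'this'], ['JJ', 'British'], ['NN', 'conglomerate']]]
```

The right-hand side is a single Group(...) call here, so the fact that '<<' binds more tightly than '|' in Python does not matter; with several top-level alternatives, the enclosing parentheses are needed.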
> Now parse the string, and voila! Same as before.
> Here is the entire script:
> from pyparsing import (nestedExpr, Suppress, oneOf, Forward, OneOrMore,
>     Word, printables, Group)
> syntax_element = Forward()
> LPAR,RPAR = map(Suppress,"()")
> syntax_tree = LPAR + syntax_element + RPAR
> marker = oneOf("VBD ADJP VBN JJ DT PP NN UCP NP-PRD "
> "NP NNS NNP CC NP-SBJ-1 CD VP -NONE- "
> "IN NP-SBJ S")
> punc = oneOf(", . ! ?")
> wordchars = printables.replace("(","").replace(")","")
> syntax_element << (
> punc + punc |
> marker + OneOrMore(Group(syntax_tree)) |
> marker + Word(wordchars) |
> syntax_tree )
> results = syntax_tree.parseString(st_data)
> from pprint import pprint
> pprint(results.asList())
> -- Paul
Thank you so much Paul, Kent, and Hoftkamp.
I was asking what the right tools were, and I got two fully-functional
scripts back. Much more than I had expected.
I'm planning to use these scripts instead of the Perl one. I've also started
with PyParsing as it seems to be a little easier to understand than PLY.
Thank you again,
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
"No victim has ever been more repressed and alienated than the truth"
Emad Soliman Nawfal
Indiana University, Bloomington