[Tutor] extracting phrases and their memberships from syntax
Emad Nawfal (عماد نوفل)
emadnawfal at gmail.com
Sat Feb 14 15:59:08 CET 2009
On Fri, Feb 13, 2009 at 10:20 AM, Paul McGuire <ptmcg at austin.rr.com> wrote:
> Pyparsing has a built-in helper called nestedExpr that fits neatly in with
> this data. Here is the whole script:
>
> from pyparsing import nestedExpr
>
> syntax_tree = nestedExpr()
> results = syntax_tree.parseString(st_data)
>
> from pprint import pprint
> pprint(results.asList())
>
>
> Prints:
>
> [[['S',
>    ['NP-SBJ-1',
>     ['NP', ['NNP', 'Rudolph'], ['NNP', 'Agnew']],
>     [',', ','],
>     ['UCP',
>      ['ADJP', ['NP', ['CD', '55'], ['NNS', 'years']], ['JJ', 'old']],
>      ['CC', 'and'],
>      ['NP',
>       ['NP', ['JJ', 'former'], ['NN', 'chairman']],
>       ['PP',
>        ['IN', 'of'],
>        ['NP',
>         ['NNP', 'Consolidated'],
>         ['NNP', 'Gold'],
>         ['NNP', 'Fields'],
>         ['NNP', 'PLC']]]]],
>     [',', ',']],
>    ['VP',
>     ['VBD', 'was'],
>     ['VP',
>      ['VBN', 'named'],
>      ['S',
>       ['NP-SBJ', ['-NONE-', '*-1']],
>       ['NP-PRD',
>        ['NP', ['DT', 'a'], ['JJ', 'nonexecutive'], ['NN', 'director']],
>        ['PP',
>         ['IN', 'of'],
>         ['NP',
>          ['DT', 'this'],
>          ['JJ', 'British'],
>          ['JJ', 'industrial'],
>          ['NN', 'conglomerate']]]]]]],
>    ['.', '.']]]]
>
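For the original question (which phrases exist and what they belong to), a nested list of this kind can be walked with plain Python. This is only a minimal sketch. It assumes every node has the shape [label, child, ...] and every leaf is [tag, word]. The names tree, words and phrases are made up for illustration, the sample is a hand-copied fragment, and the unlabeled outer wrappers in the real output would have to be unwrapped first.

```python
# Hand-copied fragment in the same shape as the parsed result:
# each node is [label, child, ...] and each leaf is [tag, word].
tree = ['NP',
        ['NP', ['JJ', 'former'], ['NN', 'chairman']],
        ['PP',
         ['IN', 'of'],
         ['NP', ['NNP', 'Consolidated'], ['NNP', 'Gold']]]]


def is_leaf(node):
    """A leaf is [tag, word]: exactly one child, and that child is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Return the words covered by a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    found = []
    for child in node[1:]:
        found.extend(words(child))
    return found


def phrases(node, parents=()):
    """Yield (label, phrase text, enclosing labels) for every non-leaf node."""
    if is_leaf(node):
        return                      # single words are not reported as phrases
    label = node[0]
    yield label, ' '.join(words(node)), parents
    for child in node[1:]:
        yield from phrases(child, parents + (label,))


for label, text, parents in phrases(tree):
    print(label, '|', text, '|', '/'.join(parents))
```

Each phrase is reported with the chain of labels that contain it, so "Consolidated Gold" comes out as an NP inside NP/PP.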
> If you want to delve deeper into this, you can, since the content of the
> () groups is so regular. In essence you reconstruct nestedExpr in your own
> code, but you gain more control over, and visibility into, the parsed
> content.
>
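To see why such regular data is easy to handle, here is roughly what turning () groups into nested lists involves, sketched in plain Python with a regex tokenizer and an explicit stack. This is not pyparsing's implementation, just a stand-in technique. The function name parse_nested and the sample string are assumptions made for the example.

```python
import re


def parse_nested(text):
    """Turn a parenthesised string into nested lists of tokens."""
    # Tokens are "(", ")" or any run of characters that is neither
    # whitespace nor a parenthesis.
    tokens = re.findall(r'\(|\)|[^\s()]+', text)
    stack = [[]]                    # stack[-1] is the list being filled
    for tok in tokens:
        if tok == '(':
            stack.append([])        # open a new group
        elif tok == ')':
            if len(stack) == 1:
                raise ValueError('unbalanced ")"')
            finished = stack.pop()  # close the group and attach it
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError('missing ")"')
    return stack[0]


sample = "(NP (DT a) (JJ nonexecutive) (NN director))"
print(parse_nested(sample))
# [['NP', ['DT', 'a'], ['JJ', 'nonexecutive'], ['NN', 'director']]]
```

Unlike a grammar, this sketch does not check that labels or words are valid. It only tracks the nesting.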
> Since this is a recursive syntax, you will need to use pyparsing's mechanism
> for recursion, which is the Forward class. Forward is a sort of "I can't
> define the whole thing yet, just create a placeholder" expression.
>
> syntax_element = Forward()
> LPAR,RPAR = map(Suppress,"()")
> syntax_tree = LPAR + syntax_element + RPAR
>
> Now in your example, a syntax_element can be one of 4 things:
> - a punctuation mark, twice
> - a syntax marker followed by one or more syntax_trees
> - a syntax marker followed by a word
> - a syntax tree
>
> Here is how I define those:
>
> marker = oneOf("VBD ADJP VBN JJ DT PP NN UCP NP-PRD "
>                "NP NNS NNP CC NP-SBJ-1 CD VP -NONE- "
>                "IN NP-SBJ S")
> punc = oneOf(", . ! ?")
>
> wordchars = printables.replace("(","").replace(")","")
>
> syntax_element << (
>     punc + punc |
>     marker + OneOrMore(Group(syntax_tree)) |
>     marker + Word(wordchars) |
>     syntax_tree )
>
> Note that we use the '<<' operator to "inject" the definition into
> syntax_element. We can't use '=', because that would rebind the name to a
> different expression than the one we used to define syntax_tree.
>
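The placeholder-then-inject pattern can be tried on a much smaller grammar. The following is only a sketch under simplifying assumptions: labels and words are plain alphabetic strings (so markers like NP-SBJ-1 or -NONE- would not match), and the names node, label and word are invented for the example.

```python
from pyparsing import Forward, Group, OneOrMore, Suppress, Word, alphas

LPAR, RPAR = map(Suppress, "()")

node = Forward()                    # placeholder, filled in below
label = Word(alphas)                # simplified: letters only
word = Word(alphas)

# A node is "(label word)" or "(label node node ...)". The definition
# refers to node itself, which is why the placeholder has to exist first.
node << Group(LPAR + label + (OneOrMore(node) | word) + RPAR)

result = node.parseString("(NP (DT a) (NN director))")
print(result.asList())
# [['NP', ['DT', 'a'], ['NN', 'director']]]
```

Writing node = Group(...) on that line instead would fail to recurse properly, because the node used inside the definition would still be the empty placeholder.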
> Now parse the string, and voila! Same as before.
>
> Here is the entire script:
>
> from pyparsing import (nestedExpr, Suppress, oneOf, Forward, OneOrMore,
>                        Word, printables, Group)
>
> syntax_element = Forward()
> LPAR,RPAR = map(Suppress,"()")
> syntax_tree = LPAR + syntax_element + RPAR
>
> marker = oneOf("VBD ADJP VBN JJ DT PP NN UCP NP-PRD "
>                "NP NNS NNP CC NP-SBJ-1 CD VP -NONE- "
>                "IN NP-SBJ S")
> punc = oneOf(", . ! ?")
>
> wordchars = printables.replace("(","").replace(")","")
>
> syntax_element << (
>     punc + punc |
>     marker + OneOrMore(Group(syntax_tree)) |
>     marker + Word(wordchars) |
>     syntax_tree )
>
> results = syntax_tree.parseString(st_data)
> from pprint import pprint
> pprint(results.asList())
>
> -- Paul
>
Thank you so much, Paul, Kent, and Hoftkamp.
I was asking what the right tools were, and I got two fully functional
scripts back, much more than I had expected.
I'm planning to use these scripts instead of the Perl one. I've also started
with pyparsing, as it seems a little easier to understand than PLY.
Thank you again,
--
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"
Emad Soliman Nawfal
Indiana University, Bloomington
--------------------------------------------------------