[Tutor] extracting phrases and their memberships from syntax trees

Kent Johnson kent37 at tds.net
Fri Feb 13 14:18:31 CET 2009


On Thu, Feb 12, 2009 at 6:20 PM, Emad Nawfal (عماد نوفل)
<emadnawfal at gmail.com> wrote:
> Dear Tutors,
> I have syntax trees like the one below. I need to extract the membership
> information like which adjective belongs to which noun phrase, and so on. In
> short, I want to do something like this:
> http://ilk.uvt.nl/team/sabine/chunklink/README.html
> I already have the Perl script that does that, so I do not need a script. I
> just want to be able to do this myself. My question is: what tools do I need
> for this? Could you please give me pointers to where to start? I'll then try
> to do it myself, and ask questions when I get stuck.

I guess I'm in the mood for writing parsers this week :-)

Attached is a parser that uses PLY to parse the structure you
provided. The result is a nested list that mirrors the bracketing of
the original: it is the list you would get by replacing every ( ) pair
with [ ] and quoting the strings.
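
To make the target structure concrete, here is a rough sketch of the
same idea. It is not the attached PLY parser: it is a small
hand-written recursive-descent stand-in that uses only the standard
library, and the sample tree and the names tokenize and parse are
made up for illustration.

```python
import re

# Hypothetical sample in Penn Treebank bracket style (not the tree from the post).
SAMPLE = "(S (NP (DT the) (JJ big) (NN dog)) (VP (VBD barked)))"


def tokenize(text):
    """Split text into "(", ")" and runs of other non-space characters."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(text):
    """Return nested lists that mirror the parentheses in text."""
    tokens = tokenize(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok  # a bare string: a label or a word
        node = []
        # Collect children until the matching close paren.
        while pos < len(tokens) and tokens[pos] != ")":
            node.append(parse_node())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ")"
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


print(parse(SAMPLE))
```

Each "(" opens a new Python list and each ")" closes it, so the list
nesting follows the bracket nesting one for one.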

Also in the attachment is a function that walks the resulting tree to
print a table similar to the one in the chunklink reference.
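
For anyone curious about the walking step, here is a minimal sketch
of a recursive walk over such a nested list. It is not the function
in the attachment and it does not reproduce the chunklink columns; it
only prints each word, its tag, and the labels of the phrases that
contain it. The tree literal and the names walk and format_table are
assumptions for illustration, and it assumes every leaf has the form
[tag, word].

```python
# Hypothetical tree in the nested-list form described above.
TREE = ["S",
        ["NP", ["DT", "the"], ["JJ", "big"], ["NN", "dog"]],
        ["VP", ["VBD", "barked"]]]


def walk(node, path=()):
    """Yield (word, tag, path) for each leaf.

    path is a tuple of the enclosing phrase labels, outermost first.
    """
    label, children = node[0], node[1:]
    # A leaf looks like [tag, word]: one child, and it is a string.
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0], label, path
        return
    for child in children:
        if isinstance(child, list):
            # Recurse with this node's label added to the path.
            yield from walk(child, path + (label,))


def format_table(tree):
    """Return one formatted row per word."""
    return ["%-10s %-5s %s" % (word, tag, "/".join(path))
            for word, tag, path in walk(tree)]


for row in format_table(TREE):
    print(row)
```

The path argument is what carries the membership information: by the
time the walk reaches an adjective, the path already names the noun
phrase it sits in.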

This is a pretty simple example of both PLY usage and recursive
walking of a tree, if anyone wants to learn about either one. I hope I
haven't taken all your fun :-)

Kent

PS to the list: I know, I'm setting a bad example. Usually we like to
teach people to write Python, not write their programs for them.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ParsePennTree.py
Type: text/x-python
Size: 2489 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/tutor/attachments/20090213/1291a6e1/attachment.py>

