Difficulties POSTing to RDP Hierarchy Browse Page

Alan Kennedy alanmk at hotmail.com
Fri Dec 3 18:48:32 CET 2004

[Chris Lasher]
 >   I'm trying to write a tool to scrape through some of the Ribosomal
 > Database Project II's (http://rdp.cme.msu.edu/) pages, specifically,
 > through the Hierarchy Browser. (http://rdp.cme.msu.edu/hierarchy/)

I'm sure that urllib is the right tool to use. However, there may be one 
or two problems with the way you're using it.

 > --------excerpted HTML----------------

<!-- snip -->

 > <form name="hierarchyForm" method="POST"
 > action="HierarchyControllerServlet/start/">
 > <input type='hidden' name='printParams' value='no' />

This field is missing from the params you are passing to the 
HierarchyControllerServlet. Although the "printParams" field is not 
visible to you in a browser, the browser still submits its name/value 
pair with the form. So you should also include it in your code, as 
shown below.

 > <input id="bergeys" name="taxonomy" type="radio" value="rdpHome" checked>

Also, you are using the wrong value for the taxonomy field. You are 
setting a value of "bergeys", which is the ID of the field, not its 
value. The correct value is "rdpHome".
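
One way to check which name/value pairs a browser would submit is to 
parse the form and list them. The following is only a minimal sketch in 
Python 3, using the standard library's html.parser. The names FORM_HTML 
and FormFieldCollector are made up for illustration, the markup is just 
the two inputs excerpted above, and the sketch handles only hidden 
inputs and checked radio buttons (no selects, checkboxes or text 
fields).

```python
from html.parser import HTMLParser

# Sample markup copied from the excerpt above (hypothetical variable name).
FORM_HTML = """
<form name="hierarchyForm" method="POST"
action="HierarchyControllerServlet/start/">
<input type='hidden' name='printParams' value='no' />
<input id="bergeys" name="taxonomy" type="radio" value="rdpHome" checked>
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect (name, value) pairs for hidden inputs and checked radios."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        kind = (attrs.get("type") or "text").lower()
        name = attrs.get("name")
        if not name:
            return
        # A bare attribute such as "checked" shows up as a key mapped to None.
        if kind == "hidden" or (kind == "radio" and "checked" in attrs):
            # The submitted value comes from "value", never from "id".
            self.fields.append((name, attrs.get("value") or ""))


collector = FormFieldCollector()
collector.feed(FORM_HTML)
print(collector.fields)
```

Run against the excerpt, this lists "printParams" with the value "no" 
and "taxonomy" with the value "rdpHome", which are the two pairs 
discussed above.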

 > --------Python test code---------------
 > #!/usr/bin/python
 > import urllib
 > options = [("strain", "type"), ("source", "both"),
 >            ("size", "gt1200"), ("taxonomy", "bergeys"),
 >            ("browse", "Browse")]

Try this:

options = [("printParams", "no"), ("strain", "type"),
           ("source", "both"), ("size", "gt1200"),
           ("taxonomy", "rdpHome"), ("browse", "Browse")]

 > params = urllib.urlencode(options)
 > rdpbrowsepage = urllib.urlopen(
 >     "http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start",
 >     params)
 > pagehtml = rdpbrowsepage.read()
 > print pagehtml
 > ---------end Python test code----------
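
For readers on Python 3, where urlencode and urlopen live in 
urllib.parse and urllib.request, the corrected script might look 
roughly like the sketch below. The names RDP_URL, OPTIONS, build_request 
and fetch are made up for illustration. The URL is the one from the test 
code above, and whether the servlet still responds there is not verified 
here, so the sketch only builds the request and prints what would be 
sent.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL taken from the test code above; not verified to still respond.
RDP_URL = "http://rdp.cme.msu.edu/hierarchy/HierarchyControllerServlet/start"

OPTIONS = [
    ("printParams", "no"), ("strain", "type"),
    ("source", "both"), ("size", "gt1200"),
    ("taxonomy", "rdpHome"), ("browse", "Browse"),
]


def build_request(url, options):
    """Return a POST request with the options form-encoded in the body."""
    body = urlencode(options).encode("ascii")  # data must be bytes in Python 3
    return Request(url, data=body)  # supplying data makes urllib send a POST


def fetch(url, options):
    """Send the request and return the response body as bytes (needs network)."""
    with urlopen(build_request(url, options)) as response:
        return response.read()


request = build_request(RDP_URL, OPTIONS)
print(request.get_method())
print(request.data.decode("ascii"))
```

Calling fetch(RDP_URL, OPTIONS) would perform the actual POST. Printing 
the encoded body first makes it easy to compare against what a browser 
sends for the same form.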


alan kennedy
email alan:              http://xhaus.com/contact/alan
