[Tutor] How to?

Moshe Zadka <moshez@math.huji.ac.il>
Wed, 17 May 2000 23:32:34 +0300 (IDT)


On Wed, 17 May 2000, Robin B. Lake wrote:

> I've RTFM, but can't quite see how to do this in Python.  Could do it
> in UNIX sed, but can't get the Python twist to it yet.
> 
> I have a file of HTML, with lines like:
> <TR ALIGN=left VALIGN=top><TD><A HREF="RMP_DTL$FAC.QueryView?P_FACILITY_ID=14206">14206</A><TD>1000 0015 0034<TD>ALASKA NITROGEN PRODUCS LLC<TD>KENAI<TD>AK<TD>25-JUN-1999
> 
> I want to extract the value following the FACILITY_ID= section of the string,
> that is 14206 for this line.  I'll then use that to create a
> urllib.openurl command with params = urllib.urlencode({'P_FACILITY_ID':that-value})
> and get the next lower level of information.
> 
> Thanks for any help or suggestions.

Have you read the documentation for the htmllib, urllib and urlparse
modules?

The steps involved are:

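1. Use htmllib to get the anchor list (the HREF of every <A> tag)
2. Use urlparse to break each URL into parts, then cgi.parse_qs to pull
   P_FACILITY_ID out of the query string
3. Use urllib (urlencode, then urlopen) to get the page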
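A minimal sketch of those steps in modern Python 3, where htmllib has been removed and the urlparse functions live in urllib.parse. The AnchorCollector class is a hypothetical stand-in for htmllib's anchor list, built on html.parser.HTMLParser; facility_ids and detail_url are likewise illustrative names, and the base URL in the usage below is a placeholder, not the real server.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs, urlencode


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag,
    roughly what htmllib's anchorlist used to provide."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


def facility_ids(html_text):
    """Step 1 and 2: gather anchors, split each URL, read P_FACILITY_ID."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    ids = []
    for href in parser.anchorlist:
        query = urlparse(href).query          # e.g. "P_FACILITY_ID=14206"
        values = parse_qs(query).get("P_FACILITY_ID")
        if values:                            # skip links without the parameter
            ids.append(values[0])
    return ids


def detail_url(base_url, facility_id):
    """Step 3 (URL only): build the query with urlencode.
    base_url is an assumed placeholder for the real query page."""
    return base_url + "?" + urlencode({"P_FACILITY_ID": facility_id})
```

Fed the sample line from the question, facility_ids returns ['14206'], and detail_url("http://example.com/RMP_DTL$FAC.QueryView", "14206") gives "http://example.com/RMP_DTL$FAC.QueryView?P_FACILITY_ID=14206". To actually fetch the page you would pass that URL to urllib.request.urlopen(url).read(), left out here so the sketch runs without network access.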