[Tutor] How to?
Moshe Zadka
Moshe Zadka <moshez@math.huji.ac.il>
Wed, 17 May 2000 23:32:34 +0300 (IDT)
On Wed, 17 May 2000, Robin B. Lake wrote:
> I've RTFM, but can't quite see how to do this in Python. Could do it
> in UNIX sed, but can't get the Python twist to it yet.
>
> I have a file of HTML, with lines like:
> <TR ALIGN=left VALIGN=top><TD><A HREF="RMP_DTL$FAC.QueryView?P_FACILITY_ID=14206">14206</A><TD>1000 0015 0034<TD>ALASKA NITROGEN PRODUCS LLC<TD>KENAI<TD>AK<TD>25-JUN-1999
>
> I want to extract the value following the FACILITY_ID= section of the string,
> that is 14206 for this line. I'll then use that to create a
> urllib.openurl command with params = urllib.urlencode({'P_FACILITY_ID':that-value})
> and get the next lower level of information.
>
> Thanks for any help or suggestions.
Have you read the documentation for the htmllib, urlparse and urllib
modules specifically?
The steps involved are:
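1. Use htmllib to parse the HTML and get the anchor list (the HREF of every <A> tag)
2. Use urlparse to break each anchor URL into parts; the query part holds P_FACILITY_ID=14206
3. Use urllib to get the page (urllib.urlencode for the parameters, urllib.urlopen to fetch)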
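
The three steps above can be sketched roughly as follows. This is only a minimal sketch, and it makes some assumptions: it targets Python 3, where htmllib no longer exists, so html.parser, urllib.parse and urllib.request stand in for htmllib, urlparse and urllib. The names AnchorCollector, facility_ids and fetch_detail are made up for illustration, and base_url is a placeholder because the post does not give the server address.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs, urlencode
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Step 1: collect the HREF of every <A> tag (hypothetical helper,
    playing the role of htmllib's anchor list)."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.anchorlist.append(value)


def facility_ids(html):
    """Step 2: split each anchor URL and pull P_FACILITY_ID from its query."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    ids = []
    for href in parser.anchorlist:
        query = parse_qs(urlsplit(href).query)
        ids.extend(query.get("P_FACILITY_ID", []))
    return ids


def fetch_detail(base_url, facility_id):
    """Step 3: request the detail page. base_url is an assumption; the
    original post does not say where the pages are served from."""
    params = urlencode({"P_FACILITY_ID": facility_id})
    with urlopen(base_url + "?" + params) as response:
        return response.read()


# The sample line from the question.
sample = (
    '<TR ALIGN=left VALIGN=top><TD>'
    '<A HREF="RMP_DTL$FAC.QueryView?P_FACILITY_ID=14206">14206</A>'
    '<TD>1000 0015 0034<TD>ALASKA NITROGEN PRODUCS LLC<TD>KENAI<TD>AK'
    '<TD>25-JUN-1999'
)
print(facility_ids(sample))
```

fetch_detail is defined but not called here, since calling it needs the real server address. Parsing the query string, rather than matching on the text after "FACILITY_ID=", should keep working if the links later gain extra parameters.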