downloading a link with javascript in it..

7stud bbxx789_05ss at yahoo.com
Mon May 12 19:07:10 EDT 2008


On May 12, 4:59 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
> On May 12, 1:54 pm, Jetus <stevegi... at gmail.com> wrote:
>
> > I am able to download this page (enclosed code), but I then want to
> > download a pdf file that I can view in a regular browser by clicking
> > on the "view" link. I don't know how to automate this next part of my
> > script. It seems like it uses Javascript.
> > The line in the page source says
>
> > href="javascript:openimagewin('JCCOGetImage.jsp?
> > refnum=DN2007036179');" tabindex=-1>
>
> 1) Use BeautifulSoup to extract the path:
>
> JCCOGetImage.jsp?refnum=DN2007036179
>
> from the html page.
>

BeautifulSoup will allow you to locate and extract the href attribute:

javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');

See "The attributes of Tags" in the BeautifulSoup docs.
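
As a rough sketch of that lookup, assuming the current bs4 package (the import name differs in older BeautifulSoup releases) and a cut-down sample of the page source, something like this should pull out the href. The names SAMPLE_HTML and find_js_hrefs are made up for illustration, and the "view" link text is assumed from the description above:

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the downloaded page; the real page will have
# much more markup around the link.
SAMPLE_HTML = """
<html><body>
<a href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>view</a>
</body></html>
"""

def find_js_hrefs(page_source):
    """Return the href of every <a> tag whose href starts with 'javascript:'."""
    soup = BeautifulSoup(page_source, "html.parser")
    hrefs = []
    for tag in soup.find_all("a", href=True):
        href = tag["href"]  # tag attributes are read like dictionary keys
        if href.startswith("javascript:"):
            hrefs.append(href)
    return hrefs

for href in find_js_hrefs(SAMPLE_HTML):
    print(href)
```

If the page has several javascript: links, you would still need some way to pick the right one, for example by checking the link text.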

Then you can use string functions (preferable) or a regex to get
everything between the parentheses (and remove the quotes around the
path, too).
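
A minimal sketch of both approaches, using only the standard library. The function names extract_path and extract_path_re are hypothetical, and the code assumes the href has exactly the shape shown above (one quoted argument inside the parentheses):

```python
import re

href = "javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');"

def extract_path(href):
    """String-method version: slice between the first '(' and the last ')'."""
    start = href.index("(") + 1   # raises ValueError if there is no '('
    end = href.rindex(")")        # raises ValueError if there is no ')'
    # Strip surrounding whitespace first, then the single or double quotes.
    return href[start:end].strip().strip("'\"")

def extract_path_re(href):
    """Regex version: capture the quoted argument inside the parentheses."""
    match = re.search(r"\(\s*['\"](.*?)['\"]\s*\)", href)
    return match.group(1) if match else None

print(extract_path(href))
print(extract_path_re(href))
```

The string version fails loudly with a ValueError when the href has no parentheses, while the regex version returns None, so pick whichever failure mode suits your script.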


