downloading a link with javascript in it..

7stud bbxx789_05ss at yahoo.com
Mon May 12 18:59:01 EDT 2008


On May 12, 1:54 pm, Jetus <stevegi... at gmail.com> wrote:
> I am able to download this page (enclosed code), but I then want to
> download a pdf file that I can view in a regular browser by clicking
> on the "view" link. I don't know how to automate this next part of my
> script. It seems like it uses Javascript.
> The line in the page source says
>
> href="javascript:openimagewin('JCCOGetImage.jsp?
> refnum=DN2007036179');" tabindex=-1>
>

1) Use BeautifulSoup to extract the path:

JCCOGetImage.jsp?refnum=DN2007036179

from the html page.
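As a rough sketch of step 1, here is one way to pull that path out of the page. It is not BeautifulSoup: it swaps in the standard library's html.parser plus a regular expression so that it runs without third-party packages, and it is written for Python 3. The sample HTML, the link text "view", and the names HrefCollector and extract_image_paths are assumptions made for illustration; only the href value comes from the quoted page source.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Matches the quoted argument of openimagewin('...') inside a javascript: href.
JS_CALL = re.compile(r"openimagewin\('([^']+)'\)")


def extract_image_paths(html):
    """Return the relative paths passed to openimagewin() in the page's links."""
    collector = HrefCollector()
    collector.feed(html)
    paths = []
    for href in collector.hrefs:
        match = JS_CALL.search(href)
        if match:
            paths.append(match.group(1))
    return paths


# Assumed sample: the href from the question, joined back onto one line.
sample_html = (
    '<a href="javascript:openimagewin('
    "'JCCOGetImage.jsp?refnum=DN2007036179');\" tabindex=-1>view</a>"
)

paths = extract_image_paths(sample_html)
print(paths)
```

With BeautifulSoup you would find the same <a> tags and apply the same regular expression to each href.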


2) The path is relative to the current url, so if the current url is:

http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp

Then the url to the page you want is:

http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

You can use urlparse.urljoin() to join a relative path to the current
url:


import urlparse

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'

target_url = urlparse.urljoin(base_url, relative_url)
print target_url

--output:--
http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179
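For anyone running Python 3, where the urlparse module became urllib.parse, a minimal sketch of the same join follows. It also shows how the result changes when the relative part starts with "/" or "../". Those two extra relative paths are made-up inputs for illustration, not paths known to exist on that site.

```python
from urllib.parse import urljoin

base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'

# Plain relative path: replaces the last segment of the base path.
same_dir = urljoin(base_url, 'JCCOGetImage.jsp?refnum=DN2007036179')

# Leading slash (hypothetical input): resolved against the site root.
from_root = urljoin(base_url, '/JCCOGetImage.jsp')

# Leading "../" (hypothetical input): climbs one directory above /records/.
one_up = urljoin(base_url, '../JCCOGetImage.jsp')

print(same_dir)
print(from_root)
print(one_up)
```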



3) Python has a webbrowser module that allows you to open urls in a
browser:

import webbrowser

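# Include the scheme; without it some platforms treat the
# string as a local file name rather than a web address.
webbrowser.open("http://www.google.com")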
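Steps 2 and 3 can be combined along these lines. This is only a sketch for Python 3; the function name open_relative and its opener parameter are hypothetical, added so the function can be tried out without actually launching a browser.

```python
import webbrowser
from urllib.parse import urljoin


def open_relative(base_url, relative_url, opener=webbrowser.open):
    """Resolve relative_url against base_url, open it, and return the full url.

    opener defaults to webbrowser.open; pass any callable that accepts
    a url to substitute something else (hypothetical convenience).
    """
    target_url = urljoin(base_url, relative_url)
    opener(target_url)
    return target_url
```

Calling open_relative(base_url, relative_url) with the two values from step 2 would hand the joined url to the default browser.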


You could also use os.system() or, on Windows only, os.startfile() to
launch the browser itself:

import os

os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')

#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')


All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.





