downloading a link with javascript in it..
7stud
bbxx789_05ss at yahoo.com
Mon May 12 18:59:01 EDT 2008
On May 12, 1:54 pm, Jetus <stevegi... at gmail.com> wrote:
> I am able to download this page (enclosed code), but I then want to
> download a pdf file that I can view in a regular browser by clicking
> on the "view" link. I don't know how to automate this next part of my
> script. It seems like it uses Javascript.
> The line in the page source says
>
> href="javascript:openimagewin('JCCOGetImage.jsp?
> refnum=DN2007036179');" tabindex=-1>
>
1) Use BeautifulSoup to extract the path:
JCCOGetImage.jsp?refnum=DN2007036179
from the html page.
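BeautifulSoup is a third-party package. As a stand-in, here is a minimal Python 3 sketch that does the same extraction with only the standard library's html.parser and re. It assumes the link looks like the one quoted above, with the path as the single-quoted first argument of openimagewin(). ImageLinkParser and extract_image_paths are hypothetical names chosen for illustration.

```python
import re
from html.parser import HTMLParser

# Matches the single-quoted argument of openimagewin('...') in a javascript: href.
OPENIMAGEWIN_RE = re.compile(r"openimagewin\(\s*'([^']*)'\s*\)")


class ImageLinkParser(HTMLParser):
    """Collect the paths passed to openimagewin() in <a href="javascript:..."> tags."""

    def __init__(self):
        super().__init__()
        self.paths = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = (dict(attrs).get("href") or "").strip()
        if not href.lower().startswith("javascript:"):
            return
        match = OPENIMAGEWIN_RE.search(href)
        if match:
            # Remove any whitespace/newlines that crept into the path.
            self.paths.append("".join(match.group(1).split()))


def extract_image_paths(html):
    parser = ImageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.paths


sample = """<a href="javascript:openimagewin('JCCOGetImage.jsp?refnum=DN2007036179');" tabindex=-1>view</a>"""
print(extract_image_paths(sample))
# → ['JCCOGetImage.jsp?refnum=DN2007036179']
```

Ordinary links (href values that don't start with "javascript:") are skipped, so the function can be fed the whole results page.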
2) The path is relative to the current url, so if the current url is:
http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp
Then the url to the page you want is:
http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179
You can use urlparse.urljoin() to join a relative path to the current
url:
import urlparse
base_url = 'http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp'
relative_url = 'JCCOGetImage.jsp?refnum=DN2007036179'
target_url = urlparse.urljoin(base_url, relative_url)
print target_url
--output:--
http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179
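In Python 3 the urlparse module became urllib.parse. The same join looks like this. The second call shows why the leading slash matters: a path starting with "/" is resolved against the host, not the current directory.

```python
from urllib.parse import urljoin

base_url = "http://www.landrecords.jcc.ky.gov/records/S3DataLKUP.jsp"
relative_url = "JCCOGetImage.jsp?refnum=DN2007036179"

# The last path segment of base_url (S3DataLKUP.jsp) is replaced by the
# relative path; scheme, host and the /records/ directory are kept.
target_url = urljoin(base_url, relative_url)
print(target_url)
# → http://www.landrecords.jcc.ky.gov/records/JCCOGetImage.jsp?refnum=DN2007036179

# A root-relative path replaces the whole path, dropping /records/.
print(urljoin(base_url, "/" + relative_url))
# → http://www.landrecords.jcc.ky.gov/JCCOGetImage.jsp?refnum=DN2007036179
```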
3) Python has a webbrowser module that allows you to open urls in a
browser:
import webbrowser
webbrowser.open("http://www.google.com")
You could also launch the browser with os.system() or, on Windows,
os.startfile():
import os
os.system(r'C:\"Program Files"\"Mozilla Firefox"\firefox.exe')
#You don't have to worry about directory names
#with spaces in them if you use startfile():
os.startfile(r'C:\Program Files\Mozilla Firefox\firefox.exe')
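Since the goal is to download the PDF rather than view it, an alternative to opening a browser is to fetch target_url directly. This is a minimal Python 3 sketch using urllib.request; download is a hypothetical helper name. Whether JCCOGetImage.jsp returns the PDF itself is an assumption: it may instead return an HTML viewer page, or require the session cookie set by the search page. The function therefore returns the response's Content-Type so the result can be checked.

```python
import shutil
import urllib.request


def download(url, filename, timeout=30):
    """Save the body of url to filename and return the response's Content-Type."""
    # A browser-like User-Agent, in case the server rejects urllib's default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(filename, "wb") as out:
        # Stream the body to disk in chunks instead of reading it all into memory.
        shutil.copyfileobj(response, out)
        return response.headers.get_content_type()
```

For example, download(target_url, "DN2007036179.pdf") would save the response under that name. A return value other than "application/pdf" means the server sent something else, such as an HTML page.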
All the urls you posted give me errors when I try to open them in a
browser, so you will have to sort out those problems first.