Get directory from http web site
Kent Johnson
kent37 at tds.net
Sat Aug 6 16:45:55 EDT 2005
rock69 wrote:
> Hi all :)
>
> I was wondering if there's some neat and easy way to get the entire
> contents of a directory at a specific web url address.
>
> I have the following link:
>
> http://www.infomedia.it/immagini/riviste/covers/cp
>
> and as you can see it's just a list containing all the files (images)
> that I need. Is it possible to retrieve this list (not the physical
> files) and have it stored in a variable of type list or something?
BeautifulSoup and urllib do this easily:
>>> from BeautifulSoup import BeautifulSoup
>>> import urllib
>>> data = urllib.urlopen('http://www.infomedia.it/immagini/riviste/covers/cp/').read()
>>> soup = BeautifulSoup(data)
>>> anchors = soup.fetch('a')
>>> len(anchors)
164
>>> for a in anchors[:10]:
...     print a['href'], a.string
...
?N=D Name
?M=A Last modified
?S=A Size
?D=A Description
/immagini/riviste/covers/ Parent Directory
cp100.jpg cp100.jpg
cp100sm.jpg cp100sm.jpg
cp101.jpg cp101.jpg
cp101sm.jpg cp101sm.jpg
cp102.jpg cp102.jpg
http://www.crummy.com/software/BeautifulSoup/
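For readers on Python 3, where urllib.urlopen has moved to urllib.request.urlopen and BeautifulSoup is no longer imported this way, here is a rough standard-library sketch of the same idea. It is not the code from the session above. The names LinkCollector, list_files and fetch_listing are made up for illustration, and the filter assumes an Apache-style index like the one shown, where the column-sort links start with "?" and the parent-directory link starts with "/".

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def list_files(html):
    """Return the hrefs that look like file entries in an index page."""
    parser = LinkCollector()
    parser.feed(html)
    # Assumption: sort links start with "?" and the parent-directory
    # link starts with "/", as in the listing shown above.
    return [h for h in parser.hrefs if not h.startswith(('?', '/'))]


def fetch_listing(url):
    """Download the index page at url and return its file entries."""
    with urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return list_files(html)
```

Calling fetch_listing() with the directory URL (trailing slash included) should then give a plain list of names, provided the server still returns that kind of index page. Any subdirectory links would be kept, since the filter only drops the "?" and "/" entries.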
Kent