Re[2]: [Tutor] Is it possible to get a list of files from a remote host?

Andrew Andrew <angelopoulos@csi.com>
Sun, 6 Oct 2002 14:46:12 -0400


I would think that since there are a bunch of "knowns" here (you
know the page, you know it is a directory listing, you know its
format, and you know what you are looking for), a simple solution
would be to fetch and parse that page, pick out the strings/elements
you are looking for (the re module, or string find?), and then do
something else with them (retrieve them?). Know any HTML?
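
Something like this quick sketch might do it (untested; it assumes a
plain Apache-style index where every entry shows up as href="...",
and the URL and extensions are just examples):

import urllib
import re

url = "http://www.python.org/pics/"    # any directory-listing URL
html = urllib.urlopen(url).read()

# A plain directory index wraps each entry in href="...", so a
# simple regex is enough here (it is not a general HTML parser).
for link in re.findall(r'(?i)href="([^"]+)"', html):
    if re.search(r'\.(gif|jpe?g|png)$', link):
        print link    # or urllib.urlretrieve(url + link, link)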

Assuming, as Magnus said, that the web server publicly provides the
information you are looking for in the first place. (And he's right:
if it is the whole site you are after, then websucker is fun to play
with and poke around in; at least this newbie is enjoying it.)
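
If you want the idea behind websucker in miniature, here is a rough
sketch (untested, and the starting URL is just an example) that
follows local hyperlinks recursively; websucker itself does
considerably more (it actually saves the files, for one):

import urllib
import urlparse
import re

def crawl(url, seen=None):
    # Collect every page reachable from url via same-host links.
    if seen is None:
        seen = {}
    if url in seen:
        return seen
    seen[url] = 1
    try:
        html = urllib.urlopen(url).read()
    except IOError:
        return seen
    host = urlparse.urlparse(url)[1]
    for link in re.findall(r'(?i)href="([^"]+)"', html):
        absolute = urlparse.urljoin(url, link)
        # Stay on the same host, and only descend into things
        # that look like pages or directories.
        if urlparse.urlparse(absolute)[1] == host and \
           re.search(r'(/|\.html?)$', absolute):
            crawl(absolute, seen)
    return seen

for page in crawl("http://www.python.org/").keys():
    print page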

-- Andrew
mailto:angelopoulos@csi.com


--
Sunday, October 6, 2002, 2:03:16 PM, you wrote:


ML> At 12:29 2002-10-06 -0400, Jmllr891@cs.com wrote:
>>I just learned how to download files from a remote host using the
>>urllib module. Now, is it at all possible to get a list of files
>>that are on that remote host? Say, using the glob module?

ML> Hi,

ML> There is no magic here. Basically, you can reach just the
ML> same information with a program (Python or something else)
ML> as you can with your browser (although the browser does
ML> hide some details).

ML> Didn't you follow my advice about the urlretrieve? ;) What
ML> you *did* fetch in the code where you tried to fetch the
ML> python logo was just what you ask for now: A directory
ML> listing. Type http://www.python.org/pics/ and see for
ML> yourself.

ML> You get this as HTML, and you won't get it at all unless the
ML> web server is set up to give directory listings and that
ML> directory lacks a default file (index.html etc.).

ML> You aren't supposed to be able to roam around the
ML> filesystem at will on all the web servers on the net.

ML> If you want to mirror / download a whole web site, you will
ML> have to write a program that follows all local hyperlinks
ML> recursively, and hope that gets you all the pages you need.

ML> I suggest you take a look in the Tools/webchecker directory
ML> of your python installation. Try running websucker.py. Maybe
ML> it's something like that you are looking for?