[Tutor] Remote Directory Reading

Kent Johnson kent37 at tds.net
Fri Aug 26 13:33:00 CEST 2005


Daniel Watkins wrote:
> I've run into a bit of trouble with my spider script. Thus far, it is
> able to retrieve all of the data off the website that is contained
> within standard HTML, downloading jpg, gif and bmp images that are
> related to the files (this restriction only being set by a lack of
> further definitions by myself). However, I have run into a problem with
> one link that simply points to a folder (www.aehof.org.uk/forum) within
> which is contained a phpBB forum.

It seems to me this page is no different from any other: it has a bunch of links that you can follow to get the content. I'm not sure why you want to handle it specially, except maybe to ignore some of the links, which you will have to write into your program.
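
As a rough sketch of what I mean by following the links and ignoring some of them, something like the code below might do. It assumes a current Python 3 standard library (html.parser and urllib.parse). The names LinkCollector and extract_links, the http:// scheme on the forum address, and the file names in the sample page are all made up for illustration, so adapt them to what your spider already does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url, skip=()):
    """Return links on the same host, dropping any that contain a skip string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    return [
        url
        for url in parser.links
        if urlparse(url).netloc == host and not any(s in url for s in skip)
    ]


# Hypothetical page content; a real spider would pass in the downloaded HTML.
sample_page = (
    '<a href="viewforum.php?f=1">General</a>'
    '<a href="login.php">Log in</a>'
    '<a href="http://example.com/elsewhere">Off-site</a>'
)
for link in extract_links(
    sample_page, "http://www.aehof.org.uk/forum/", skip=("login.php",)
):
    print(link)
```

The skip tuple is where the "ignore some of the links" rule lives; which links are worth ignoring is something only you can decide for your site.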
 
> I've attempted to use 'dircache' but couldn't find a way for it to
> understand web addresses. However, I may not have hit upon the right
> combination of syntax, so may be mistaken. I also considered 'os' but it
> appears to require definition of a particular operating system, which is
> a direction I'd prefer not to take unless I have to. In addition, the
> error messages I received from using 'dircache' traced back into 'os' so
> it is unlikely it would have been suitable for the purpose.

The os module actually hides the differences between operating systems pretty well. It has implementations for many operating systems, but the interface you see is OS-independent. The choice of the correct implementation happens under the hood; it is not something you need to be concerned with.
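
To illustrate, here is a small sketch that should run unchanged on Windows, Linux or Mac OS without naming the operating system anywhere. The helper name list_local_dir and the file names are invented for the example. Note that these os functions take local filesystem paths, not web addresses.

```python
import os
import tempfile


def list_local_dir(path):
    """Return sorted (name, is_directory) pairs for a local directory."""
    return sorted(
        (name, os.path.isdir(os.path.join(path, name)))
        for name in os.listdir(path)
    )


# Build a throwaway directory so the example does not depend on your disk layout.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "images"))
    with open(os.path.join(root, "index.html"), "w") as f:
        f.write("<html></html>")
    print(list_local_dir(root))
```

os.path.join picks the right path separator for whatever platform the script runs on, which is the "under the hood" part.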

Kent


