Urllib's urlopen and urlretrieve
davea at davea.name
Fri Feb 22 18:05:30 CET 2013
On 02/22/2013 12:09 AM, qoresucks at gmail.com wrote:
> Initially I was just trying the html, but later when I attempted more complicated sites that weren't my own I noticed that large bulks of the site were lost in the process. The urllib code essentially looks like what I was trying but it didn't work as I had expected.
> To be more specific, after I got it working for my own little page, I attempted to take it further and get all the lessons from Learn Python The Hard Way. When I tried the same method on the first intro page to see if I was even getting it right, the html code was all there but upon opening it I noticed the format was all wrong, colors were off for the background, images, etc... were all missing.
So how are you opening this html? In a text editor that somehow added
colors? Or were you opening it in a browser? In order for a browser to
render a non-trivial page, it may need lots of files other than the
html. Colors for example can be specified inline, in the header, or in
an external css file. If the page was designed to use the external css,
and it's missing or not in the right location, then the browser is going
to get the colors wrong.
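
To see which other files a saved page depends on, you can list its external references. This is only a rough sketch, assuming Python 3 (in Python 2 the module is called HTMLParser). The names ResourceLister and list_resources and the sample markup are made up for illustration, and only a few common tag/attribute pairs are checked.

```python
from html.parser import HTMLParser


class ResourceLister(HTMLParser):
    """Collect the URLs of external files that a page refers to."""

    # (tag, attribute) pairs that commonly point at external files.
    # This is an illustrative subset, not a complete list.
    WANTED = {("link", "href"), ("img", "src"), ("script", "src")}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if (tag, name) in self.WANTED and value:
                self.resources.append(value)


def list_resources(html_text):
    """Return the external references found in html_text, in document order."""
    parser = ResourceLister()
    parser.feed(html_text)
    parser.close()
    return parser.resources


# Hypothetical sample page: one inline color, three external files.
sample = """<html><head>
<link rel="stylesheet" href="css/style.css">
<script src="http://example.com/app.js"></script>
</head><body style="color: red"><img src="logo.png"></body></html>"""

print(list_resources(sample))
```

Every entry printed is a file the browser will go looking for besides the html itself. The inline style on the body tag does not show up, because it needs no extra file.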
Further, if the location (url) of such a file is relative, then you can
recreate the same directory structure locally, and the browser will find
the file there. But if it's absolute, then the browser is going to try to
go out to the web to fetch it. If it succeeds, then it's masking the fact
that you haven't downloaded the "whole web site."
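
One way to tell the two kinds of reference apart is with the standard urllib.parse functions. This is a minimal sketch, assuming Python 3 (the Python 2 module is urlparse). The function name classify and the example.com/example.net urls are placeholders, not anything from the real site.

```python
from urllib.parse import urljoin, urlparse


def classify(ref, page_url):
    """Return (kind, resolved_url) for a reference found on page_url.

    A reference with its own scheme or host counts as "absolute": the
    browser fetches it from the web no matter where the html was saved.
    Anything else is "relative" and is looked up next to the page.
    """
    parts = urlparse(ref)
    kind = "absolute" if parts.scheme or parts.netloc else "relative"
    # urljoin shows where the reference points on the original site
    return kind, urljoin(page_url, ref)


# Hypothetical page address, used only to resolve the examples below.
page = "http://example.com/book/intro.html"

for ref in ("css/style.css", "http://cdn.example.net/site.css"):
    print(ref, "->", classify(ref, page))
```

The relative reference resolves to http://example.com/book/css/style.css, so a local copy needs a css directory beside the saved page. The absolute one resolves to itself, which is why it still loads from the web when the page is opened locally.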
The same is true for other external refs, such as images and scripts. If
the page contains any absolute urls, it may be impossible to host a copy
elsewhere without editing them.