Fastest way to retrieve and write html contents to file
DFS
nospam at dfs.com
Mon May 2 21:51:31 EDT 2016
On 5/2/2016 3:19 AM, Chris Angelico wrote:
> There's an easier way to test if there's caching happening. Just crank
> the iterations up from 10 to 100 and see what happens to the times. If
> your numbers are perfectly fair, they should be perfectly linear in
> the iteration count; eg a 1.8 second ten-iteration loop should become
> an 18 second hundred-iteration loop. Obviously they won't be exactly
> that, but I would expect them to be reasonably close (eg 17-19
> seconds, but not 2 seconds).
100 loops
Finished VBScript in 3.953 seconds
Finished VBScript in 3.608 seconds
Finished VBScript in 3.610 seconds
Bit of a per-loop speedup going from 10 to 100.
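
For reference, the linearity check described above can be written down as a small sketch. The function names and the 10% tolerance are my own assumptions, and the sample figures are the ones from the quoted example (1.8 seconds for 10 loops), not measurements:

```python
def per_loop_ratio(t_small, n_small, t_large, n_large):
    """Ratio of per-iteration time in the long run to the short run.

    A value near 1.0 means the timings scale linearly with the
    iteration count; a value far below 1.0 hints at caching.
    """
    return (t_large / n_large) / (t_small / n_small)


def looks_linear(ratio, tolerance=0.1):
    # tolerance is an arbitrary, hypothetical threshold
    return abs(ratio - 1.0) <= tolerance


if __name__ == "__main__":
    # Figures from the quoted example: 1.8 s for 10 loops
    for t_100 in (18.0, 17.0, 2.0):
        r = per_loop_ratio(1.8, 10, t_100, 100)
        print(t_100, round(r, 3), looks_linear(r))
```

A ratio well under 1.0 would mean the later iterations are cheaper than the early ones, which is what caching would look like.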
> Then the next thing to test would be to create a deliberately-slow web
> server, and connect to that. Put a two-second delay into it, to
> simulate a distant or overloaded server, and see if your logs show the
> correct result. Something like this:
>
> --------
>
> import time
> try:
>     import http.server as BaseHTTPServer # Python 3
> except ImportError:
>     import BaseHTTPServer # Python 2
>
> class SlowHTTP(BaseHTTPServer.BaseHTTPRequestHandler):
>     def do_GET(self):
>         self.send_response(200)
>         self.send_header("Content-type","text/html")
>         self.end_headers()
>         self.wfile.write(b"Hello, ")
>         time.sleep(2)
>         self.wfile.write(b"world!")
>
> server = BaseHTTPServer.HTTPServer(("", 1234), SlowHTTP)
> server.serve_forever()
>
> -------
>
> Test that with a web browser or command-line downloader (go to
> http://127.0.0.1:1234/), and make sure that (a) it produces "Hello,
> world!", and (b) it takes two seconds. Then set your test scripts to
> downloading that URL. (Be sure to set them back to low iteration
> counts first!) If the times are true and fair, they should all come
> out pretty much the same - ten iterations, twenty seconds. And since
> all that's changed is the server, this will be an accurate
> demonstration of what happens in the real world: network requests
> aren't always fast. Incidentally, you can also watch the server's log
> to see if it's getting the appropriate number of requests.
>
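
A minimal Python 3 sketch of that test, with the server and the timing client in one script. This is only one way to set it up: the helper names (make_handler, time_fetches), the configurable delay, and the choice of an ephemeral port instead of 1234 are my assumptions, not part of the quoted server.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(delay):
    """Build a handler like the quoted SlowHTTP, with a configurable delay."""
    class SlowHTTP(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(b"Hello, ")
            time.sleep(delay)
            self.wfile.write(b"world!")

        def log_message(self, *args):
            pass  # silence the per-request log for this sketch

    return SlowHTTP


def time_fetches(iterations, delay):
    """Fetch the slow page `iterations` times; return (last body, seconds)."""
    # Port 0 asks the OS for any free port (assumption: avoids clashes)
    server = HTTPServer(("127.0.0.1", 0), make_handler(delay))
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    body = b""
    start = time.perf_counter()
    try:
        for _ in range(iterations):
            with urllib.request.urlopen(url) as resp:
                body = resp.read()
    finally:
        elapsed = time.perf_counter() - start
        server.shutdown()
        server.server_close()
    return body, elapsed


if __name__ == "__main__":
    body, elapsed = time_fetches(10, 2)
    print(body, "in %.3f seconds" % elapsed)
```

With ten iterations and a two-second delay, a fair client should report a little over twenty seconds, whatever language it is written in.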
> It may turn out that changing the web server actually materially
> changes your numbers. Comment out the sleep call and try it again -
> you might find that your numbers come closer together, because this
> naive server doesn't send back 304 NOT MODIFIED responses or anything.
> Again, though, this would prove that you're not actually measuring
> language performance, because the tests are more dependent on the
> server than the client.
>
> Even if the files themselves aren't being cached, you might find that
> DNS is. So if you truly want to eliminate variables, replace the name
> in your URL with an IP address. It's another thing that might mess
> with your timings, without actually being a language feature.
>
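
One possible way to take DNS out of the timed loop is to resolve the name once up front and rewrite the URL. This is a rough sketch: url_with_ip is a hypothetical helper, it drops any user:password part of the URL, and a server that relies on the Host header (virtual hosting) or on HTTPS certificate names may not answer the same way when addressed by IP.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def url_with_ip(url, resolver=socket.gethostbyname):
    """Return `url` with its host name replaced by a resolved IPv4 address.

    `resolver` is injectable so the lookup can be stubbed out; by
    default it performs a real DNS lookup, once, outside any timing.
    """
    parts = urlsplit(url)
    ip = resolver(parts.hostname)
    # Keep an explicit port if the original URL had one
    netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # "localhost" is just a placeholder host for the demonstration
    print(url_with_ip("http://localhost:1234/"))
```

Doing the lookup before the timer starts means every iteration pays the same (zero) DNS cost, so resolver caching can no longer skew the comparison.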
> Networking has about four billion variables in it. You're messing with
> one of the least significant: the programming language :)
>
> ChrisA
Thanks for the good feedback.