practical limits of urlopen()

Steve Holden steve at
Sat Jan 24 18:50:28 CET 2009

webcomm wrote:
> Hi,
> Am I going to have problems if I use urlopen() in a loop to get data
> from 3000+ URLs?  There will be about 2KB of data on average at each
> URL.  I will probably run the script about twice per day.  Data from
> each URL will be saved to my database.
> I'm asking because I've never opened that many URLs before in a loop.
> I'm just wondering if it will be particularly taxing for my server.
> Is it very uncommon to get data from so many URLs in a script?  I
> guess search spiders do it, so I should be able to as well?
You shouldn't expect problems, though you might want to think about
using a more advanced technique such as threading to get your results
more quickly.

This is Python, though. It shouldn't take long to write a test program
to verify that you can indeed spider 3,000 pages this way.
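
Such a test program might look something like the sketch below. It assumes
Python 3's urllib.request (the urlopen() of the day lived in urllib/urllib2),
and the names fetch() and spider(), the 10-second timeout and the fetcher
parameter are all made up for illustration.

```python
import time
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the body of url as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print("failed: %s (%s)" % (url, exc))
        return None


def spider(urls, fetcher=fetch):
    """Fetch each URL in turn; return (successes, failures, elapsed seconds)."""
    ok = failed = 0
    start = time.time()
    for url in urls:
        if fetcher(url) is None:
            failed += 1
        else:
            ok += 1
    return ok, failed, time.time() - start
```

Calling spider() on your real list of URLs and printing the result should
tell you quickly whether the plain loop is fast enough for a twice-daily run.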

With about 2KB per page, 3,000 pages come to roughly 6MB, so you could
probably build up a memory structure containing the whole content of
every page without memory usage becoming excessive for modern systems.
If you are writing stuff out to a database as you go and not retaining
page content then there should be no problems whatsoever.
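
As a rough illustration of writing out as you go, here is a sketch that
stores each page as soon as it arrives, so only one page is held in memory
at a time. I don't know which database you use, so sqlite3 is purely an
assumption, as are the table layout and the save_pages() and fetcher names;
fetcher is any function that returns the page body as bytes, or None on
failure.

```python
import sqlite3


def save_pages(conn, urls, fetcher):
    """Fetch each URL and store it immediately; return the number saved."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body BLOB)"
    )
    saved = 0
    for url in urls:
        body = fetcher(url)
        if body is None:
            continue  # skip pages that could not be fetched
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
            (url, body),
        )
        conn.commit()  # commit per page so an interrupted run keeps its work
        saved += 1
    return saved
```

Committing after every page is the simplest choice; with 3,000 small rows
you could equally commit once at the end if that turns out to be faster.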

Then look at a parallelized solution of some sort if you need it to work
more quickly.
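
One possible shape for a parallelized version, if it comes to that, is a
small thread pool. This is only a sketch: it assumes Python 3's
concurrent.futures, and fetch_all(), the pool size of 10 and the timeout
are arbitrary choices for illustration, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the body of url as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        return None


def fetch_all(urls, fetcher=fetch, workers=10):
    """Return a dict mapping each URL to its body (or None on failure)."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bodies = pool.map(fetcher, urls)  # results come back in input order
        return dict(zip(urls, bodies))
```

Threads suit this job because nearly all the time goes on waiting for the
network. Keep the pool modest if many of the URLs live on the same server.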

Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC    
