practical limits of urlopen()
Lie Ryan
lie.1296 at gmail.com
Tue Jan 27 05:37:59 EST 2009
On Sat, 24 Jan 2009 09:17:10 -0800, webcomm wrote:
> Hi,
>
> Am I going to have problems if I use urlopen() in a loop to get data
> from 3000+ URLs? There will be about 2KB of data on average at each
> URL. I will probably run the script about twice per day. Data from
> each URL will be saved to my database.
>
> I'm asking because I've never opened that many URLs before in a loop.
> I'm just wondering if it will be particularly taxing for my server. Is
> it very uncommon to get data from so many URLs in a script? I guess
> search spiders do it, so I should be able to as well?
urllib doesn't have any limits of its own; what will limit your program is
your connection speed and the hardware that the server and the downloader
run on. Fetching 3000 URLs at ~2KB each is only about 6MB, a piece of cake
for a reasonably modern machine on a decent internet connection (the real
calculation isn't quite that simple, though, since there is also some cost
in sending and processing the HTTP headers for each request).
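A minimal sketch of the loop could look something like the following
(Python 2 style, since that's current; the urls list and save_to_db() are
placeholders for your own URL source and database code). Setting a socket
timeout and catching per-URL errors keeps one dead server from stalling or
killing the whole run:

import socket
import urllib2

socket.setdefaulttimeout(10)   # don't hang forever on an unresponsive host

urls = ["http://example.com/data1",
        "http://example.com/data2"]   # placeholder: your 3000+ URLs

for url in urls:
    try:
        response = urllib2.urlopen(url)
        data = response.read()
        response.close()
    except (urllib2.URLError, IOError), e:
        print "skipping %s: %s" % (url, e)
        continue
    save_to_db(url, data)   # placeholder for your database insert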
Google indexes millions of pages per day, but they also have one of the
most advanced server farms in the world.