Async Client with 1K connections?
williamichang at hotmail.com
Fri Feb 13 09:33:12 CET 2004
Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
> "William Chang" <williamichang at hotmail.com> writes:
> > ... Throughput per PC would be on
> > the order of 1MB/s assuming 200x5KB downloads/sec using 1-2000
> > simultaneous connections. (That's 17M pages per day per PC.)
> That's orders of magnitude less than you-know-who.
Do you know how frequently you-know-who refreshes its entire index? A year
ago things were pretty dire, easily over 10% dead links, if I recall correctly.
10 PCs at 17M/day each will refresh 3B pages in 18 days, easily world-class.
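The arithmetic above can be sanity-checked in a few lines (figures taken from this thread; 86,400 seconds per day):

```python
# Back-of-the-envelope check of the crawl-rate figures quoted above.
pages_per_sec = 200            # downloads per second per PC
page_size_kb = 5               # average page size in KB
num_pcs = 10
index_size = 3_000_000_000     # 3B pages

throughput_mb_s = pages_per_sec * page_size_kb / 1000    # MB/s per PC
pages_per_day = pages_per_sec * 86_400                   # per PC
refresh_days = index_size / (num_pcs * pages_per_day)

print(throughput_mb_s)                  # ~1 MB/s
print(pages_per_day)                    # ~17.3M pages/day
print(round(refresh_days, 1))           # ~17.4 days for the full index
```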
> ... Also, don't forget
> how many queries you have to take from users, and the amount of disk seeks
> needed for each one.
Sure, that's what I do. However, spidering and querying are independent tasks,
so they can be provisioned and scaled separately.
> 10 MB of internet connectivity is at least a few K$/month all by itself.
Yes, $2,500/month to be specific.
There's no reason to be intimidated (if I may use that word) by you-know-who's
marketing message (80,000 machines). Back in '96 Infoseek could handle 10M
queries per day on a single Sun E4000 with 8 CPUs (<200 MHz each), 4GB RAM,
and a 20x4GB RAID. Sure the WWW is much bigger now, but so are the disk drives!
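For what the subject line asks about, a client holding 1,000-odd connections
open at once boils down to capping concurrency with a semaphore. A minimal
sketch of that pattern (this is my own illustration, not code from the thread;
the `fetch` body just simulates I/O where a real client would open a socket):

```python
import asyncio

MAX_CONNECTIONS = 1000  # concurrency cap, in the 1-2000 range discussed

async def fetch(url: str, sem: asyncio.Semaphore) -> str:
    # Placeholder for a real HTTP download: only MAX_CONNECTIONS
    # coroutines get past the semaphore at any one time.
    async with sem:
        await asyncio.sleep(0)  # stand-in for network I/O
        return f"body-of-{url}"

async def crawl(urls: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONNECTIONS)
    return await asyncio.gather(*(fetch(u, sem) for u in urls))

# Hypothetical URLs, just to exercise the pattern.
pages = asyncio.run(crawl([f"http://example.com/{i}" for i in range(5)]))
print(len(pages))  # → 5
```

With real sockets the semaphore keeps file-descriptor usage bounded, which is
the whole trick to running thousands of downloads on one box.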