warning for google api users

Doug Bromley doug.bromley at gmail.com
Tue Feb 21 14:27:55 EST 2006


Producing a SERP scraper for Google would be very easy, possible in
about 10-15 lines of code.  However, it's against the Google terms of service,
and if they decide to bite you for breaching them then you'll be in
trouble.  It's also the reason you're unlikely to find one that trumpets its
existence very much, as the site promoting it would probably be taken off the
Google index, severely affecting its visitor numbers.

On 2/21/06, Gabriel B. <gabriel.barros at gmail.com> wrote:
>
> the google web services (aka google API) are not even close to ready for
> any kind of real use yet
>
> if you search for the same term 10 times, you get 3 different totals, 2
> different result orders, and one or two "502 bad gateway" errors
>
> i did an extensive comparison of the API against the regular search
> service. the most typical set of results:
>
> results 1-10; total: 373000
> results 11-20; total: 151000
> results 21-30; total: 151000
> results 31-40; total: 373000
> results 41-50; total: 373000
> results 51-60; total: 373000
> results 61-70; total: 151000
> ( 502 bad gateway. retry)
> results 71-80; total: 373000
> results 81-90; total: 151000
> ( 502 bad gateway. retry)
> results 91-100; total: 373000
>
> on the regular google search, total:  2,050,000 (for every page, of
> course)
>
> besides that, the first and third results on the regular google search
> do not appear in the 100 results from the API for this query, but this
> is not typical, more like 1 chance in 10 :-/
>
> So, no matter how much google insists that this parrot is sleeping,
> it's simply dead.
>
>
> now, what i presume is happening is that they have a dozen
> machine pools, and each one has a broken snapshot of the production
> index (probably they have some process to import the index, and either it
> blows up at some point or they simply kill it after some time). and
> they obviously don't run that process very often.
>
> Now... does anyone have an implementation of pygoogle.py that scrapes the
> regular html service instead of using SOAP? :)
>
> Gabriel B.
> --
> http://mail.python.org/mailman/listinfo/python-list
>
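
For anyone wanting to repeat the kind of comparison Gabriel describes, here is
a rough sketch of how the totals and retries could be tallied.  It makes no
real API calls: fetch_page, BadGateway, tally_totals and replay_fetcher are all
hypothetical names invented for illustration, and the replay simply feeds back
the figures quoted above.  Placing each 502 on the request that follows it in
the listing (results 71-80 and 91-100) is an assumption.

```python
from collections import Counter


class BadGateway(Exception):
    """Hypothetical error a fetcher raises when the server answers 502."""


def tally_totals(fetch_page, pages=10, page_size=10, max_retries=3):
    """Request consecutive result pages and count each reported total.

    fetch_page(start) is a caller-supplied, hypothetical function that
    returns the estimated total for the page starting at result `start`
    and raises BadGateway on a 502.
    """
    totals = Counter()
    retries = 0
    for page in range(pages):
        start = page * page_size + 1
        for attempt in range(max_retries + 1):
            try:
                totals[fetch_page(start)] += 1
                break
            except BadGateway:
                if attempt == max_retries:
                    raise  # give up after max_retries failed retries
                retries += 1
    return totals, retries


def replay_fetcher():
    """Build a fake fetcher that replays the figures quoted in the post."""
    reported = {1: 373000, 11: 151000, 21: 151000, 31: 373000, 41: 373000,
                51: 373000, 61: 151000, 71: 373000, 81: 151000, 91: 373000}
    # Assumption: each 502 hit the request listed right after it.
    pending_502 = {71, 91}

    def fetch(start):
        if start in pending_502:
            pending_502.discard(start)  # fail once, succeed on retry
            raise BadGateway(start)
        return reported[start]

    return fetch


totals, retries = tally_totals(replay_fetcher())
for total, count in totals.most_common():
    print(f"total {total}: reported on {count} pages")
print(f"retries after 502: {retries}")
```

Replaying the quoted figures, six pages report 373000, four report 151000, and
two requests need a retry.  A consistent index would give a single total.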