Warning for Google API users

Gabriel B. gabriel.barros at gmail.com
Tue Feb 21 14:11:45 EST 2006


The Google web services (aka the Google API) are not even close to
ready for any kind of real use yet.

If you search for the same term 10 times, you get 3 different totals,
2 different result orderings, and one or two "502 Bad Gateway" errors.
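
The 502s, at least, can be papered over with a retry loop. A minimal
sketch, assuming a hypothetical fetch() callable that raises a
hypothetical BadGateway exception when the server answers 502 (neither
name comes from pygoogle or the Google API):

```python
import time


class BadGateway(Exception):
    """Hypothetical error raised by fetch() when the server answers 502."""


def fetch_with_retry(fetch, retries=3, delay=1.0):
    """Call fetch() and retry on BadGateway, up to `retries` extra attempts."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except BadGateway:
            if attempt == retries:
                raise  # out of retries, let the caller see the error
            time.sleep(delay)  # short pause before trying again
```

This only hides the gateway errors. It does nothing about the totals and
the result order changing from one request to the next.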

I did an extensive comparison of the API against the regular search
service. The most typical set of results:

results 1-10; total: 373000
results 11-20; total: 151000
results 21-30; total: 151000
results 31-40; total: 373000
results 41-50; total: 373000
results 51-60; total: 373000
results 61-70; total: 151000
( 502 bad gateway. retry)
results 71-80; total: 373000
results 81-90; total: 151000
( 502 bad gateway. retry)
results 91-100; total: 373000

On the regular Google search, total: 2,050,000 (for every page, of
course).
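
The per-page totals in the list above can be tallied in a few lines. A
quick sketch, with the numbers copied by hand from that list (the names
api_totals and web_total are only for illustration):

```python
from collections import Counter

# Totals reported by the API for each page of 10 results (1-10 ... 91-100).
api_totals = [373000, 151000, 151000, 373000, 373000,
              373000, 151000, 373000, 151000, 373000]
web_total = 2050000  # total shown by the regular search

counts = Counter(api_totals)
for total, pages in counts.most_common():
    share = 100.0 * total / web_total
    print("%d pages reported %d (%.0f%% of the regular total)"
          % (pages, total, share))
```

In this run, six pages reported 373000 and four reported 151000, and
both figures are well under a fifth of what the regular search shows.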

Besides that, the first and third results on the regular Google search
do not appear in the 100 results from the API for this query, but this
is not typical, more like 1 chance in 10 :-/

So, no matter how much Google insists that this parrot is sleeping,
it's simply dead.


Now, what I presume is happening is that they have a dozen machine
pools, and each one has a broken snapshot of the production index
(probably they have some process to import the index, and either it
blows up at some point or they simply kill it after some time). And
they obviously don't run that process very often.

Now... does anyone have an implementation of pygoogle.py that scrapes
the regular HTML service instead of using SOAP? :)

Gabriel B.


