ranking and searching [was: Re: "Zope-certified Python Engineers"]

John J. Lee jjl at pobox.com
Sun Mar 24 12:26:10 EST 2002


On Fri, 22 Mar 2002, Laura Creighton wrote:

> >On Thu, 21 Mar 2002, John J. Lee wrote:
[...]
> Now consider the case where site number 15,899 is your best bet.  Unless
> you can come up with the sort of search terms that makes this site come
> up higher -- you are never going to see it.  This is a neat problem.

But of course, you can get other sites to the top of the list by using
longer or different search terms.  And no search system is immune to bad
ranking, though metadata does help, of course.

[...]
> once you have made it to the top, you basically stay there, so an
[...]

I agree this is a problem, but given that self-consistent ranking works
much better than non-self-consistent ranking for everyday use, isn't this
inevitable?

> The deeper problem is one of the web itself.  A few months ago, I
[...story of inaccurate web site...]
>
> I can't report this; I can't get it fixed; and right now some other
> poor soul may have done exactly what I did.  This is worrisome.  We

You can decline to link to it, though.  I don't claim this is a perfect
system, of course, but the self-consistent algorithm used by Google makes
it work better than you'd expect.  I agree that further improvements
probably need better input rather than (only) better algorithms.

It is a shame, as amk commented, that backlinking schemes aren't being
used much (I've never used them myself, so I can hardly complain).

> now have too much information at our disposal, vastly too much, when
> throughout history we were more likely to have too little.  The problem
> now isn't finding stuff -- it is knowing how trustable the stuff is.
>
> EBay has one sort of approach.  So does Advogato (see http://advogato.org ).
> It is a real problem and one that interests me a lot.

Of course, this is the same kind of information that Google already uses,
though in a self-consistent and decentralised way (which is necessary for
web page ranking -- unlike Advogato's system).  Not exactly the same,
agreed, since linking to a page doesn't necessarily mean you think it's
authoritative.

> >[OT: A question I've asked many times and got no answer to is 'why has
> >nobody done self-consistent ranking for academic papers'?  I fear the
[...]
> Oh yes! Many people have been trying to do this.  See
> http://www.isinet.com/isi/news/2001/productnews/8117876/index.html
> for one attempt.

Hmm, I don't see anything about it on that page...  Perhaps my use of
'self-consistent' is not clear: I mean it in the same sense as in
'self-consistent field' calculations (which, IIRC, is where they say they
got their inspiration) -- e.g. http://www.google.com/technology/ :

| In essence, Google interprets a link from page A to page B as a vote,
| by page A, for page B. But, Google looks at more than the sheer volume
| of votes, or links a page receives; it also analyzes the page that
| casts the vote. Votes cast by pages that are themselves "important"
| weigh more heavily and help to make other pages "important."

Which I've always assumed is the reason why Google does so much better
than older search engines.
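The weighted-vote idea in that quoted paragraph can be sketched as a
simple power iteration: each page's score is repeatedly redistributed
along its outgoing links until the scores settle.  This is only a toy
illustration of the self-consistent principle, not Google's actual
algorithm -- the damping factor and iteration count below are my own
arbitrary choices:

```python
def rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns a dict of page -> score, summing to 1."""
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of score...
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            # ...and passes the rest to the pages it links to: a "vote"
            # weighted by the voter's own current importance.
            share = damping * score[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        score = new
    return score

# Tiny example: C is linked to by both A and B, so it ranks highest,
# and A outranks B because the "important" page C votes for it.
web = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = rank(web)
print(max(scores, key=scores.get))
```

The self-consistency is in the loop: a page's score depends on the
scores of the pages voting for it, which in turn depend on their voters,
and iterating makes those mutual dependencies converge.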


John




More information about the Python-list mailing list