ranking and searching [was: Re: "Zope-certified Python Engineers"]

John J. Lee jjl at pobox.com
Sun Mar 24 12:26:10 EST 2002


On Fri, 22 Mar 2002, Laura Creighton wrote:

> >On Thu, 21 Mar 2002, John J. Lee wrote:
[...]
> Now consider the case where site number 15,899 is your best bet.  Unless
> you can come up with the sort of search terms that makes this site come
> up higher -- you are never going to see it.  This is a neat problem.

But of course, you can get other sites to the top of the list by using
longer or different search terms.  And no search system is immune to bad
ranking, though metadata does help, of course.

[...]
> once you have made it to the top, you basically stay there, so an
[...]

I agree this is a problem, but given that self-consistent ranking works
much better than non-self-consistent ranking for everyday use, isn't this
inevitable?

> The deeper problem is one of the web itself.  A few months ago, I
[...story of inaccurate web site...]
>
> I can't report this; I can't get it fixed; and right now some other
> poor soul may have done exactly what I did.  This is worrisome.  We

You can decline to link to it, though.  I don't claim this is a perfect
system, of course, but the self-consistent algorithm used by Google makes
it work better than you'd expect.  I agree that further improvements
probably need better input rather than (only) better algorithms.

It is a shame, as amk commented, that backlinking schemes aren't being
used much (I've never used them myself, so I can hardly complain).

> now have too much information at our disposal, vastly too much, when
> throughout history we were more likely to have too little.  The problem
> now isn't finding stuff -- it is knowing how trustable the stuff is.
>
> EBay has one sort of approach.  So does Advogato (see http://advogato.org ).
> It is a real problem and one that interests me a lot.

Of course, this is the same kind of information that Google already uses,
though in a self-consistent and decentralised way (which is necessary for
web page ranking -- unlike Advogato's system).  Not exactly the same,
agreed, since linking to a page doesn't necessarily mean you think it's
authoritative.

> >[OT: A question I've asked many times and got no answer to is 'why has
> >nobody done self-consistent ranking for academic papers'?  I fear the
[...]
> Oh yes! Many people have been trying to do this.  See
> http://www.isinet.com/isi/news/2001/productnews/8117876/index.html
> for one attempt.

Hmm, I don't see anything about it on that page...  Perhaps my use of
'self-consistent' is not clear: I mean it in the same sense as in
'self-consistent field' calculations (which, IIRC, is where they say they
got their inspiration) -- e.g. http://www.google.com/technology/ :

| In essence, Google interprets a link from page A to page B as a vote,
| by page A, for page B. But, Google looks at more than the sheer volume
| of votes, or links a page receives; it also analyzes the page that
| casts the vote. Votes cast by pages that are themselves "important"
| weigh more heavily and help to make other pages "important."

Which I've always assumed is the reason why Google does so much better
than older search engines.
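The weighted-vote idea in that quoted paragraph can be sketched as a
simple power iteration: each page's score is repeatedly redistributed
along its outgoing links until the scores settle.  This is only a toy
illustration of the self-consistent principle, not Google's actual
algorithm -- the damping factor and iteration count below are my own
arbitrary choices:

```python
def rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns a dict of page -> score, summing to 1."""
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of score...
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            # ...and passes the rest to the pages it links to: a "vote"
            # weighted by the voter's own current importance.
            share = damping * score[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        score = new
    return score

# Tiny example: C is linked to by both A and B, so it ranks highest,
# and A outranks B because the "important" page C votes for it.
web = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = rank(web)
print(max(scores, key=scores.get))
```

The self-consistency is in the loop: a page's score depends on the
scores of the pages voting for it, which in turn depend on their voters,
and iterating makes those mutual dependencies converge.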


John




More information about the Python-list mailing list