[Catalog-sig] Please turn off ratings

Laura Creighton lac at openend.se
Wed Apr 6 01:25:06 CEST 2011


In a message of Wed, 06 Apr 2011 00:06:35 +0200, "Martin v. Löwis" writes:

<snip>

>- it's not helpful to users
>  I believe this is not true

This is the one I am interested in changing your opinion about.  I think that
'I want a 5 star rating system so I can tell which package to use without
going to the effort of evaluating it' fits right in there with
'I want microbenchmarks to tell me how fast my program will run' and
'I want a glass of orange juice every morning to keep me from
getting colds'.  They are all forms of 'I want a magic bullet' where there
*is* no magic bullet.  If you want to find out about a
package, you need to either do the research yourself, or read detailed comments
from people who have done the research and whose opinions you trust.

Crowdsourcing does not cut it here.  Anybody can rate something, and
since, as you say, "why does he go through the effort, then? It must be
helpful, even if just to vent frustration, or to share joy", you are
tacitly admitting that what the rating system best measures is people's
emotional reaction to your software.  But that is a spectacularly poor
measure of whether you should use somebody's software
or not.  It totally misses the difficult questions.

And indeed, when you looked for a poorly rated package and came up
with python-cjson, it wasn't just the rating that you looked at, but also the
comments, which helped you decide that this was a package you
would want to stay away from.  Without that information, even you couldn't
tell what the rating meant.

I was talking with the people at Yelp, the ratings company in San Francisco,
this March.  They are crowdsourcing experts.  And they say that there is
a particular problem with people giving ratings that don't also come
with a review.  People really do go around blasting other people's
businesses because they think it will make their competing business prosper.
People really do get drunk and decide that trashing the reputations of
famous whatevers would be a really cool thing to do.  People rate
things poorly because they were feeling bad at the time, for reasons
that have nothing to do with the product or service.  Indeed, Yelp has
a whole department of people whose entire job is going back and correcting
the ratings of people and places, by weeding out those complaints which
really were only about people expressing their emotions.  People using
a rating system as an emotional dumping ground is a real problem for their
business.  And, among other things that they told me, the smaller your
sample set, the worse this problem is for the rating organisation.

Now, of course, Yelp has to work very hard at this, because their business
model depends on people trusting the ratings that they get from Yelp
when they want to purchase a good or service.  So quality control for the
rating system, to make sure that it is as fair as they can make it, is
something they can really spend time and money on.

We don't have those kinds of resources, and we don't have the kinds of 
numbers of reviews where a policy of 'cross your fingers and hope that
the large number of fair, unbiased votes will drown out the unfair
votes' can be expected to work, even in the short term.  This is because
the whole voting system is a self-selected set, and the voters are
disproportionately going to be the people who want to express frustration
and who just like to cause trouble by giving out negative votes for the
fun of annoying people, or promoting their competing package, or whatever.

Thus we have a situation here where we are in even more desperate need of
a way to check that the ratings are fair, have no resources to do so, and,
given that people can vote without needing to state a reason or
an explanation for their number, have no way to check that a rating is
fair.  All we have done is undermine the credibility of
http://pypi.python.org/pypi - yes, there is a rating system, but there
is no quality control of the raters, so it's all pretty meaningless.

So no magic bullet for the people who hoped that a rating system would
let them pick software.  But they will undoubtedly try to use the ratings
for exactly that, regardless, demonstrating that hope still springs
eternal in the human soul.  But maybe we have a responsibility to these
people not to give them what they asked for, because it really won't
do what they want?

It's undoubtedly fun for the people who want an emotional dumping
ground, but I for one am not interested in using the catalog for these
people's emotional needs.  Writing software is hard enough without
adding the requirement -- oh, yes, and if you ever release this thing,
you have to put up with having people vent their frustrations
at you, publicly, permanently, where there is nothing you can do
about it.  If you complain, you get told to ignore the ratings, which is
tantamount to admitting that they are meaningless.  I don't want
this sort of vulnerability, and I am pretty thick-skinned.  It must
be excruciating hell for those developers who are more sensitive to
public opinion than I am.

So I think that the rating system is a serious disservice to the people
it was supposed to help, the users who asked for it, as well as being
a source of considerable angst for the software developers.

Laura



