buzzard at urubu.freeserve.co.uk
Mon Sep 1 21:16:24 CEST 2003
"Dennis Lee Bieber" <wlfraed at ix.netcom.com> wrote in message
news:hrdc21-ce3.ln1 at beastie.ix.netcom.com...
> Duncan Smith fed this fish to the penguins on Monday 01 September 2003
> 07:06 am:
> > It can do, and I've no doubt some of it does. Spam filtering is a
> > classification problem and can be handled in a variety of ways. It's
> > generally easy to come up with an overly complex set of rules / model
> > that will correctly classify sample data. But (as you know) the idea's
> > to come up with a set of rules / model that will correctly (as far as
> > possible) classify future data. As many spam filters use Bayesian
> > methods, I would guess that they might be fitted using Bayesian
> > methods; in which case overly complex models can be (at least
> > partially) avoided through the choice of prior, rather than
> > significance testing.
> > What do Amazon use? My guess (unless it's something really naive)
> > would be association rules.
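The Bayesian approach described above can be sketched as a toy naive Bayes
filter. Everything here (the tiny training corpus, the token lists) is made
up purely for illustration; a real filter would train on thousands of
messages, but the smoothing step shows how a prior-like term tempers an
overly complex fit to the sample data:

```python
import math

# Hypothetical training data: (tokens, is_spam) pairs, invented for this
# sketch. Real filters train on far larger corpora.
TRAIN = [
    (["cheap", "pills", "buy", "now"], True),
    (["buy", "viagra", "cheap"], True),
    (["meeting", "agenda", "monday"], False),
    (["python", "list", "meeting"], False),
]

def train(messages):
    """Count token occurrences per class (True = spam, False = ham)."""
    counts = {True: {}, False: {}}
    totals = {True: 0, False: 0}
    for tokens, is_spam in messages:
        for t in tokens:
            counts[is_spam][t] = counts[is_spam].get(t, 0) + 1
            totals[is_spam] += 1
    return counts, totals

def spam_score(tokens, counts, totals, prior_spam=0.5):
    """Log-odds that a message is spam under naive Bayes.

    The add-one (Laplace) smoothing acts like the prior mentioned in the
    post: it stops rare tokens from pushing the model to extreme,
    overfitted conclusions about future data.
    """
    vocab = set(counts[True]) | set(counts[False])
    log_odds = math.log(prior_spam / (1 - prior_spam))
    for t in tokens:
        p_spam = (counts[True].get(t, 0) + 1) / (totals[True] + len(vocab))
        p_ham = (counts[False].get(t, 0) + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_spam / p_ham)
    return log_odds

counts, totals = train(TRAIN)
print(spam_score(["buy", "cheap", "pills"], counts, totals))  # positive: spam-ish
print(spam_score(["monday", "meeting"], counts, totals))      # negative: ham-ish
```

A positive log-odds score classifies the message as spam; the classification
rule itself never does a significance test, it just compares smoothed
likelihoods.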
> If I may insert an off-the-cuff comment...
> The goal of spam filtering is normally to reduce the amount of spam
> permitted through to the client.
> However, Amazon's goal would seem to be to increase the potential
> sales. Hence, I'd suspect their algorithm is rigged in a quite
> optimistic mode (Hey, out of set A and set B, we have an overlap of
> x... maybe we can increase x by suggesting that set A would like the
> stuff from set B...)
Absolutely. I'd be interested to know exactly how they go about this
(although I don't suppose they'd make it public). I'd guess that for 'very
low' and 'very high' values of x the increase in sales would be less than
for 'middling' values of x.
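The overlap-x idea is easy to make concrete. Below is a minimal
co-occurrence sketch (the baskets are invented; nothing here claims to be
Amazon's actual algorithm): for each item pair it computes the overlap x,
the confidence P(bought B | bought A), and the lift of that confidence over
B's base rate. A "people who bought A also bought B" rule is only worth
firing for middling x: x = 0 gives no evidence, while x covering everyone
adds no information, since B sells regardless.

```python
from itertools import combinations

# Hypothetical purchase history: customer -> set of items bought.
BASKETS = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"A", "C"},
    "c4": {"B", "C"},
    "c5": {"A", "B"},
}

def pair_stats(baskets):
    """For every item pair (a, b), return (overlap x, confidence, lift)."""
    n = len(baskets)
    item_buyers = {}
    for cust, items in baskets.items():
        for it in items:
            item_buyers.setdefault(it, set()).add(cust)
    stats = {}
    for a, b in combinations(sorted(item_buyers), 2):
        overlap = len(item_buyers[a] & item_buyers[b])      # this is x
        conf = overlap / len(item_buyers[a])                # P(B | bought A)
        lift = conf / (len(item_buyers[b]) / n)             # vs. base rate of B
        stats[(a, b)] = (overlap, conf, lift)
    return stats

for pair, (x, conf, lift) in sorted(pair_stats(BASKETS).items()):
    print(pair, x, round(conf, 2), round(lift, 2))
```

This is essentially the support/confidence/lift machinery of association
rules; a recommender tuned for sales would presumably rank suggestions by
something like lift rather than by raw overlap.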