buzzard at urubu.freeserve.co.uk
Mon Sep 1 16:06:32 CEST 2003
"Arthur" <ajsiegel at optonline.net> wrote in message
news:mailman.1062394443.2068.python-list at python.org...
> >Maybe there was some notice about using Python in
> >geophysics and the symposium book in one journal, so there was a sudden
> >spate of, say, three people who bought both.
> You would think the parameter for a statistically significant sample size
> would be a fundamental concept in this kind of thing. And no action taken
> before one was determined to exist.
Statistical tests take sample sizes into account (so e.g. a larger effect
will tend to be statistically significant for a smaller sample size).
Sample size calculations are more useful when you're in a position to
determine how large the sample will be.
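To make the point about effect size and sample size concrete, here is a minimal sketch using a two-sided one-sample z-test. The function name, the assumption of a known standard deviation and the numbers are all illustrative only, not taken from any real data set.

```python
import math

def z_test_p_value(effect, sigma, n):
    """Two-sided p-value for a one-sample z-test of 'true mean == 0'.

    effect: observed sample mean (hypothetical)
    sigma:  population standard deviation, assumed known
    n:      sample size
    """
    z = effect / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Same small sample, different effect sizes.
small_effect_small_n = z_test_p_value(0.2, 1.0, 20)
large_effect_small_n = z_test_p_value(1.0, 1.0, 20)
# Same small effect, much larger sample.
small_effect_large_n = z_test_p_value(0.2, 1.0, 500)

print(small_effect_small_n, large_effect_small_n, small_effect_large_n)
```

With these made-up inputs the small effect is not significant at the 5% level for n = 20, but the large effect is, and the small effect becomes significant once n = 500.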
> OTOH, the concept of "coincidence" must necessarily be ruled out in AI, I
> would think.
Coincidence can't generally be ruled out, but you can look for relationships
in the (sample) data that would be unlikely to be present if the same
relationships weren't also present in the population.
> *Our* intelligence seems to give us a read as to where on the bell curve a
> particular event may lie, or at least some sense of when we are at an
> on the curve. Which we call coincidence. AI would probably have a
> particularly difficult time with this concept - it seems to me.
Some people have a difficult time with (or are unaware of) "statistical
thinking". Maybe some of them are involved in AI? (Well, of course some of
them are. :-))
> Spam filtering software must need to tackle these kinds of issues.
It can do, and I've no doubt some of it does. Spam filtering is a
classification problem and can be handled in a variety of ways. It's
generally easy to come up with an overly complex set of rules / model that
will correctly classify sample data. But (as you know) the idea's to come
up with a set of rules / model that will correctly (as far as possible)
classify future data. As many spam filters use Bayesian methods, I would
guess that the models are fitted that way too, in which case overly
complex models can be (at least partially) avoided through the choice of
prior, rather than through significance testing.
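A toy sketch of how a prior can temper an over-confident rule: the per-class word rates below are estimated with a symmetric Beta(alpha, alpha) prior, which amounts to adding pseudo-counts. The function, the equal class priors and the counts are assumptions for illustration, not a description of how any particular spam filter works.

```python
def spam_prob_given_word(spam_with_word, spam_total,
                         ham_with_word, ham_total, alpha=1.0):
    """Estimate P(spam | word present), assuming equal class priors.

    alpha is the pseudo-count from a Beta(alpha, alpha) prior on each
    class's word rate; alpha=0 gives the raw sample frequencies.
    """
    p_word_spam = (spam_with_word + alpha) / (spam_total + 2 * alpha)
    p_word_ham = (ham_with_word + alpha) / (ham_total + 2 * alpha)
    return p_word_spam / (p_word_spam + p_word_ham)

# Hypothetical counts: the word appears in 1 of 100 spam and 0 of 100 ham.
unsmoothed = spam_prob_given_word(1, 100, 0, 100, alpha=0.0)
smoothed = spam_prob_given_word(1, 100, 0, 100, alpha=1.0)
print(unsmoothed, smoothed)
```

Without the prior, a single sighting makes the word look like certain evidence of spam (probability 1.0). With alpha = 1 the estimate is pulled back to about 0.67 and moves further only as more data arrives.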
What do Amazon use? My guess (unless it's something really naive) would be