[Python-Dev] The first trustworthy <wink> GBayes results

Tim Peters tim.one@comcast.net
Mon, 02 Sep 2002 04:09:54 -0400


[Delaney, Timothy]
> Pretty darned good advice too ... but you won't object if I waste
> some time playing with this stuff anyway I hope. Only one way to
> accumulate experience after all ;)

Not at all!  Knock yourself out -- it's really a lot of fun, except when it
gets so tedious you start punching the wall just to watch your knuckles
bleed <wink>.

> Personally, I considered that you were already well past the point of
> diminishing returns,

Not yet -- false positives are a horrible thing, and the false negative rate
still lets a lot of spam through.  Cutting the f-n rate, e.g., in half,
would mean half as much spam to deal with; generalization left to the
reader.

> and anything further was of academic interest to those who felt a desire
> to tinker ...

The best hope for reducing f-n lies in exploiting more header lines than I
can test with my mixed corpora, and there's *tons* of room for improvement
there (note that the f-n rate is more than 20x greater than the f-p rate
now).  Anyone who wants to tackle that with tedious experiment should first
pick Neil Schemenauer's brain:  he had a good start on that early last week.

> (i.e. the hard work has been done, and everything else is just fun and
> games :) If enough people (or just one dedicated person) waste enough
> time, who knows what may come out. Hey - it worked for timsort didn't
> it ...? ;)

Indeed so, and it works for this too -- never underestimate the power of
working yourself sick.  If you also *write* about it, you can make everyone
else ill too by proxy <wink>.

sharing-the-pain-ly y'rs  - tim