[Spambayes] feature request

Tim Peters tim.one at comcast.net
Tue Dec 9 23:31:01 EST 2003

[Seth Goodman]
> How do you feel about automatically rescoring the Unsures after any
> training event?

I'd feel good about that.  But nothing is ever simple:  if we did that,
people would divide into four additional camps:  those who want newly
determined ham and spam to be moved by magic out of Unsure as a result of
auto-rescoring; those who want neither moved by magic as a result of that;
and two camps for those who want only a particular one of them moved by
magic.  I happen to fall in the second camp (I wouldn't want anything
magically moved as a result of auto-rescoring), but it's not a
right-versus-wrong issue.

> Most people probably don't have that many Unsures stored up and it would
> be helpful.  Again, I'm just one user and I don't know how others use the
> program.

You've been reading this list long enough to know that no two people seem to
use it exactly the same way, right?

> I understand your dilemma with the large inboxes.  It's certainly
> your call, but I hope you recognize that many (most?) users don't
> have 10K messages waiting for reply.

Certainly!  OTOH, I see almost no spam in my inbox anyway, so I wouldn't
want to wait for a measly 200 messages to get rescored either (perceptible
cost but no perceptible benefit).  Every knob probably drives another 50% of
potential users of a feature away.  We've already got so many knobs in the
UI it's a miracle anyone is still left here <wink>.

> That's a burden I can hardly imagine,

Na, it's easy:  there are only 10K msgs waiting for replies because I keep
archiving away the ones so far down the stack that it takes Outlook 5
minutes to scroll that far <wink>.

> so I do really appreciate your developing this open source code.

Oh, this project didn't intend to target personal email.  That was an
afterthought, and it's why so much of the early heavy testing turned out not
to be particularly relevant to most people here.  The original purpose was
to filter high-volume mailing lists, as a possible addin to GNU Mailman (the
mailing list software that runs *this* list, for example).

So early testing was done against databases trained on tens of thousands of
ham and spam, sliced and diced randomly, and well balanced by construction.
It turns out nobody uses it that way, but that's still what it was designed
for.

There was a start toward testing strategies for real-life low-volume
personal use, but that fizzled out around the time my employer yanked me off
this project (they paid my salary for the initial development and testing,
which is why it got done -- not really the salary part, but that I was able
to spend major time on it then).  That appears to have different
characteristics than the high-volume mailing list use.  I've been surprised
it works as well as it does for as many users as it does.  I'm not surprised
it works as well as it does for me, not because I warped it toward my own
email (to the contrary, I never tested it on my own email), but because most
of my email *comes* from tech mailing lists, and that's what it was
developed against.

> Personally, I don't get 10K messages that need reply in a
> year,

I said they're waiting for replies, not that they're going to get one --
maybe one per 100 will.  I'd *like* to reply to all, but that's physically
impossible; I can't even acknowledge them all.

> but maybe I'm not typical and I don't develop software, so different
> world.  Since hardware is expected to be bug-free in the first proto
> board (and yes, there is a tooth fairy), not too many people find bugs.

I'm an exception, but I worked for computer manufacturers for 15 years, and
finding CPU and FPU bugs was part of my job.  Well, not an *intended* part
of my job <heh>.

> But when one does get out of the lab, they are sometimes, uh, irritated.
> When this occurs, I do get a message or two that day, or perhaps an
> avalanche.  They are remarkably similar, usually starting with the
> adverbial phrase, "When?", with the remainder being filler.

SpamBayes should do great on those -- repetitive msgs the bulk of which is
filler is pretty much the definition of a tech mailing list <wink>.
