[Spambayes] rebuild database?
Coe, Bob
rcoe at CambridgeMA.GOV
Wed Oct 12 13:41:30 CEST 2005
Careful. If I read the user's initial message correctly, what she calls
a false negative most of us would call a false positive, i.e. a ham
message identified as spam or potential spam. As Kenny points out, a few
false negatives are a common annoyance. But false positives can be a
more serious problem, since their presence forces you to slog through
rivers of spam looking for good messages you might otherwise miss.
Bob
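
To make the terminology concrete, here is a minimal Python sketch. It is
not SpamBayes code: the function name error_type and the label strings
are made up for illustration. It treats spam as the "positive" class, so
a ham message filed as spam or unsure is a false positive, and a spam
message filed as ham is a false negative. It also works out the ratio
for the counts quoted below.

```python
def error_type(actual, predicted):
    """Name the outcome of one classification, with spam as 'positive'.

    actual is "ham" or "spam"; predicted is "ham", "spam" or "unsure".
    These labels are illustrative, not SpamBayes identifiers.
    """
    if actual == predicted:
        return "correct"
    if actual == "ham":
        # Good mail flagged as spam or possible spam.
        return "false positive"
    if predicted == "ham":
        # Spam that slipped through as good mail.
        return "false negative"
    # Spam that landed in the unsure folder.
    return "unsure"


# Counts taken from the quoted message: 702 spam, 110 ham.
ratio = 702 / 110

print(error_type("ham", "unsure"))  # false positive
print(error_type("spam", "ham"))    # false negative
print(f"{ratio:.1f} to 1")          # 6.4 to 1
```

By that count the imbalance is a little over 6 to 1, in the same range
as the "7 to 1" figure used in the reply below.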
> -----Original Message-----
> From: spambayes-bounces at python.org
> [mailto:spambayes-bounces at python.org] On Behalf Of Kenny Pitt
> Sent: Tuesday, October 11, 2005 10:08 AM
> To: mgleich
> Cc: spambayes at python.org
> Subject: Re: [Spambayes] rebuild database?
>
>
> On 10/10/05, mgleich <mgleich at nyc.rr.com> wrote:
> > I've just realized that although my database is 536kb and that is not
> > so large, it is composed of 702 spam and 110 ham. I gather this is
> > extremely unbalanced and may explain why I'm getting false negatives.
>
> Actually, 7 to 1 is really not an unusually high imbalance.
> We've seen reports from people who have 100 to 1 or higher imbalances.
>
> If you are getting false positives then imbalance is the most
> common cause. A few false negatives are not uncommon, though,
> because spam is constantly changing. If a relatively high
> percentage of your spam is coming in as false negatives, then
> you might have an imbalance problem. The best way to tell for
> sure is to see the spam clues for one of the false negatives,
> which you can generate from the SpamBayes menu.
>
> > Do I need to begin from scratch? If so, do I just delete the db file
> > and will Spambayes just create a new one?
>
> For a 7 to 1 imbalance, I would usually say there is no need
> to begin from scratch. However, SpamBayes learns quickly so
> it shouldn't hurt to start over and see what happens. Since
> you know the size of your DB, you've obviously located the
> file. You will probably see two files with the *.db
> extension: one is the training data and the other contains
> information about the messages that have been processed. Just
> close Outlook, delete these 2 files, then restart Outlook and
> SpamBayes should recreate the databases.
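
The reset described above can be sketched in Python. This is a cautious
variant, not anything shipped with SpamBayes: instead of deleting the
*.db files it moves them into a backup folder, so the old training data
can be restored if starting over does not help. The function name
reset_databases and both directory arguments are hypothetical; the
actual data directory depends on the installation. Outlook should be
closed before running anything like this.

```python
import shutil
from pathlib import Path


def reset_databases(data_dir, backup_dir):
    """Move every *.db file from data_dir into backup_dir.

    Returns the list of new locations. Both paths are supplied by the
    caller; nothing here knows where SpamBayes keeps its data.
    """
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for db_file in sorted(data_dir.glob("*.db")):
        target = backup_dir / db_file.name
        shutil.move(str(db_file), str(target))
        moved.append(target)
    return moved


# Example call (paths are placeholders, adjust to your own setup):
# reset_databases("path/to/spambayes/data", "path/to/backup")
```

Only files ending in .db directly inside data_dir are touched; other
files and subfolders are left alone.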