I started with a minimal database, reclassifying my unsure folder each
time. I found that with fewer than 15 ham + 16 spam, it doesn't work
well enough. Note that the spam I receive is very monotonous, because
my ISP replaces viruses with text messages, and since that's almost all
the spam I receive, 4 spams are enough to catch all spam with a
probability of more than 90%. However, with 15 hams, some ham scores
above 20%. And because I don't want to unbalance the database, I trained
on already-correctly-classified spams as well as the most highly

With 15 ham, 16 spam, 1.5% of the incoming e-mail is classified as
unsure, all ham, with scores ranging from .108 to .290, so a ham_cutoff
of 30% would solve it all.
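
To make the cutoff arithmetic concrete, here is a minimal sketch of
three-way classification by score. The function name classify and the
exact boundary handling (strictly below ham_cutoff is ham, at or above
spam_cutoff is spam) are assumptions for illustration, not taken from
the filter's source. The 0.10 cutoff is a hypothetical lower value used
only for contrast; the 0.90 spam cutoff mirrors the 90% figure above.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Map a spam probability in [0, 1] to 'ham', 'unsure', or 'spam'.

    Boundary handling here is an assumption made for this sketch.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# Lowest and highest scores seen among the unsure messages (all ham).
observed = [0.108, 0.290]

# Hypothetical lower cutoff: both messages land in unsure.
low = [classify(s, ham_cutoff=0.10, spam_cutoff=0.90) for s in observed]

# Proposed cutoff of 30%: both messages are classified as ham.
raised = [classify(s, ham_cutoff=0.30, spam_cutoff=0.90) for s in observed]

print(low)
print(raised)
```

Since the highest observed ham score is .290, any ham_cutoff above that
value would move every one of these messages out of unsure.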


