[spambayes-dev] Training options on Configuration Page

G. Armour Van Horn vanhorn at whidbey.com
Mon Sep 22 16:35:49 EDT 2003


Greetings:

On the Configuration page of the proxy, there is this paragraph which I
found to be unclear:

     Suppress caching of bulk ham: Where message caching is
     enabled, this option suppresses caching of messages which are
     classified as ham and marked as 'Precedence: bulk' or
     'Precedence: list'. If you subscribe to a high-volume mailing
     list then your 'Review messages' page can be overwhelmed with
     list messages, making training a pain. Once you've trained
     Spambayes on enough list traffic, you can use this option to
     prevent that traffic showing up in 'Review messages'.

It's clear now, but when I was setting up the upgrade from 1.0a4 to
1.0a5 I missed it, or at least one of its implications. I was really
surprised that one of my filters was seeing a very low quantity of Ham,
and was trending toward a 20:1 spam-to-ham imbalance before I started
manually Discarding all Spam on that system.
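
For anyone else who missed it, here is how I read the rule in that
paragraph, as a rough Python sketch. This is not the actual SpamBayes
code: the function name, its parameters, and the classification strings
are made up for illustration, and only the 'Precedence' header values
come from the text quoted above.

```python
from email import message_from_string

# Header values named in the Configuration page text.
BULK_PRECEDENCE = {"bulk", "list"}


def should_cache(raw_message, classification,
                 cache_enabled=True, suppress_bulk_ham=True):
    """Hypothetical helper: decide whether a message reaches the cache
    (and therefore the 'Review messages' page)."""
    if not cache_enabled:
        return False
    # The suppression only applies to messages classified as ham.
    if not suppress_bulk_ham or classification != "ham":
        return True
    msg = message_from_string(raw_message)
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence not in BULK_PRECEDENCE
```

The implication I missed: with the option on, list ham never reaches the
Review page, so it is never trained on, and the ham count falls behind.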

Maybe the pattern at our house is unusual, but I suspect that for most
users the majority of their ham will be from lists they've joined. I'd
like to suggest that the following set of buttons replace the current
"Cache messages" and "Suppress caching of bulk ham" section:

     1A Train only on Unsure, hide Ham and Spam on Review page
     1B Train only on Unsure, show Ham and Spam on Review page
     1C   1B plus default to Discard for Ham
     1D   1B plus default to Discard for Spam
     2A Train on all messages except Unsure
     2B Train on all messages except Unsure, hide List Ham
     2C Train on all messages except Unsure and List Ham, hide List
     Ham

2C is the effect of the current system with both caching options turned
on. It leads to huge spam imbalances, at least on the system my wife
filters through. (Mine doesn't, because I have a huge volume of admin
ham.) If I had that array of choices, I would recommend 2A as the
default for a new database and 1A as the default once there were enough
trained messages; 1C and 1D would be used for a short period to
address any imbalance. The other options are there only to cover all
possibilities, and might not have any real-world application. (1B is the
current default with "Suppress caching of bulk ham" off.)
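
To make the array concrete, here is one way the seven choices could be
written down as data, in Python. It is only a sketch of my reading of the
list above: the class, the field names, and the category strings are
invented for illustration and do not correspond to anything in the
SpamBayes source. In particular, treating "Ham" in 1A and 1C as covering
list ham as well is an assumption.

```python
from dataclasses import dataclass

HAM, LIST_HAM, SPAM, UNSURE = "ham", "list_ham", "spam", "unsure"
ALL_HAM = frozenset({HAM, LIST_HAM})
NOTHING = frozenset()


@dataclass(frozen=True)
class TrainingOption:
    """Hypothetical description of one proposed radio button."""
    train_on: frozenset                    # categories used for training
    hidden: frozenset = NOTHING            # categories kept off the Review page
    discard_by_default: frozenset = NOTHING


OPTIONS = {
    "1A": TrainingOption(frozenset({UNSURE}), hidden=ALL_HAM | {SPAM}),
    "1B": TrainingOption(frozenset({UNSURE})),
    "1C": TrainingOption(frozenset({UNSURE}), discard_by_default=ALL_HAM),
    "1D": TrainingOption(frozenset({UNSURE}),
                         discard_by_default=frozenset({SPAM})),
    "2A": TrainingOption(ALL_HAM | {SPAM}),
    "2B": TrainingOption(ALL_HAM | {SPAM}, hidden=frozenset({LIST_HAM})),
    "2C": TrainingOption(frozenset({HAM, SPAM}),
                         hidden=frozenset({LIST_HAM})),
}


def is_shown(option_key, category):
    """True if messages of this category appear on the Review page."""
    return category not in OPTIONS[option_key].hidden
```

Written this way, 2B and 2C differ only in whether list ham is trained
on, which is exactly the difference that produces the imbalance.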

Also, from the user's perspective, nobody cares at all about caching. To
the user these are Training Options, so I would recommend that the
section be renamed. The typical user options (the array above) should go
first, and the details about the files that support the training process
should follow the actual controls over training.

Van



--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted at whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------
