[Spambayes-checkins] website faq.txt,1.54,1.55
Tony Meyer
anadelonbrin at users.sourceforge.net
Tue Dec 30 23:07:39 EST 2003
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1:/tmp/cvs-serv2286
Modified Files:
faq.txt
Log Message:
General tidy-up and bring a few things more up-to-date.
Index: faq.txt
===================================================================
RCS file: /cvsroot/spambayes/website/faq.txt,v
retrieving revision 1.54
retrieving revision 1.55
diff -C2 -d -r1.54 -r1.55
*** faq.txt 11 Dec 2003 02:56:07 -0000 1.54
--- faq.txt 31 Dec 2003 04:07:36 -0000 1.55
***************
*** 41,50 ****
for good mail). It's best to train on recent email, because your interests
and the nature of what spam looks like change over time. Once you've
! collected a fair portion of each (anything is better than nothing, but it
! helps to have a couple hundred of each), you can tell SpamBayes, "Here's my
ham and my spam". It will then process that mail and save information about
different patterns which appear in ham and spam. That information is then
! used during the filtering stage. See the "Command-line training" section
! below for details.
When SpamBayes filters your email, it compares each unclassified message
--- 41,48 ----
for good mail). It's best to train on recent email, because your interests
and the nature of what spam looks like change over time. Once you've
! collected a fair portion of each, you can tell SpamBayes, "Here's my
ham and my spam". It will then process that mail and save information about
different patterns which appear in ham and spam. That information is then
! used during the filtering stage.
When SpamBayes filters your email, it compares each unclassified message
***************
*** 72,80 ****
details.
! * Donate money to the Python Software Foundations. For more
information, including why you would want to donate to the PSF,
please see our `donations page`_.
! * Investigate some of the commercial programs based on the SpamBayes code.
This should give you some additional benefits like support or greater
ease-of-use.
--- 70,78 ----
details.
! * Donate money to the `Python Software Foundation`_. For more
information, including why you would want to donate to the PSF,
please see our `donations page`_.
! * Investigate some of the commercial `programs based on the SpamBayes code`_.
This should give you some additional benefits like support or greater
ease-of-use.
***************
*** 82,86 ****
--- 80,86 ----
.. _the PSA license: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/LICENSE.txt
.. _I'm not a programmer but still want to help: #i-m-not-a-programmer-but-want-to-help-out-what-can-i-do
+ .. _Python Software Foundation: http://www.python.org/psf/
.. _donations page: donations.html
+ .. _programs based on the SpamBayes code: related.html
What online resources are available?
***************
*** 103,107 ****
5. The `SpamBayes bugs list`_ receives copies of all the bug, patch,
! support requests and feature request reports that are submitted via the
`sourceforge`_ system. This is generally only of interest to developers
(you can use the sourceforge system to monitor any individual bugs that
--- 103,107 ----
5. The `SpamBayes bugs list`_ receives copies of all the bug, patch,
! support request and feature request reports that are submitted via the
`sourceforge`_ system. This is generally only of interest to developers
(you can use the sourceforge system to monitor any individual bugs that
***************
*** 114,120 ****
::
! site:mail.python.org pop3proxy -checkins
! would search for messages which mention pop3proxy but exclude checkin
messages.
--- 114,120 ----
::
! site:mail.python.org sb_server -checkins
! would search for messages which mention sb_server but exclude checkin
messages.
***************
*** 130,138 ****
------------------------------------
! Unless you are using the Outlook plugin, you must have a recent version of
Python installed on your computer, version 2.2 or later. (Don't ask about
backporting it to earlier versions of Python. It's almost a certainty this
won't happen.) If you need to install Python on your system, check the
! `Python download page`_ for the version appropriate to your computer You
also need version 2.4.3 or above of the Python "email" package. If you're
running Python 2.2.2 or above, then you already have this. If not, you can
--- 130,142 ----
------------------------------------
! Unless you want to run from the source code, all you need is the
! SpamBayes installer. At present, unless you want to use the Outlook
! plug-in, you must run from source. This will change in the near future.
!
! If you want to run from source, you must have a recent version of
Python installed on your computer, version 2.2 or later. (Don't ask about
backporting it to earlier versions of Python. It's almost a certainty this
won't happen.) If you need to install Python on your system, check the
! `Python download page`_ for the version appropriate to your computer. You
also need version 2.4.3 or above of the Python "email" package. If you're
running Python 2.2.2 or above, then you already have this. If not, you can
***************
*** 158,163 ****
give it messages, tell it whether those messages are ham or spam, and it
adjusts its probabilities accordingly. How to train it is covered below.
! By default it lives in a file called "hammie.db" or (for the Outlook
! plugin) "default_bayes_database".
2. The tokenizer/classifier. This is the core engine of the system. The
--- 162,167 ----
give it messages, tell it whether those messages are ham or spam, and it
adjusts its probabilities accordingly. How to train it is covered below.
! By default it lives in a file called "hammie.db", "statistics_database.db"
! or (for the Outlook plugin) "default_bayes_database".
2. The tokenizer/classifier. This is the core engine of the system. The
***************
*** 231,238 ****
the web. You can upload emails to it for training or classification,
query the probabilities database ("How many valid emails *really* contain
! the word Viagra") find particular messages, and most importantly, train
it on the emails you've received. When you start using the system,
! unless you train it using the Hammie script it will classify most things
! as Unsure, and often make mistakes. But it keeps copies of all the
emails it's seen, and through the web interface you can train it by going
through a list of all the emails you've received and checking a Ham/Spam
--- 235,242 ----
the web. You can upload emails to it for training or classification,
query the probabilities database ("How many valid emails *really* contain
! the word Viagra?"), find particular messages, and most importantly, train
it on the emails you've received. When you start using the system,
! (unless you train it with an existing collection) it will classify most
! things as Unsure, and often make mistakes. But it keeps copies of all the
emails it's seen, and through the web interface you can train it by going
through a list of all the emails you've received and checking a Ham/Spam
***************
*** 243,253 ****
it's very quick and easy.
! 6. The Outlook plug-in. For Outlook 2000 and Outlook XP (2002) users (not
Outlook Express) this lets you manage the whole thing from within
! Outlook. You set up a Ham folder and a Spam folder, and train it simply
! by dragging messages into those folders. Alternatively there are buttons
! to do the same thing. And it integrates into Outlook's filtering system
! to make it easy to file all the suspected spam into its own folder, for
! instance.
7. The filter script. This does three jobs: command-line training, procmail
--- 247,257 ----
it's very quick and easy.
! 6. The Outlook plug-in. For Outlook (2000, 2002 (XP), or 2003) users (not
Outlook Express) this lets you manage the whole thing from within
! Outlook. You tell the plug-in which folders to watch for new mail, and
! where to put messages it is unsure about, or considers spam, and it takes
! care of everything else for you. It also has a nice graphical interface
! for training, or you can set it up to train any messages you move into
! particular folders.
7. The filter script. This does three jobs: command-line training, procmail
***************
*** 353,360 ****
Users limited to POP3/IMAP communications to the server can use the POP3_ or
! IMAP_ proxies which are part of the SpamBayes source.
! .. _POP3: http://spambayes.sf.net/applications.html#sb_server
! .. _IMAP: http://spambayes.sf.net/applications.html#imap
--- 357,364 ----
Users limited to POP3/IMAP communications to the server can use the POP3_ or
! proxy or IMAP_ filter which are part of the SpamBayes source.
! .. _POP3: applications.html#sb_server
! .. _IMAP: applications.html#imap
***************
*** 435,447 ****
-----------------------------------------------------
! Previous versions of the binary had a number of problems with various
! versions of Outlook/Windows. However, to our knowledge, the current version
! should work with any combination of Windows/Outlook versions. Please let us
! know if this is not the case. The `troubleshooting guide`_ for the Outlook
! plugin contains the most up-to-date help for working around known problems.
! A number of people have used the plugin with a beta version of Outlook 2003.
! If you fall into that category, note that you must have applied all the
! technical refreshes released by Microsoft to use the plugin successfully.
! Better yet, upgrade to the final version now that it's available.
.. _troubleshooting guide: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html
--- 439,446 ----
-----------------------------------------------------
! To our knowledge, the current version of the plug-in should work with any
! version of Windows and Outlook 2000 or above. The `troubleshooting guide`_
! for the Outlook plugin contains the most up-to-date help for working around
! known problems.
.. _troubleshooting guide: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html
***************
*** 825,829 ****
web interface and the Outlook plug-in let you view the clues that make
up the message. If you still can't figure out the reason why, you can
! ask the mailing list for advice.
--- 824,829 ----
web interface and the Outlook plug-in let you view the clues that make
up the message. If you still can't figure out the reason why, you can
! ask the mailing list for advice - but make sure you include the spam
! clues/tokens listing in your message!
***************
*** 852,861 ****
They should not be close together (say, 0.4 and 0.6).
! 2. Have you trained on a reasonable number of hams and spams? You should
train on 10 to 20 of each to start with just to get a decent base. After
that, you should be able to train on just mistakes and messages
classified as unsure.
! 3. Check to be sure you haven't made any classification mistakes (trained
spams as hams or vice versa). If so, you could really confuse things and
should move incorrectly classified messages to their correct locations
--- 852,864 ----
They should not be close together (say, 0.4 and 0.6).
! 2. It is quite important that you have trained on roughly equal numbers of
! ham and spam (don't go above a 2:1 ratio, for example).
!
! 3. Have you trained on a reasonable number of hams and spams? You should
train on 10 to 20 of each to start with just to get a decent base. After
that, you should be able to train on just mistakes and messages
classified as unsure.
! 4. Check to be sure you haven't made any classification mistakes (trained
spams as hams or vice versa). If so, you could really confuse things and
should move incorrectly classified messages to their correct locations
***************
*** 867,871 ****
---------------------------------------------------------
! Because training from scratch is a very rare occurrence, and because
deleting all your training information is something you don't want to do by
accident, there isn't an option for this. However, you can quite simply do
--- 870,877 ----
---------------------------------------------------------
! If you're using the Outlook plug-in, you can simply use the "Training"
! tab of the SpamBayes Manager, and tick the "Rebuild entire database" box.
!
! Otherwise, because training from scratch is a very rare occurrence, and as
deleting all your training information is something you don't want to do by
accident, there isn't an option for this. However, you can quite simply do