[Spambayes-checkins] website faq.txt,1.54,1.55

Tony Meyer anadelonbrin at users.sourceforge.net
Tue Dec 30 23:07:39 EST 2003


Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1:/tmp/cvs-serv2286

Modified Files:
	faq.txt 
Log Message:
General tidy-up and bring a few things more up-to-date.

Index: faq.txt
===================================================================
RCS file: /cvsroot/spambayes/website/faq.txt,v
retrieving revision 1.54
retrieving revision 1.55
diff -C2 -d -r1.54 -r1.55
*** faq.txt	11 Dec 2003 02:56:07 -0000	1.54
--- faq.txt	31 Dec 2003 04:07:36 -0000	1.55
***************
*** 41,50 ****
  for good mail).  It's best to train on recent email, because your interests
  and the nature of what spam looks like change over time.  Once you've
! collected a fair portion of each (anything is better than nothing, but it
! helps to have a couple hundred of each), you can tell SpamBayes, "Here's my
  ham and my spam".  It will then process that mail and save information about
  different patterns which appear in ham and spam.  That information is then
! used during the filtering stage.  See the "Command-line training" section
! below for details.
  
  When SpamBayes filters your email, it compares each unclassified message
--- 41,48 ----
  for good mail).  It's best to train on recent email, because your interests
  and the nature of what spam looks like change over time.  Once you've
! collected a fair portion of each, you can tell SpamBayes, "Here's my
  ham and my spam".  It will then process that mail and save information about
  different patterns which appear in ham and spam.  That information is then
! used during the filtering stage.
  
  When SpamBayes filters your email, it compares each unclassified message
***************
*** 72,80 ****
    details.
  
! * Donate money to the Python Software Foundations.  For more 
    information, including why you would want to donate to the PSF,
    please see our `donations page`_.
  
! * Investigate some of the commercial programs based on the SpamBayes code.
    This should give you some additional benefits like support or greater 
    ease-of-use.
--- 70,78 ----
    details.
  
! * Donate money to the `Python Software Foundation`_.  For more 
    information, including why you would want to donate to the PSF,
    please see our `donations page`_.
  
! * Investigate some of the commercial `programs based on the SpamBayes code`_.
    This should give you some additional benefits like support or greater 
    ease-of-use.
***************
*** 82,86 ****
--- 80,86 ----
  .. _the PSA license: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/LICENSE.txt
  .. _I'm not a programmer but still want to help: #i-m-not-a-programmer-but-want-to-help-out-what-can-i-do
+ .. _Python Software Foundation: http://www.python.org/psf/
  .. _donations page: donations.html
+ .. _programs based on the SpamBayes code: related.html
  
  What online resources are available?
***************
*** 103,107 ****
  
  5. The `SpamBayes bugs list`_ receives copies of all the bug, patch,
!    support requests and feature request reports that are submitted via the
     `sourceforge`_ system.  This is generally only of interest to developers
     (you can use the sourceforge system to monitor any individual bugs that
--- 103,107 ----
  
  5. The `SpamBayes bugs list`_ receives copies of all the bug, patch,
!    support request and feature request reports that are submitted via the
     `sourceforge`_ system.  This is generally only of interest to developers
     (you can use the sourceforge system to monitor any individual bugs that
***************
*** 114,120 ****
  ::
  
!     site:mail.python.org pop3proxy -checkins
  
! would search for messages which mention pop3proxy but exclude checkin
  messages.
  
--- 114,120 ----
  ::
  
!     site:mail.python.org sb_server -checkins
  
! would search for messages which mention sb_server but exclude checkin
  messages.
  
***************
*** 130,138 ****
  ------------------------------------
  
! Unless you are using the Outlook plugin, you must have a recent version of
  Python installed on your computer, version 2.2 or later.  (Don't ask about
  backporting it to earlier versions of Python.  It's almost a certainty this
  won't happen.) If you need to install Python on your system, check the
! `Python download page`_ for the version appropriate to your computer You
  also need version 2.4.3 or above of the Python "email" package.  If you're
  running Python 2.2.2 or above, then you already have this.  If not, you can
--- 130,142 ----
  ------------------------------------
  
! Unless you want to run from the source code, all you need is the
! SpamBayes installer.  At present, unless you want to use the Outlook
! plug-in, you must run from source.  This will change in the near future.
! 
! If you want to run from source, you must have a recent version of
  Python installed on your computer, version 2.2 or later.  (Don't ask about
  backporting it to earlier versions of Python.  It's almost a certainty this
  won't happen.) If you need to install Python on your system, check the
! `Python download page`_ for the version appropriate to your computer. You
  also need version 2.4.3 or above of the Python "email" package.  If you're
  running Python 2.2.2 or above, then you already have this.  If not, you can
***************
*** 158,163 ****
     give it messages, tell it whether those messages are ham or spam, and it
     adjusts its probabilities accordingly.  How to train it is covered below.
!    By default it lives in a file called "hammie.db" or (for the Outlook
!    plugin) "default_bayes_database".
  
  2. The tokenizer/classifier.  This is the core engine of the system.  The
--- 162,167 ----
     give it messages, tell it whether those messages are ham or spam, and it
     adjusts its probabilities accordingly.  How to train it is covered below.
!    By default it lives in a file called "hammie.db", "statistics_database.db"
!    or (for the Outlook plugin) "default_bayes_database".
  
  2. The tokenizer/classifier.  This is the core engine of the system.  The
***************
*** 231,238 ****
     the web.  You can upload emails to it for training or classification,
     query the probabilities database ("How many valid emails *really* contain
!    the word Viagra") find particular messages, and most importantly, train
     it on the emails you've received.  When you start using the system,
!    unless you train it using the Hammie script it will classify most things
!    as Unsure, and often make mistakes.  But it keeps copies of all the
     emails it's seen, and through the web interface you can train it by going
     through a list of all the emails you've received and checking a Ham/Spam
--- 235,242 ----
     the web.  You can upload emails to it for training or classification,
     query the probabilities database ("How many valid emails *really* contain
!    the word Viagra?") find particular messages, and most importantly, train
     it on the emails you've received.  When you start using the system,
!    (unless you train it with an existing collection) it will classify most
!    things as Unsure, and often make mistakes.  But it keeps copies of all the
     emails it's seen, and through the web interface you can train it by going
     through a list of all the emails you've received and checking a Ham/Spam
***************
*** 243,253 ****
     it's very quick and easy.
  
! 6. The Outlook plug-in.  For Outlook 2000 and Outlook XP (2002) users (not
     Outlook Express) this lets you manage the whole thing from within
!    Outlook.  You set up a Ham folder and a Spam folder, and train it simply
!    by dragging messages into those folders.  Alternatively there are buttons
!    to do the same thing.  And it integrates into Outlook's filtering system
!    to make it easy to file all the suspected spam into its own folder, for
!    instance.
  
  7. The filter script.  This does three jobs: command-line training, procmail
--- 247,257 ----
     it's very quick and easy.
  
! 6. The Outlook plug-in.  For Outlook (2000, 2002 (XP), or 2003) users (not
     Outlook Express) this lets you manage the whole thing from within
!    Outlook.  You tell the plug-in which folders to watch for new mail, and
!    where to put messages it is unsure about, or considers spam, and it takes
!    care of everything else for you.  It also has a nice graphical interface
!    for training, or you can set it up to train any messages you move into
!    particular folders.
  
  7. The filter script.  This does three jobs: command-line training, procmail
***************
*** 353,360 ****
  
  Users limited to POP3/IMAP communications to the server can use the POP3_ or
! IMAP_ proxies which are part of the SpamBayes source.
  
! .. _POP3: http://spambayes.sf.net/applications.html#sb_server
! .. _IMAP: http://spambayes.sf.net/applications.html#imap
  
  
--- 357,364 ----
  
  Users limited to POP3/IMAP communications to the server can use the POP3_ or
! proxy or IMAP_ filter which are part of the SpamBayes source.
  
! .. _POP3: applications.html#sb_server
! .. _IMAP: applications.html#imap
  
  
***************
*** 435,447 ****
  -----------------------------------------------------
  
! Previous versions of the binary had a number of problems with various
! versions of Outlook/Windows.  However, to our knowledge, the current version
! should work with any combination of Windows/Outlook versions. Please let us
! know if this is not the case.  The `troubleshooting guide`_ for the Outlook
! plugin contains the most up-to-date help for working around known problems.
! A number of people have used the plugin with a beta version of Outlook 2003.
! If you fall into that category, note that you must have applied all the
! technical refreshes released by Microsoft to use the plugin successfully.
! Better yet, upgrade to the final version now that it's available.
  
  .. _troubleshooting guide: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html
--- 439,446 ----
  -----------------------------------------------------
  
! To our knowledge, the current version of the plug-in should work with any
! version of Windows and Outlook 2000 or above. The `troubleshooting guide`_
! for the Outlook plugin contains the most up-to-date help for working around
! known problems.
  
  .. _troubleshooting guide: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html
***************
*** 825,829 ****
  web interface and the Outlook plug-in let you view the clues that make
  up the message.  If you still can't figure out the reason why, you can
! ask the mailing list for advice.
  
  
--- 824,829 ----
  web interface and the Outlook plug-in let you view the clues that make
  up the message.  If you still can't figure out the reason why, you can
! ask the mailing list for advice - but make sure you include the spam
! clues/tokens listing in your message!
  
  
***************
*** 852,861 ****
     They should not be close together (say, 0.4 and 0.6).
  
! 2. Have you trained on a reasonable number of hams and spams?  You should
     train on 10 to 20 of each to start with just to get a decent base.  After
     that, you should be able to train on just mistakes and messages
     classified as unsure.
  
! 3. Check to be sure you haven't made any classification mistakes (trained
     spams as hams or vice versa).  If so, you could really confuse things and
     should move incorrectly classified messages to their correct locations
--- 852,864 ----
     They should not be close together (say, 0.4 and 0.6).
  
! 2. It is quite important that you have trained on roughly equal numbers of
!    ham and spam (don't go above a 2:1 ratio, for example).
! 
! 3. Have you trained on a reasonable number of hams and spams?  You should
     train on 10 to 20 of each to start with just to get a decent base.  After
     that, you should be able to train on just mistakes and messages
     classified as unsure.
  
! 4. Check to be sure you haven't made any classification mistakes (trained
     spams as hams or vice versa).  If so, you could really confuse things and
     should move incorrectly classified messages to their correct locations
***************
*** 867,871 ****
  ---------------------------------------------------------
  
! Because training from scratch is a very rare occurrence, and because
  deleting all your training information is something you don't want to do by
  accident, there isn't an option for this.  However, you can quite simply do
--- 870,877 ----
  ---------------------------------------------------------
  
! If you're using the Outlook plug-in, you can simply use the "Training"
! tab of the SpamBayes Manager, and tick the "Rebuild entire database" box.
! 
! Otherwise, because training from scratch is a very rare occurrence, and as
  deleting all your training information is something you don't want to do by
  accident, there isn't an option for this.  However, you can quite simply do




