[Spambayes-checkins] website faq.ht,1.3,1.4
Skip Montanaro
montanaro at users.sourceforge.net
Thu May 22 14:52:17 EDT 2003
Update of /cvsroot/spambayes/website
In directory sc8-pr-cvs1:/tmp/cvs-serv18975
Modified Files:
faq.ht
Log Message:
* run through tidy to get visible nesting
* correct <a> tag "name" attributes (no leading #)
* add new content from Bill Parducci
Index: faq.ht
===================================================================
RCS file: /cvsroot/spambayes/website/faq.ht,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** faq.ht 20 Apr 2003 04:11:02 -0000 1.3
--- faq.ht 22 May 2003 20:52:14 -0000 1.4
***************
*** 4,151 ****
<h2>Frequently Asked Questions</h2>
<ol>
! <li>Development</li>
! <ol>
! <li><a href="#tokentrick">Hey! Why don't you implement cool tokenizer trick X? I think it would really foil those spammers!</a></li>
! <li><a href="#serverside">This software is great! I want to implement it for all my users. Are there plans to develop a server-side spambayes solution?</a></li>
! </ol>
! <li>Using Spambayes</li>
! <ol>
! <li><a href="#unsure">I just got a spam, but the system said it was "unsure". Why couldn't it tell that it was spam - it's obvious?</a></li>
! <li><a href="#stillunsure">OK, I trained on that message. But I just got *another* one, and the stupid system still thinks it's unsure. Why did it ignore me???</a></li>
! <li><a href="#wipetraining">I've mucked up my training and I want to start all over again, but there isn't an option for this anywhere. What do I do?</a></li>
! <li><a href="#configfiles">I can't use a web browser, so I can't configure
! pop3proxy/imapfilter.<br />
! Also: how do I configure hammiefilter and the other applications that
! don't have a user interface?</a></li>
! <li><a href="#optionstoset">That's great, now I know what the format looks
! like, but what options do I need to set?</a></li>
! <li><a href="#configlocation">I've made a configuration file, but Spambayes is
! ignoring it. Now what?</a></li>
! </ol>
</ol>
! <p>If you have any suggestions about other questions and answers that should be included
! here, please mail <a href="mailto:spambayes@python.org">the list</a> with them.</p>
! <h3><a name="#tokentrick">Hey! Why don't you implement cool tokenizer trick
! X? I think it would really foil those spammers!</a></h3>
! <p>Have you run your tokenizer trick against a set of messages to see if
! it actually works? Many times what seems like a good idea turns out
! not to help much, and sometimes even hurts. If you have a good idea,
! you've run it against a batch of messages and can prove that it
! helps, paste the code for your technique and the proof to the mailing
! list. If you're not a coder, but are really keen on your idea, post
! a feature request on the project page, and wait for someone else to
! code it for you (but make sure you do some testing when it's done).
! Otherwise, you will likely get a message from Tim Peters about
! why you need to test your idea :) Note that as a general rule,
! we've found that with the tokenizer, "stupid beats smart" -- that is,
! very specialised tokenizer behaviour usually produces worse results than
! a more general approach that just generates tokens and throws them at the
! classifier.</p>
!
! <h3><a name="#serverside">This software is great! I want to implement it
! for all my users.
! Are there plans to develop a server-side spambayes solution?</a></h3>
! <p>The problem with a server-side solution is that everyone has a
! different idea of what is spam - that's the whole strength of the
! bayesian-style filtering concept. If you are certain that *all*
! of your users would agree on what is spam and what is not, then
! this might work for you, but otherwise you really have to have
! individual databases for each user. Either way, you should be
! able to modify spambayes easily enough to fit into your setup.
! Please let the list know if you do have success in this area, and
! we'll update this answer.</p>
!
! <h3><a name="#unsure">I just got a spam, but the system said it was "unsure".
! Why couldn't it tell that it was spam - it's obvious?</a></h3>
! <p>It may be obvious to you, but the classifier only works on
! the information it has been given. Maybe this is "new" (you've
! never seen this particular flavour of spam before), or maybe
! there aren't enough clues in the message which the system is
! aware of as strong spam clues.</p>
!
! <h3><a name="#stillunsure">OK, I trained on that message. But I just got
! <i>another</i> one, and the stupid system still thinks it's unsure. Why
! did it ignore me???</a></h3>
! <p>It didn't, but you may need to train on a few more of this type
! of message to get it classified as "spam". The classification
! algorithm weights its results based on the number of times it
! has seen a particular clue, so that clues unique to this type
! of message may need a few more instances to become "convincing".</p>
! <h3><a name="#wipetraining">I've mucked up my training and I want to
! start all over again, but there isn't an option for this anywhere.
! What do I do?</a></h3>
! <p>Because training from scratch is a very rare occurrence, and because
! deleting all your training information is something you don't want
! to do by accident, there isn't an option for this. However, you
! can quite simply do this manually. All the training data is stored
! in a file, usually called hammie.db, and if you delete (or rename)
! this, then you will start training from scratch. If you are using
! the web interface for the POP3 proxy, the configuration page tells
! you what this file is called (and where it is) down towards the
! bottom of the page.</p>
! <h3><a name="#configfiles">I can't use a web browser, so I can't configure
! pop3proxy/imapfilter.<br />
! Also: how do I configure hammiefilter and the other applications that
! don't have a user interface?</a></h3>
! <p>You need to create a configuration file. This is in the 'standard'
! ini file format (originally created for Windows 3.1, I believe). You
! can find documentation on this format in the Python ConfigParser doc,
! <a href="http://www.python.org/doc/current/lib/module-ConfigParser.html">
! http://www.python.org/doc/current/lib/module-ConfigParser.html</a>, but
! basically, it's just a text file: lines beginning with # are comments,
! sections start with a line like "[Section Name]", and options are set
! out within the appropriate section with lines like "opt = val" or
! "opt: val" (either is ok). Whitespace other than line endings is for
! the most part ignored, so you can make it look like whatever you like.
! You can see a list of what a configuration file of all the defaults
! would look like if you execute the following Python commands:<br />
! <pre>
! >>> from spambayes.Options import options
! >>> print options.display()
! </pre></p>
! <h3><a name="#optionstoset">That's great, now I know what the format looks
! like, but what options do I need to set?</a></h3>
! <p>This depends on exactly what you want to do, and which application you
! are intending to use. The easiest thing is to execute the following
! Python commands:<br />
! <pre>
! >>> from spambayes.Options import options
! >>> print options.display_full()
! </pre>
! This will print out a complete list of the options, including a
! description of the option, and its default value. You can also look up
! a single section, if you know its name:<br />
! <pre>
! >>> print options.display_full("section_name")
! </pre>
! Or just a single option:<br />
! <pre>
! >>> print options.display_full("section_name", "option_name")
! </pre>
! If you want a list of all the sections, you can use this command:<br />
! <pre>
! >>> print options.sections()
! </pre>
! If you want a list of all the options, you can use this command:<br />
! <pre>
! >>> print options.options(prepend_section_name=False)
! </pre></p>
! <h3><a name="#configlocation">I've made a configuration file, but Spambayes is
! ignoring it. Now what?</a></h3>
! <p>Spambayes looks for your configuration file in three places - if it
! can't find it, then, obviously, your options will not be loaded. The
! first place that Spambayes checks is the environment variable
! BAYESCUSTOMIZE. You can set this to the path of your configuration file,
! wherever it is, and it will be loaded. You can also specify more than
! one file, separated by the appropriate path separator for your platform.
! This is the recommended method of specifying the location of the file,
! unless you do so via a user interface (as provided by the POP3 proxy,
! the Outlook plugin, and the IMAP filter). If Spambayes doesn't find
! anything in the BAYESCUSTOMIZE variable, then it checks the current
! working directory and your home directory for a bayescustomize.ini or
! .spambayesrc file (respectively).</p>
--- 4,323 ----
<h2>Frequently Asked Questions</h2>
+
<ol>
! <li>
! Development
! </li>
! <li>
! <ol type="a">
! <li>
! <a href="#tokentrick">Hey! Why don't you implement cool
! tokenizer trick X? I think it would really foil those
! spammers!</a>
! </li>
! <li>
! <a href="#serverside">This software is great! I want to
! implement it for all my users. Are there plans to
! develop a server-side spambayes solution?</a>
! </li>
! </ol>
! </li>
! <li>
! Compatibility
! </li>
! <li>
! <ol type="a">
! <li>
! <a href="#outlookversions">What version of Outlook does
! it work with?</a>
! </li>
! <li>
! <a href="#outlookexpress">Does Spambayes work with
! Outlook Express?</a>
! </li>
! <li>
! <a href="#nonoutlook">Forget Outlook, what clients will
! Spambayes work with in general?</a>
! </li>
! </ol>
! </li>
! <li>
! Using Spambayes
! </li>
! <li>
! <ol type="a">
! <li>
! <a href="#unsure">I just got a spam, but the system
! said it was "unsure". Why couldn't it tell that it was
! spam - it's obvious?</a>
! </li>
! <li>
! <a href="#stillunsure">OK, I trained on that message.
! But I just got *another* one, and the stupid system
! still thinks it's unsure. Why did it ignore me?</a>
! </li>
! <li>
! <a href="#wipetraining">I've mucked up my training and
! I want to start all over again, but there isn't an
! option for this anywhere. What do I do?</a>
! </li>
! <li>
! <a href="#configfiles">I can't use a web browser, so I
! can't configure pop3proxy/imapfilter.<br>
! Also: how do I configure hammiefilter and the other
! applications that don't have a user interface?</a>
! </li>
! <li>
! <a href="#optionstoset">That's great, now I know what
! the format looks like, but what options do I need to
! set?</a>
! </li>
! <li>
! <a href="#configlocation">I've made a configuration
! file, but Spambayes is ignoring it. Now what?</a>
! </li>
! </ol>
! </li>
</ol>
! <p>
! If you have any suggestions about other questions and answers
! that should be included here, please mail <a href=
! "mailto:spambayes@python.org">the list</a> with them.
! </p>
! <h3>
! <a name="tokentrick">Hey! Why don't you implement cool
! tokenizer trick X? I think it would really foil those
! spammers!</a>
! </h3>
! <p>
! Have you run your tokenizer trick against a set of messages
! to see if it actually works? Many times what seems like a
! good idea turns out not to help much, and sometimes even
! hurts. If you have a good idea, you've run it against a batch
! of messages and can prove that it helps, paste the code for
! your technique and the proof to the mailing list. If you're
! not a coder, but are really keen on your idea, post a feature
! request on the project page, and wait for someone else to
! code it for you (but make sure you do some testing when it's
! done). Otherwise, you will likely get a message from Tim
! Peters about why you need to test your idea :) Note that as a
! general rule, we've found that with the tokenizer, "stupid
! beats smart" -- that is, very specialised tokenizer behaviour
! usually produces worse results than a more general approach
! that just generates tokens and throws them at the classifier.
! </p>
! <h3>
! <a name="serverside">This software is great! I want to
! implement it for all my users. Are there plans to develop a
! server-side spambayes solution?</a>
! </h3>
! <p>
! The problem with a server-side solution is that everyone has
! a different idea of what is spam - that's the whole strength
! of the Bayesian-style filtering concept. If you are certain
! that *all* of your users would agree on what is spam and what
! is not, then this might work for you, but otherwise you
! really have to have individual databases for each user.
! Either way, you should be able to modify Spambayes easily
! enough to fit into your setup. Please let the list know if
! you do have success in this area, and we'll update this
! answer.
! </p>
! <h3>
! <a name="unsure">I just got a spam, but the system said it
! was "unsure". Why couldn't it tell that it was spam - it's
! obvious?</a>
! </h3>
! <p>
! It may be obvious to you, but the classifier only works on
! the information it has been given. Maybe this is "new"
! (you've never seen this particular flavour of spam before),
! or maybe the message doesn't contain enough of the clues that
! the system already recognises as strong spam clues.
! </p>
! <h3>
! <a name="stillunsure">OK, I trained on that message. But I
! just got <i>another</i> one, and the stupid system still
! thinks it's unsure. Why did it ignore me?</a>
! </h3>
! <p>
! It didn't, but you may need to train on a few more of this
! type of message to get it classified as "spam". The
! classification algorithm weights its results based on the
! number of times it has seen a particular clue, so that clues
! unique to this type of message may need a few more instances
! to become "convincing".
! </p>
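! <p>
! As a rough illustration of why a few more examples help, here
! is a Python 3 sketch of the kind of adjustment Gary Robinson
! proposed, which Spambayes' classifier is based on: a clue's raw
! spam probability is pulled towards a neutral value until the
! clue has been seen in enough messages. The function name and
! the constants (a strength of 0.45 and a neutral probability of
! 0.5) are illustrative assumptions, not a description of the
! exact code or its current defaults.
! </p>
! <pre>
! def adjusted_spamprob(raw_prob, n, s=0.45, x=0.5):
!     # Hypothetical helper. raw_prob: spam probability estimated
!     # from the training counts; n: number of trained messages
!     # containing the clue; s: how strongly to trust the neutral x.
!     return (s * x + n * raw_prob) / (s + n)
!
! # A clue that has only ever appeared in spam (raw_prob = 1.0)
! # becomes more "convincing" the more often it has been seen.
! for n in (1, 2, 5, 20):
!     print(n, round(adjusted_spamprob(1.0, n), 3))
! </pre>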
! <h3>
! <a name="wipetraining">I've mucked up my training and I want
! to start all over again, but there isn't an option for this
! anywhere. What do I do?</a>
! </h3>
! <p>
! Because training from scratch is a very rare occurrence, and
! because deleting all your training information is something
! you don't want to do by accident, there isn't an option for
! this. However, you can quite simply do this manually. All the
! training data is stored in a file, usually called hammie.db,
! and if you delete (or rename) this, then you will start
! training from scratch. If you are using the web interface for
! the POP3 proxy, the configuration page tells you what this
! file is called (and where it is) down towards the bottom of
! the page.
! </p>
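! <p>
! If you would rather rename the file than delete it (so you can
! go back to your old training if you change your mind), a
! minimal Python 3 sketch might look like this. It assumes the
! training data is a single file whose path you pass in (check
! the configuration page for its real name and location), and
! the helper name is hypothetical.
! </p>
! <pre>
! import os
!
! def start_training_afresh(db_path="hammie.db"):
!     # Hypothetical helper: move the old training database aside
!     # so that training starts from scratch next time.
!     backup = db_path + ".bak"
!     if os.path.exists(db_path):
!         os.replace(db_path, backup)  # overwrites any older backup
!         return backup
!     return None  # nothing to move
! </pre>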
! <h3>
! <a name="configfiles">I can't use a web browser, so I can't
! configure pop3proxy/imapfilter.<br>
! Also: how do I configure hammiefilter and the other
! applications that don't have a user interface?</a>
! </h3>
! <p>
! You need to create a configuration file. This is in the
! 'standard' ini file format (originally created for Windows
! 3.1, I believe). You can find documentation on this format in
! the Python ConfigParser doc, <a href=
! "http://www.python.org/doc/current/lib/module-ConfigParser.html">
! http://www.python.org/doc/current/lib/module-ConfigParser.html</a>,
! but basically, it's just a text file: lines beginning with #
! are comments, sections start with a line like "[Section
! Name]", and options are set out within the appropriate
! section with lines like "opt = val" or "opt: val" (either is
! ok). Whitespace other than line endings is for the most part
! ignored, so you can make it look like whatever you like. You
! can see what a configuration file containing all the
! defaults would look like if you execute the following Python
! commands:
! </p>
! <pre>
! >>> from spambayes.Options import options
! >>> print options.display()
! </pre><br>
! <br>
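! <p>
! For illustration only, here is a tiny file in that format (the
! section and option names are the placeholders used above, not
! real Spambayes options), parsed with Python 3's configparser
! module, the Python 3 name of ConfigParser. Spambayes itself
! reads its options through its own Options module, so treat
! this as a demonstration of the file format only.
! </p>
! <pre>
! import configparser
!
! SAMPLE = """
! # lines beginning with # are comments
! [Section Name]
! opt = val
! other_opt: val
! """
!
! parser = configparser.ConfigParser()
! parser.read_string(SAMPLE)
! # Both "opt = val" and "opt: val" set an option.
! print(parser.get("Section Name", "opt"))        # val
! print(parser.get("Section Name", "other_opt"))  # val
! </pre>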
!
! <h3>
! <a name="optionstoset">That's great, now I know what the
! format looks like, but what options do I need to set?</a>
! </h3>
! <p>
! This depends on exactly what you want to do, and which
! application you are intending to use. The easiest thing is to
! execute the following Python commands:
! </p>
! <pre>
! >>> from spambayes.Options import options
! >>> print options.display_full()
! </pre>
! This will print out a complete list of the options, including
! a description of each option and its default value. You can
! also look up a single section, if you know its name:<br>
!
! <pre>
! >>> print options.display_full("section_name")
! </pre>
! Or just a single option:<br>
!
! <pre>
! >>> print options.display_full("section_name", "option_name")
! </pre>
! If you want a list of all the sections, you can use this
! command:<br>
!
! <pre>
! >>> print options.sections()
! </pre>
! If you want a list of all the options, you can use this
! command:<br>
!
! <pre>
! >>> print options.options(prepend_section_name=False)
! </pre>
! <br>
! <br>
!
! <h3>
! <a name="configlocation">I've made a configuration file, but
! Spambayes is ignoring it. Now what?</a>
! </h3>
! <p>
! Spambayes looks for your configuration file in three places -
! if it can't find it, then, obviously, your options will not
! be loaded. The first place that Spambayes checks is the
! environment variable BAYESCUSTOMIZE. You can set this to the
! path of your configuration file, wherever it is, and it will
! be loaded. You can also specify more than one file, separated
! by the appropriate path separator for your platform. This is
! the recommended method of specifying the location of the
! file, unless you do so via a user interface (as provided by
! the POP3 proxy, the Outlook plugin, and the IMAP filter). If
! Spambayes doesn't find anything in the BAYESCUSTOMIZE
! variable, then it checks the current working directory and
! your home directory for a bayescustomize.ini or .spambayesrc
! file (respectively).
! </p>
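! <p>
! The search order described above can be sketched in Python 3
! as follows. This illustrates the documented behaviour and is
! not Spambayes' own code; the function name is hypothetical,
! and the sketch assumes both fallback files are read if both
! exist.
! </p>
! <pre>
! import os
!
! def config_files_to_try():
!     # 1. BAYESCUSTOMIZE may name one or more files, separated by
!     #    os.pathsep (":" on Unix, ";" on Windows).
!     env = os.environ.get("BAYESCUSTOMIZE")
!     if env:
!         return [p for p in env.split(os.pathsep) if p]
!     # 2./3. Otherwise look for bayescustomize.ini in the current
!     #    working directory and .spambayesrc in the home directory.
!     fallbacks = [os.path.join(os.getcwd(), "bayescustomize.ini"),
!                  os.path.join(os.path.expanduser("~"), ".spambayesrc")]
!     return [p for p in fallbacks if os.path.exists(p)]
! </pre>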
! <h3>
! <a name="outlookversions">What version of Outlook does it
! work with?</a>
! </h3>
! <p>
! The most up-to-date list of known compatible versions of
! Outlook may be found <a href=
! "http://spambayes.sourceforge.net/windows.html">here</a>.
! </p>
! <h3>
! <a name="outlookexpress">Does Spambayes work with Outlook
! Express?</a>
! </h3>
! <p>
! Outlook Express isn't a version of Outlook, it's a completely
! separate program (from the same company). Because they give
! it away for free, OE is a really stripped-down program, and
! it's extremely difficult to create a plugin for it.
! </p>
! <p>
! You can, however, use pop3proxy or imapfilter (depending on
! whether you use POP3 or IMAP). Check out the INTEGRATION.TXT
! file for instructions.
! </p>
! <p>
! Pop3proxy and imapfilter aren't quite as 'transparent' as the
! Outlook plugin, but they're still quite easy to set up and
! use, and they use the same core, so the results will be the
! same.
! </p>
! <h3>
! <a name="nonoutlook">Forget Outlook, what clients will
! Spambayes work with in general?</a>
! </h3>
! <p>
! Spambayes will work with most POP3- or IMAP-compatible
! clients. How you set it up depends on your local architecture:
! </p>
! <ul>
! <li>
! Users with access to procmail can just write a recipe that
! invokes spambayes like this:
! <pre>
! :0fw
! | /opt/spambayes/hammiefilter.py
! </pre>
! followed by a recipe to check the results and take action:
! <pre>
! :0
! * ^X-Spambayes-Classification: spam
! ${MAILDIR}/spam
! </pre>
! </li>
! <li>
! Users limited to POP3/IMAP communications to the server can
! use the <a href=
! "http://spambayes.sourceforge.net/applications.html#pop3">POP3</a>
! or <a href=
! "http://spambayes.sourceforge.net/applications.html#imap">IMAP
! proxy</a> with the <a href=
! "https://sourceforge.net/project/showfiles.php?group_id=61702">
! Spambayes source code.</a>
! </li>
! </ul>