[Spambayes] RE: Spambayes Digest, Vol 61, Issue 36

John A john at comnet-tech.com
Wed Sep 10 18:19:58 EDT 2003


Please unsubscribe me from this mailing list.  I have been trying to get off
of it for several weeks now.

-----Original Message-----
From: spambayes-bounces at python.org
[mailto:spambayes-bounces at python.org]On Behalf Of
spambayes-request at python.org
Sent: Wednesday, September 10, 2003 4:29 PM
To: spambayes at python.org
Subject: Spambayes Digest, Vol 61, Issue 36


Send Spambayes mailing list submissions to
	spambayes at python.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://mail.python.org/mailman/listinfo/spambayes
or, via email, send a message with subject or body 'help' to
	spambayes-request at python.org

You can reach the person managing the list at
	spambayes-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Spambayes digest..."


Today's Topics:

   1. Re: error  (Anthony Baxter)
   2. Re: Watch out for this  (Anthony Baxter)
   3. Re: Unfilterable email using Outlook plugin  (Anthony Baxter)
   4. Re: How to setup Spambayes for Domino server
      (JRedmond at ymcastlouis.org)
   5. RE: Watch out for this  (Ryan Malayter)
   6. Error in classifier.py? (Martin Davis)
   7. RE: Watch out for this  (Skip Montanaro)
   8. Re: Error in classifier.py? (Skip Montanaro)
   9. Rule Wizard (TheAceMan)
  10. Whitelist for SpamBayes for Outlook (Tom Boland)
  11. Re: How to setup Spambayes for Domino server (Skip Montanaro)
  12. Re: Error in classifier.py? (Skip Montanaro)


----------------------------------------------------------------------

Message: 1
Date: Thu, 11 Sep 2003 02:22:29 +1000
From: Anthony Baxter <anthony at interlink.com.au>
Subject: Re: [Spambayes] error
To: "Rueve, Christina" <ruevec at fivestarcu.com>
Cc: spambayes at python.org
Message-ID: <200309101622.h8AGMTmc003320 at localhost.localdomain>
Content-Type: text/plain; charset="us-ascii"

>>> "Rueve, Christina" wrote
> I am getting the following error during installation:  Unable to
> register the DLL/OCX: DllRegisterServer failed; code 0x00000000.  Click
> Retry to try again, Ignore to proceed anyway (not recommended), or Abort
> to cancel installation.  I clicked on Retry and nothing happened.  I then
> clicked on Ignore.  The version is binary version 0.6.  PC is XP with
> Office XP.  Please advise.

Please try again with the 0.81 version on the website.

Anthony


------------------------------

Message: 2
Date: Thu, 11 Sep 2003 02:25:32 +1000
From: Anthony Baxter <anthony at interlink.com.au>
Subject: Re: [Spambayes] Watch out for this
To: skip at pobox.com
Cc: spambayes at python.org
Message-ID: <200309101625.h8AGPWUA003410 at localhost.localdomain>
Content-Type: text/plain; charset="us-ascii"

>>> Skip Montanaro wrote
> Good suggestion.  I'm not sure if the tokenizer does this already, but a
> quick grep for '&#[0-9];' through my current training database (about 3
> million lines) suggests this is still fairly infrequently used.  I found
> that only about 2100 lines (around 0.07%) contained a numeric entity.
> If/when the spammers start using such techniques and they turn out to
> cause problems for the classifier, it should be fairly easy to extend
> the tokenizer to make the necessary substitutions.

Maybe we should have a file somewhere of "yet to be tested" tokeniser
ideas? And update it with a comment when we find what does or doesn't
work? (Ref the discussion yesterday about tokenising tricks tried and
abandoned...)
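
The substitution discussed in the quoted text could be sketched roughly as
follows. This is only an illustration, not the SpamBayes tokenizer: the
names NUMERIC_ENTITY, count_entity_lines and decode_numeric_entities are
made up for the example, and the pattern accepts multi-digit and
hexadecimal references, which goes beyond the grep pattern quoted above.

```python
import html
import re

# Assumption: match decimal (&#86;) and hexadecimal (&#x56;) numeric
# character references. This is broader than the quoted grep pattern.
NUMERIC_ENTITY = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def count_entity_lines(lines):
    """Return how many lines contain at least one numeric entity."""
    return sum(1 for line in lines if NUMERIC_ENTITY.search(line))


def decode_numeric_entities(text):
    """Replace each numeric entity with the character it encodes."""
    return NUMERIC_ENTITY.sub(lambda m: html.unescape(m.group(0)), text)
```

Decoding before tokenizing would let an obfuscated word and its plain
spelling produce the same token; whether that actually helps the
classifier is exactly the sort of thing that would need testing.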



------------------------------

Message: 3
Date: Thu, 11 Sep 2003 02:26:51 +1000
From: Anthony Baxter <anthony at interlink.com.au>
Subject: Re: [Spambayes] Unfilterable email using Outlook plugin
To: "Greg Jewell" <gjewell at cnnxn.com>
Cc: spambayes at python.org
Message-ID: <200309101626.h8AGQpZi003448 at localhost.localdomain>
Content-Type: text/plain; charset="us-ascii"

>>> "Greg Jewell" wrote
> I encountered a problem this morning, though, and I'm not certain what
> the issue is. I received a spam email that is disguised as if it's a
> message returned as undeliverable. Here's the subject of the mail:
> "Undeliverable: Read: gjeve get Free Printer Cartridges and more".
> For some reason, Spambayes says that this message is unfilterable. It
> won't give it a spam probability value, and when I click "Delete as
> Spam" an error message pops up saying that No filterable mail items
> have been selected.
>
> Has anybody else encountered this before?

Hm. Is it possible that the message's MIME structure is totally messed
up, and the tokeniser's bailing out? It's far too long since I looked
at that bit of the code... Tony?

Anthony


------------------------------

Message: 4
Date: Wed, 10 Sep 2003 11:32:03 -0500
From: JRedmond at ymcastlouis.org
Subject: Re: [Spambayes] How to setup Spambayes for Domino server
To: spambayes at python.org
Message-ID:
	<OF57C70AE7.097AFBBE-ON86256D9D.00535BFF-86256D9D.005B05B0 at ymcastlouis.org>

Content-Type: text/plain; charset=us-ascii


Check out http://www.openntf.org, the Domino/Notes open-source community.
There's talk of adding Bayesian filtering to their OpenNTFMail template and
pushing the workload off onto the client side, and Spambayes will probably
wind up getting integrated somehow.  (They're in the process of relocating
their servers today, though, so you'll have to check it out tomorrow.)

In the meantime, I'd suggest you resist the urge to integrate the Domino
server's SMTP and Router processes and Bayesian filtering - speaking from
experience, that approach can only bring you pain.  Instead, use Spambayes
as a proxy.

************************************
James Redmond, Domino Administrator
YMCA of Greater St. Louis
jredmond at ymcastlouis.org




------------------------------

Message: 5
Date: Wed, 10 Sep 2003 12:18:08 -0500
From: "Ryan Malayter" <rmalayter at bai.org>
Subject: RE: [Spambayes] Watch out for this
To: "Anthony Baxter" <anthony at interlink.com.au>,	<skip at pobox.com>
Cc: spambayes at python.org
Message-ID: <792DE28E91F6EA42B4663AE761C41C2AF3D3FB at cliff.bai.org>
Content-Type: text/plain;	charset="us-ascii"

From: Anthony Baxter [mailto:anthony at interlink.com.au]
> Maybe we should have a file somewhere of
> "yet to be tested" tokeniser ideas? And
> update it with a comment when we find what
> does or doesn't work?

This sounds like a good idea to me, and would prevent a lot of
unnecessary duplication of effort.

Perhaps in the RFE section of the SpamBayes CVS? We could preface the
subject of all tokenizing entries with a [tokenizer] tag or something.


------------------------------

Message: 6
Date: Wed, 10 Sep 2003 10:41:12 -0700 (PDT)
From: Martin Davis <m0davis at pacbell.net>
Subject: [Spambayes] Error in classifier.py?
To: spambayes at python.org
Message-ID: <20030910174112.12074.qmail at web80401.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii

(Forgive me if this message has been sent more than
once.)

I finally thought I had Spambayes' pop3proxy working
when I started getting the following error (see below)
whenever I try to receive an email message.  I am
running Mozilla 1.5b on Windows XP Pro.  Note that it
was working correctly for a short while.  I don't know
what happened to cause the error.  I do notice that
the Spambayes Web Interface reports "Total emails trained:
Spam: 5 Ham: 92".  However, I know that recently there
were as many as 10 spam messages and 100 ham messages.

Please help.  Thanks.

Error dump follows:
------------------------------------------------------------------------

SpamBayes POP3 Proxy Beta2, version 0.2 (September 2003),
using SpamBayes POP3 Proxy Web Interface Alpha3, version 0.03
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading database... SMTP Listener on port 25 is proxying smtp.pacbell.yahoo.com:25
Listener on port 110 is proxying pop.pacbell.yahoo.com:110
User interface url is http://localhost:8880/
Traceback (most recent call last):
  File "C:\Program Files\Python23\Scripts\pop3proxy.py", line 437, in onRetr
    evidence=True)
  File "C:\PROGRA~1\Python23\Lib\site-packages\spambayes\classifier.py", line 223, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "C:\PROGRA~1\Python23\Lib\site-packages\spambayes\classifier.py", line 460, in _getclues
    prob = self.probability(record)
  File "C:\PROGRA~1\Python23\Lib\site-packages\spambayes\classifier.py", line 310, in probability
    assert spamcount <= nspam
AssertionError
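
The failing assertion compares a word's spam count with the total number
of spam messages trained, so it can be read as a consistency check on the
database. A minimal sketch of that kind of check, assuming a plain dict
that maps each word to a (spamcount, hamcount) pair; this layout and the
function name are hypothetical and are not how SpamBayes stores its data:

```python
def find_inconsistent_words(word_counts, nspam, nham):
    """Return the words whose counts exceed the trained-message totals.

    word_counts: dict mapping word -> (spamcount, hamcount).
    nspam, nham: total number of spam and ham messages trained.
    """
    return sorted(
        word
        for word, (spamcount, hamcount) in word_counts.items()
        if spamcount > nspam or hamcount > nham
    )
```

With totals of 5 spam and 92 ham, any word recorded in more than 5 spam
messages would be reported, which is the situation the assertion rejects.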






------------------------------

Message: 7
Date: Wed, 10 Sep 2003 13:40:30 -0500
From: Skip Montanaro <skip at pobox.com>
Subject: RE: [Spambayes] Watch out for this
To: "Ryan Malayter" <rmalayter at bai.org>
Cc: spambayes at python.org
Message-ID: <16223.28702.409585.775186 at montanaro.dyndns.org>
Content-Type: text/plain; charset=us-ascii


    Ryan> From: Anthony Baxter [mailto:anthony at interlink.com.au]
    >> Maybe we should have a file somewhere of "yet to be tested" tokeniser
    >> ideas? And update it with a comment when we find what does or doesn't
    >> work?

+1

    Ryan> This sounds like a good idea to me, and would prevent a lot of
    Ryan> unnecessary duplication of effort.

Or at least a lot of unnecessary duplication of suggestions. ;-)

Any entries in that file which are shot down should probably be explained.
In fact, perhaps a quick summary of all the stuff the tokenizer does would
be useful.  Why make the spammers read our source code? ;-)

Skip


------------------------------

Message: 8
Date: Wed, 10 Sep 2003 13:51:33 -0500
From: Skip Montanaro <skip at pobox.com>
Subject: Re: [Spambayes] Error in classifier.py?
To: Martin Davis <m0davis at pacbell.net>
Cc: spambayes at python.org
Message-ID: <16223.29365.436886.368204 at montanaro.dyndns.org>
Content-Type: text/plain; charset=us-ascii


    Martin> I finally thought I had Spambayes' pop3proxy working when
    Martin> I started getting the following error ...

    Martin>   File "C:\PROGRA~1\Python23\Lib\site-packages\spambayes\classifier.py", line 310, in probability
    Martin>     assert spamcount <= nspam
    Martin> AssertionError

This is a sure sign of a corrupted database file.  If you execute this
Python code:

    import whichdb
    whichdb.whichdb(r"c:\path\to\your\database\file")

what does it report?  I suggest you rename it (in case someone wants to look
at it) and restart your training.  Since you're using pop3proxy I suggest
you use the pickle format.  To do this, add these lines to your ini file:

    [Storage]
    persistent_use_database: False

Make sure you move your corrupt database out of the way first.
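
For anyone on a current Python, the check-and-rename step above might look
roughly like this. It is a sketch under assumptions: in Python 3 the old
whichdb module is available as dbm.whichdb, and the function name
set_aside_database and the ".corrupt" suffix are invented for the example.

```python
import dbm
import os


def set_aside_database(path):
    """Report the database type, then rename the file out of the way.

    Returns (db_type, backup_path). dbm.whichdb returns None if the file
    cannot be opened, "" if the format cannot be guessed, and otherwise
    the name of the module that should be able to read it.
    """
    db_type = dbm.whichdb(path)
    backup_path = path + ".corrupt"  # hypothetical naming scheme
    os.rename(path, backup_path)     # keep the old file for later inspection
    return db_type, backup_path
```

Renaming rather than deleting keeps the old file available in case someone
wants to look at it.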

Skip


------------------------------

Message: 9
Date: Wed, 10 Sep 2003 15:00:15 -0500
From: "TheAceMan" <TheAceMan at bargainace.com>
Subject: [Spambayes] Rule Wizard
To: <spambayes at python.org>
Message-ID: <000001c377d6$2865e920$6401a8c0 at theaceman>
Content-Type: text/plain;	charset="us-ascii"

I am currently running Windows XP Pro and MS Outlook 2002.  I recently
added SpamBayes to Outlook, but it seems one of my rules is overriding
SpamBayes.  When I get a new message it is moved to the spam folder, then
moved to another folder because of rules I had previously.  However, I
cannot open the rule wizard any more either, and I'm unsure why it won't
open.  Outlook runs normally and everything seems to work, yet I cannot
open the rule wizard.  I can create new rules by right-clicking on a
message, but I can't see old rules by selecting Tools | Rule Wizard.  I can
try reinstalling Office or Outlook or XP but would like some advice
before I do that.

Thanks in advance,
  Kevin

------------------------------

Message: 10
Date: Wed, 10 Sep 2003 13:16:36 -0700
From: Tom Boland <tom.boland at cpsinc.com>
Subject: [Spambayes] Whitelist for SpamBayes for Outlook
To: "'spambayes at python.org'" <spambayes at python.org>
Message-ID: <516DF10AD075D611853F00B0D03DF0E69C672D at CPSLA1>
Content-Type: text/plain;	charset="iso-8859-1"

I know you say it's not needed *but it is*!!

This is the only thing stopping me from implementing this product (and
making donations) over anything else. I need to make sure that email from
certain people (my boss, his boss, all local domain mail, etc.) gets through
100% of the time.

Even after several months of training, some mail from these people gets
dumped into Quarantine: Junk. I even saved 200 messages from one person,
trained it as good mail, and I still get mail from that person (my CEO, no
less) sent to the Quarantine.

I know, I know that this goes against your belief in the One True God of
Bayesian filtering, but in the real world, we REALLY need at least a white
list and preferably a blacklist, too.

Thanks for an otherwise fantastic program. Keep up the good work.

Tom Boland
IT Manager
Commercial Programming Systems


------------------------------

Message: 11
Date: Wed, 10 Sep 2003 15:22:29 -0500
From: Skip Montanaro <skip at pobox.com>
Subject: Re: [Spambayes] How to setup Spambayes for Domino server
To: "jmwilson at knology.net" <jmwilson at knology.net>
Cc: spambayes at python.org
Message-ID: <16223.34821.348988.769104 at montanaro.dyndns.org>
Content-Type: text/plain; charset=us-ascii


(make sure you reply to the entire list so you get the benefit of everyone's
eyeballs and brains...)

    James> I am running the software on Redhat 8. The domino server is
    James> running SMTP.  How do I set it up to scan the emails that come
    James> through. I have already got it installed. Could you please
    James> help?

    Skip> You didn't say anything about the mail reader you use.  SpamBayes
    Skip> works best at the client where each user can train on their own
    Skip> mail.  That said, I'm not sure anyone has investigated integrating
    Skip> it with a Domino server, though others have done so for other SMTP
    Skip> servers.

    James> We use Lotus Notes client to read mail. What I was trying to do is
    James> set it up to be its own server. Is it possible for all email to be
    James> filtered through this server and then go to the Domino server?

There is an smtpproxy script, but I'm not sure if this is the sort of
situation it was designed for.  Another SB user wrote an SMTP proxy which
performs the necessary checks at SMTP transfer time, rejecting the message
if it scores as spam, then calling the real SMTP server if it passes that
test.  That sounds like something you could try.  Search the list archives
for the past few weeks.
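
The accept-or-reject decision such a proxy makes could be sketched like
this. It is only an outline of the idea, not that user's proxy: the score
and relay callables are hypothetical stand-ins for a trained classifier
and for the hand-off to the real SMTP server, and the 0.9 cutoff is an
arbitrary assumption.

```python
def filter_at_smtp_time(message_bytes, score, relay, spam_cutoff=0.9):
    """Return an SMTP reply line for one incoming message.

    score: callable returning a spam probability between 0 and 1
           (hypothetical stand-in for a trained classifier).
    relay: callable that passes the message to the real SMTP server.
    """
    if score(message_bytes) >= spam_cutoff:
        # Reject during the SMTP transaction; nothing is relayed.
        return "550 Message rejected as spam"
    relay(message_bytes)
    return "250 OK"
```

Rejecting at transfer time means the sending server learns about the
refusal, rather than the message being silently dropped later.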

Skip


------------------------------

Message: 12
Date: Wed, 10 Sep 2003 15:28:10 -0500
From: Skip Montanaro <skip at pobox.com>
Subject: Re: [Spambayes] Error in classifier.py?
To: Martin Davis <m0davis at pacbell.net>
Cc: spambayes at python.org
Message-ID: <16223.35162.304062.296851 at montanaro.dyndns.org>
Content-Type: text/plain; charset=us-ascii

    >>> whichdb.whichdb(r"c:\program files\python23\scripts\hammie.db")
    'dbhash'

That's what I figured.  Note that some bugs related to storage have been
fixed in the past few weeks.  You might want to try 1.0a5 if you're not up
to that rev yet.

    Martin> I made the change you suggested and renamed the database.
    Martin> Working fine so far.  Thanks.

    Martin> P.S. I didn't post this reply to the list.  If that's wrong, let
    Martin> me know.

Not wrong, but when you're trying to solve a problem it helps to get more
than one perspective instead of just that of the first wacko to reply. ;-)
In particular, others appear to be much better at remembering that "the
database transmogrification bug was fixed in version 3.14159 of
somereallybigmodule.py".

Skip


------------------------------

_______________________________________________
Spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html

End of Spambayes Digest, Vol 61, Issue 36
*****************************************



