[Spambayes] Spambayes database related question...

peter at sibilski.net peter at sibilski.net
Wed Aug 22 02:31:24 CEST 2007


Thanks for the information! 

Can you point me in the right direction where I may read about performing
the steps to obtain this kind of data? I have saved nearly 7234 pieces
of spam from the past two years. Is there a less painful way of obtaining
this information than individually selecting each one of my 7234 spam
emails?
	You'll then get IP address-related tokens when they are
	significant.  For example, here are some IP bits from a few random
	messages I have in my mailbox right now:

	    'url-ip:194.109.207.14/32': 0.35;

I also read about 'sb_dbexpimp.py' and am thinking about trying to export
my SpamBayes database to MySQL via a CSV text file. Unfortunately, with
Python 2.5 installed I get nothing but syntax errors every time I try
following the instructions (see the cmd.exe output below). Any help
would be awesome!
	C:\MESSAGES\spambayes-1.0.4\scripts>sb_dbexpimp.py -e -d ..\..\Spam\default_message_database.db -f ..\..\Spam\default_message_database.export
	  File "C:\MESSAGES\spambayes-1.0.4\scripts\sb_dbexpimp.py", line 93
	    from __future__ import generators
	SyntaxError: from __future__ imports must occur at the beginning of the file

	C:\MESSAGES\spambayes-1.0.4\scripts>sb_dbexpimp.py -e -d ..\..\Spam\default_bayes_database.db -f ..\..\Spam\default_bayes_database.export
	  File "C:\MESSAGES\spambayes-1.0.4\scripts\sb_dbexpimp.py", line 93
	    from __future__ import generators
	SyntaxError: from __future__ imports must occur at the beginning of the file
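
For reference, Python raises this error whenever a `from __future__` import follows another statement; only the module docstring, comments, and blank lines may come before it. The sketch below demonstrates the rule on two made-up source strings. They are illustrative assumptions and not the actual contents of sb_dbexpimp.py, and the helper name `compiles` is hypothetical.

```python
# Two tiny module sources. Only the position of the __future__ import differs.
BROKEN = '"""doc"""\n__author__ = "someone"\nfrom __future__ import generators\n'
FIXED = '"""doc"""\nfrom __future__ import generators\n__author__ = "someone"\n'


def compiles(source):
    """Return True if the source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("import after a statement compiles:", compiles(BROKEN))
    print("import right after the docstring compiles:", compiles(FIXED))
```

If the script has ordinary statements above line 93, moving the `from __future__ import generators` line up to sit directly after the module docstring may be enough to get past this error. That is an untested guess about this particular script.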
 
Sincerely,
 
Peter Sibilski
(M) +1 414 467 4046
Peter at Sibilski.net

-----Original Message-----
From: skip at pobox.com [mailto:skip at pobox.com] 
Sent: Sunday, August 19, 2007 7:30 PM
To: peter at sibilski.net
Cc: spambayes at python.org
Subject: Re: [Spambayes] Spambayes database related question...


    peter> Have you given any thought to running a central database on a
    peter> local network SpamBayes 'Server'?

There are both MySQL- and PostgreSQL-based classifiers in the SpamBayes
repository.  They are both almost entirely untested.  There is also a
ZEO-based classifier (ZEO == centralized ZODB I believe) in the 1.1
alpha series.

    peter> I wish SpamBayes would also keep a list/db of offending network
    peter> address path information within the email internet
    peter> headers... would be extremely useful to generate statistical
    peter> reports for frequently encountered offending IPs etc.

You can get partway there by adding this to your SpamBayes INI file:

    [Headers]
    include_evidence:True

    [Tokenizer]
    mine_received_headers:True
    x-pick_apart_urls:True

You'll then get IP address-related tokens when they are significant.
For example, here are some IP bits from a few random messages I have in
my mailbox right now:

    'url-ip:194.109.207.14/32': 0.35;
    'url-ip:194.109.207/24': 0.35;
    'url-ip:194.109/16': 0.35;
    'url-ip:88.198/16': 0.09;
    'url-ip:88/8': 0.09;
    'received:192.168.1': 0.21;
    'received:10.3.1': 0.16;
    'received:10.3.1.93': 0.16;
    'received:209.191': 0.16;
    'received:66.35.250.225': 0.16;

You can then analyze them at your leisure.
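
One possible way to do that analysis is sketched below. It assumes the clues have already been collected as text in the `'token': probability;` form shown above, for instance copied out of the evidence header that `include_evidence` adds to each message. How that text is gathered from a mailbox is left out, and the names `CLUE_RE`, `parse_ip_clues`, and `tally_ip_clues` are hypothetical, not part of SpamBayes.

```python
import re
from collections import Counter

# Matches clues such as 'received:10.3.1.93': 0.16; and 'url-ip:88/8': 0.09;
CLUE_RE = re.compile(r"'(url-ip|received):([^']+)':\s*([0-9.]+)")


def parse_ip_clues(evidence):
    """Return (kind, value, probability) for each IP-related clue in the text."""
    return [(kind, value, float(prob))
            for kind, value, prob in CLUE_RE.findall(evidence)]


def tally_ip_clues(evidence_strings):
    """Count how many times each IP-related clue appears across messages."""
    counts = Counter()
    for evidence in evidence_strings:
        for kind, value, _prob in parse_ip_clues(evidence):
            counts[(kind, value)] += 1
    return counts


if __name__ == "__main__":
    # Sample clue strings in the format shown above; not real mailbox data.
    messages = [
        "'url-ip:194.109/16': 0.35; 'received:10.3.1.93': 0.16;",
        "'received:10.3.1.93': 0.16; 'received:209.191': 0.16;",
    ]
    for (kind, value), n in tally_ip_clues(messages).most_common():
        print(kind, value, n)
```

Sorting by count gives a rough "most frequently seen" report. The per-clue probabilities are kept by `parse_ip_clues` in case the tally should be limited to spammy clues only.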

Skip