[Spambayes] Spambayes database related question...
peter at sibilski.net
Wed Aug 22 02:31:24 CEST 2007
Thanks for the information!
Can you point me in the right direction where I can read about the steps
needed to obtain this kind of data? I have saved nearly 7234 pieces of
spam from the past two years. Is there a less painful way of obtaining
this information than individually selecting each one of those emails?
skip> You'll then get IP address-related tokens when they are
skip> significant. For example, here are some IP bits from a few random
skip> messages I have in my mailbox right now:
skip>     'url-ip:194.109.207.14/32': 0.35;
Also, I read about 'sb_dbexpimp.py' and am thinking about trying to
export my SpamBayes database to MySQL via a CSV text file. Unfortunately,
with Python 2.5 installed I get nothing but syntax errors every time I
try following the instructions (see cmd.exe output below). Any help
would be awesome!
C:\MESSAGES\spambayes-1.0.4\scripts>sb_dbexpimp.py -e -d ..\..\Spam\default_message_database.db -f ..\..\Spam\default_message_database.export
  File "C:\MESSAGES\spambayes-1.0.4\scripts\sb_dbexpimp.py", line 93
    from __future__ import generators
SyntaxError: from __future__ imports must occur at the beginning of the file

C:\MESSAGES\spambayes-1.0.4\scripts>sb_dbexpimp.py -e -d ..\..\Spam\default_bayes_database.db -f ..\..\Spam\default_bayes_database.export
  File "C:\MESSAGES\spambayes-1.0.4\scripts\sb_dbexpimp.py", line 93
    from __future__ import generators
SyntaxError: from __future__ imports must occur at the beginning of the file
Sincerely,
Peter Sibilski
(M) +1 414 467 4046
Peter at Sibilski.net
-----Original Message-----
From: skip at pobox.com [mailto:skip at pobox.com]
Sent: Sunday, August 19, 2007 7:30 PM
To: peter at sibilski.net
Cc: spambayes at python.org
Subject: Re: [Spambayes] Spambayes database related question...
peter> Have you given any thought to running a central database on a
peter> local network SpamBayes 'Server'?
There are both MySQL- and PostgreSQL-based classifiers in the SpamBayes
repository. They are both almost entirely untested. There is also a
ZEO-based classifier (ZEO == centralized ZODB I believe) in the 1.1
alpha series.
peter> I wish SpamBayes would also keep a list/db of offending network
peter> address path information within the email internet
peter> headers... would be extremely useful to generate statistical
peter> reports for frequently encountered offending IPs etc.
You can get partway there by adding this to your SpamBayes INI file:
[Headers]
include_evidence:True
[Tokenizer]
mine_received_headers:True
x-pick_apart_urls:True
You'll then get IP address-related tokens when they are significant.
For example, here are some IP bits from a few random messages I have in
my mailbox right now:
'url-ip:194.109.207.14/32': 0.35;
'url-ip:194.109.207/24': 0.35;
'url-ip:194.109/16': 0.35;
'url-ip:88.198/16': 0.09;
'url-ip:88/8': 0.09;
'received:192.168.1': 0.21;
'received:10.3.1': 0.16;
'received:10.3.1.93': 0.16;
'received:209.191': 0.16;
'received:66.35.250.225': 0.16;
You can then analyze them at your leisure.
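As a rough illustration of that kind of analysis, here is a minimal Python sketch that pulls the clue list out of saved messages and counts how often each IP-related token shows up. It assumes the clues land in a header named X-Spambayes-Evidence and are formatted like the examples above; the header name, the function names, and the sample message are assumptions for illustration, not part of SpamBayes itself.

```python
import email
import re
from collections import Counter

# Assumption: the header that carries the clues when include_evidence is on.
EVIDENCE_HEADER = "X-Spambayes-Evidence"

# Matches one clue of the form 'token': 0.35 (tokens containing an
# apostrophe are not handled by this simple pattern).
CLUE_RE = re.compile(r"'([^']+)':\s*([0-9][0-9.eE+-]*)")


def parse_evidence(evidence):
    """Return a list of (token, probability) pairs from a clue string."""
    return [(tok, float(prob)) for tok, prob in CLUE_RE.findall(evidence)]


def evidence_from_message(raw_message):
    """Return the evidence header of a raw message, or '' if it is absent."""
    msg = email.message_from_string(raw_message)
    return msg.get(EVIDENCE_HEADER, "")


def tally_ip_tokens(evidence_strings, prefixes=("url-ip:", "received:")):
    """Count how many messages mention each IP-related token."""
    counts = Counter()
    for evidence in evidence_strings:
        for token, _prob in parse_evidence(evidence):
            if token.startswith(prefixes):
                counts[token] += 1
    return counts


# Hypothetical sample data: one raw message plus one bare clue string.
raw = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spambayes-Evidence: 'url-ip:88/8': 0.09; 'received:209.191': 0.16;\n"
    "\n"
    "body\n"
)
clue_strings = [
    evidence_from_message(raw),
    "'url-ip:194.109.207.14/32': 0.35; 'url-ip:88/8': 0.09 ; 'subject:hi': 0.5;",
]

counts = tally_ip_tokens(clue_strings)
for token, n in counts.most_common():
    print(n, token)
```

To run this over a whole folder of saved spam, you would feed each message's text through evidence_from_message() instead of using the sample strings. How you read the messages depends on how they are stored.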
Skip