[Spambayes] Database Format
tameyer at ihug.co.nz
Mon Feb 9 15:56:00 EST 2004
> Now to my question.. I found in the FAQ where I can
> locate the classification database. Is there a way
> I can extract data from this DB?
> Is there a way I can convert my SpamBayes database
> to extract out the words considered spam?
In the source distribution (you'll need Python installed as well) there is a
script called sb_dbexpimp.py. It'll convert the database to a flat-text
'`'-separated file, which you can use.
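
As a rough sketch of reading such an export back in from Python: I'm assuming here (without having checked the script) that each token line looks like token`hamcount`spamcount, possibly with a trailing '`', and that any line not matching that shape (such as a header with the overall totals) can be skipped. The function name parse_export is made up for illustration and is not part of SpamBayes.

```python
def parse_export(lines, sep="`"):
    """Return {token: (ham_count, spam_count)} from exported lines.

    ASSUMPTION: each token line is token<sep>ham<sep>spam, optionally
    followed by a trailing separator. Check this against the file that
    sb_dbexpimp.py actually produces before relying on it.
    """
    counts = {}
    for line in lines:
        # Drop the newline and any trailing separator, then split from the
        # right so a separator inside the token itself does not break things.
        fields = line.rstrip("\r\n").rstrip(sep).rsplit(sep, 2)
        if len(fields) != 3:
            continue  # header, blank, or otherwise unexpected line
        token, ham, spam = fields
        try:
            counts[token] = (int(ham), int(spam))
        except ValueError:
            continue  # counts were not integers; skip the line
    return counts
```

With a real export you would pass an open file object, e.g. parse_export(open("mybayes.export")), where the filename is whatever you gave the export script.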
Note that this doesn't include any probabilities or scores, just counts,
i.e. how many times each token has been seen in ham and in spam. So if you
want probabilities, you'll have to do some calculation yourself.
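
For the calculation, something along these lines should get close to the per-token score. It is a minimal sketch of the Robinson-style adjusted spam probability, and it needs the total number of ham and spam messages trained (nham, nspam) as well as the per-token counts. The function name and the defaults s=0.45 and x=0.5 (the "unknown word" strength and probability) are my assumptions about a default setup; if you've changed those options, substitute your own values.

```python
def spam_probability(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Estimate the spam probability of one token from raw counts.

    ham_count / spam_count: messages of each kind containing the token.
    nham / nspam: total ham and spam messages trained on.
    s, x: ASSUMED strength and prior probability for unseen tokens.
    """
    n = ham_count + spam_count
    if n == 0:
        return x  # never seen: fall back to the prior

    # Avoid dividing by zero when one class has no training data.
    ham_ratio = ham_count / float(nham or 1)
    spam_ratio = spam_count / float(nspam or 1)

    # Raw estimate, corrected for unequal amounts of ham and spam.
    p = spam_ratio / (ham_ratio + spam_ratio)

    # Pull the estimate towards x when there is little evidence (small n).
    return (s * x + n * p) / (s + n)
```

A token seen only in spam comes out close to 1, one seen only in ham close to 0, and one seen equally often in both (with equal training totals) at about 0.5. The fewer times a token has been seen, the closer its score stays to x.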
Alternatively, there's another script in the source distribution called
spamcounts.py, which can output certain sections (including the whole thing,
IIRC) of the database, including scores as they currently stand. You could
capture the output of this to a file, and it might do more of what you're
after.
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.