[Spambayes] Spambayes 1.0a5 comments

Meyer, Tony T.A.Meyer at massey.ac.nz
Fri Sep 5 16:40:15 EDT 2003

> First, thanks! :-)

From us all, you're welcome :)

> I tested the new word query wild card option, works great. 
> Except that it would be better to have a tabular view, it would take
> much less space. Also we could have an option for the maximum number
> of hits on the query page (drop down of 10, 100, 1000 etc).

I agree with all of this.  It was just the quickest way to do it, and I
didn't have much time for spambayes before the 1.0a5 deadline.  I think
what I'll probably do (if no-one objects) is create an 'advanced find'
query next to the existing one, and change the existing one back to how
it was.  The advanced one can have an option for the maximum number of
hits, plus lots of other options (doing a regex search instead of a
plain wildcard, for example).  The results from this can be put into a
separately designed page.  All of this is a couple of weeks away, though
:)  (If you like, you can reopen that feature request so that I
remember, though I'll probably remember anyway).
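
Roughly, the matching side of such an 'advanced find' might look like the
sketch below.  This is only an illustration, not spambayes code: the
function name find_words and its parameters are made up, and it assumes
the words in the database can be handed over as a plain iterable of
strings.

```python
import fnmatch
import re


def find_words(words, pattern, use_regex=False, max_hits=100):
    """Return up to max_hits words matching pattern, in sorted order.

    Hypothetical helper for illustration only; not part of spambayes.
    """
    if use_regex:
        # Regex mode: the pattern may match anywhere in the word.
        matcher = re.compile(pattern).search
    else:
        # Wildcard mode: fnmatch.translate turns a shell-style wildcard
        # ("fre*") into a regex that must match the whole word.
        matcher = re.compile(fnmatch.translate(pattern)).match
    hits = []
    for word in sorted(words):
        if len(hits) >= max_hits:
            break  # stop as soon as the hit limit is reached
        if matcher(word):
            hits.append(word)
    return hits
```

For example, find_words(["free", "freedom", "spam"], "fre*", max_hits=1)
stops after the first match and returns ["free"].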

> The Tokenize button that I wished for didn't make it (it 
> would be right to Classify, and output all tokens in a message).

Sorry, didn't get a chance to do it; the main difficulty was that I'm
not sure how to present all of this without having a huge page.  This
should be in the 1.1a1 release in some form, though.

> I wonder if there is a command line tool that shows all tokens for
> a message?

You can probably use hammiefilter to do this somehow.  Otherwise, try
something like this (untested):

>>> from spambayes import tokenizer
>>> # raw string, so the "\t" in the path is not read as a tab
>>> f = open(r"path\to\my\message", "r")
>>> msg = f.read()
>>> f.close()
>>> g = tokenizer.Tokenizer().tokenize(msg)
>>> for t in g:
...     print t
...

from within Python (just typing "python" at a command prompt should get
you into the interactive interpreter).

=Tony Meyer
