[Spambayes] Supporting new database type in classifier

Brad Clements bkc at murkworks.com
Mon Feb 16 09:52:10 EST 2004


On 14 Feb 2004 at 23:25, Tim Peters wrote:

> You'll probably get better responses on the spambayes-dev list.

Ah, I must have missed the announcement of that list when it was created.

> I encourage you to work on a branch for now -- since most people drop most
> ideas after a few weeks at most, I'm opposed to warping this part of the
> code to cater to something as unlikely to be seen again as a
> non-random-access database model.  If you work on a branch and demonstrate
> astonishing results, great, then we'll junk all other storages and adopt
> yours <wink>.

Well, OK, except I wasn't asking about the mechanics of putting my code into the tree.
I was asking what's the best way to refactor Classifier so this would be easier to do.



> > I could override _getclues, but then I'd have to recreate the
> > bigram stuff which is quite a lot.
> 
> It's less than 30 lines of code (half of it is comments).

But then that code would be duplicated. So at some point (assuming I don't fade away),
we'll want only one copy of the bigram synthesis code. That's the basis of my question:
what's the best way to rearrange the existing code?
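
To make the question concrete, here is a minimal sketch of the kind of split being
asked about, not the existing Classifier code. Every name in it apart from _getclues
is hypothetical (SketchClassifier, _wordinfoget_many, _synthesize_bigrams, the
"bi:" token prefix), and the real bigram handling presumably does more than pair
adjacent tokens. The idea is only that token synthesis lives in one method, and
storage is asked for all records in one call that a subclass may override.

```python
class SketchClassifier:
    """Hypothetical stand-in for illustration; not the real Classifier."""

    def __init__(self):
        # token -> (spamcount, hamcount); an in-memory dict as the default store
        self.wordinfo = {}

    def _wordinfoget(self, word):
        # Random-access lookup of a single token's record.
        return self.wordinfo.get(word)

    def _wordinfoget_many(self, words):
        # Assumed hook: default falls back to one lookup per token.
        # A storage without cheap random access would override this.
        return {w: self._wordinfoget(w) for w in words}

    def _synthesize_bigrams(self, wordstream):
        # The single copy of the (simplified) bigram synthesis:
        # yield each token, plus a pair token for each adjacent pair.
        last = None
        for token in wordstream:
            yield token
            if last is not None:
                yield "bi:%s %s" % (last, token)
            last = token

    def _getclues(self, wordstream):
        # Build the whole token set first, then hit storage once.
        tokens = set(self._synthesize_bigrams(wordstream))
        records = self._wordinfoget_many(tokens)
        return sorted((tok, rec) for tok, rec in records.items()
                      if rec is not None)


class BulkStorageClassifier(SketchClassifier):
    """Hypothetical subclass that handles the wordstream in one lump."""

    def __init__(self):
        super().__init__()
        self.bulk_calls = 0

    def _wordinfoget_many(self, words):
        # One sequential pass over the store instead of per-token lookups.
        self.bulk_calls += 1
        wanted = set(words)
        return {w: rec for w, rec in self.wordinfo.items() if w in wanted}
```

With a layout like this, a storage subclass overrides only _wordinfoget_many and
never touches the bigram code, which is the property I'm after.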

> > Second, what's the best way to restructure classifier so that a
> > storage subclass can deal with entire wordstreams in one lump if
> > it so chooses?
> 
> On a branch -- prove this is worth doing first, and don't worry about doing
> it cleanly before that succeeds.
> 

Heh heh. You're not answering my question. ;-)

I'll be back in touch with my dirty proof of concept.


-- 
Brad Clements,                bkc at murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
http://www.wecanstopspam.org/                   AOL-IM: BKClements



