[Spambayes] Re: [Spambayes-checkins] spambayes FileCorpus.py,1.8,1.9 Corpus.py,1.5,1.6

Richie Hindle richie@entrian.com
Mon Dec 2 12:22:51 2002


[Richie]
> so the on-demand-ness should come for free for all Corpus-using code.

[Mark]
> How much Corpus-using code is there?  Are there any plans to move any
> existing code that does not use it towards using it?  I've raised this with
> Tim S for Outlook, and it doesn't appear we will - I have no idea about the
> other apps though.

Only pop3proxy.py uses Corpus to my knowledge - hammiebulk.py imports it,
but doesn't seem to use it (?)

I'd like to see more of the existing code using it, but then again I'm not
in a hurry to implement the idea myself...  In an ideal (meaning
"engineering purity") world, we'd have abstract Corpus and Message
interfaces, and all the applications would code to those interfaces
regardless of the concrete classes implementing them.  Then any application
would work with messages stored in any format - hammie could classify your
Outlook messages from the command line, the Outlook plug-in could train on
messages in mbox files, and so on.  In the real world, that kind of thing
usually turns out either to be YAGNI or so hard as to be unreasonable.
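To make the "code to the interfaces" idea concrete, here is a rough sketch of what I mean. It is not the actual Corpus.py API: the method names (key, text), the InMemory* classes and count_words are all made up for illustration, and a real design would need training hooks, caching and so on.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class Message(ABC):
    """Hypothetical abstract message: applications see only a key and text."""

    @abstractmethod
    def key(self) -> str: ...

    @abstractmethod
    def text(self) -> str: ...


class Corpus(ABC):
    """Hypothetical abstract corpus: an iterable collection of Messages."""

    @abstractmethod
    def __iter__(self) -> Iterator[Message]: ...


class InMemoryMessage(Message):
    """One concrete Message; an mbox or Outlook version would look the same
    from the outside."""

    def __init__(self, key: str, text: str) -> None:
        self._key = key
        self._text = text

    def key(self) -> str:
        return self._key

    def text(self) -> str:
        return self._text


class InMemoryCorpus(Corpus):
    """One concrete Corpus, backed by a plain dict of key -> message text."""

    def __init__(self, messages: Dict[str, str]) -> None:
        self._messages = messages

    def __iter__(self) -> Iterator[Message]:
        # Yield messages in key order so iteration is deterministic.
        for key in sorted(self._messages):
            yield InMemoryMessage(key, self._messages[key])


def count_words(corpus: Corpus) -> int:
    """Stand-in for an application: it works with any Corpus implementation."""
    return sum(len(msg.text().split()) for msg in corpus)
```

The point is that count_words never learns where the messages live, so swapping in a different concrete Corpus would not require touching it.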

Where we end up will probably be somewhere in between.  I was able to
scratch an itch using Corpus - it was exactly what I needed for the web
training interface (partly because Tim and I discussed the design of Corpus
with that in mind).  If other people find they can scratch itches with it,
its usage will grow, otherwise it won't.  Migrating already-working code to
use a new library for reasons of engineering purity isn't an itch that many
people suffer from.

I have a *much* bigger problem with Corpus, which is that I find the word
'Corpus' impossible to type.  Is it just me?

> In the back of my mind, I am pondering if we need a better directory
> structure - maybe with the core engine in a package, and some of these
> "wrappers" used only by a few applications also into their own?

Isn't this also YAGNI?  We have a few tens of Python files in the project -
do we really need to split it up?  And if we do, should we be doing it with
the code this young?

-- 
Richie Hindle
richie@entrian.com



