[Spambayes] Re: [Spambayes-checkins] spambayes FileCorpus.py,1.8,1.9 Corpus.py,1.5,1.6

Tim Stone - Four Stones Expressions tim@fourstonesExpressions.com
Mon Dec 2 15:09:34 2002


12/2/2002 6:22:51 AM, Richie Hindle <richie@entrian.com> wrote:

>
>[Richie]
>> so the on-demand-ness should come for free for all Corpus-using code.
>
>[Mark]
>> How much Corpus-using code is there?  Are there any plans to move any
>> existing code that does not use it towards using it?  I've raised this with
>> Tim S for Outlook, and it doesn't appear we will - I have no idea about the
>> other apps though.
>
>Only pop3proxy.py uses Corpus to my knowledge - hammiebulk.py imports it,
>but doesn't seem to use it (?)
>
>I'd like to see more of the existing code using it, but then again I'm not
>in a hurry to implement the idea myself...  In an ideal (meaning
>"engineering purity") world, we'd have abstract Corpus and Message
>interfaces, and all the applications would code to those interfaces
>regardless of the concrete classes implementing them.  Then any application
>would work with messages stored in any format - hammie could classify your
>Outlook messages from the command line, the Outlook plug-in could train on
>messages in mbox files, and so on.  In the real world, that kind of thing
>usually turns out either to be YAGNI or so hard as to be unreasonable.
>
>Where we end up will probably be somewhere in between.  I was able to
>scratch an itch using Corpus - it was exactly what I needed for the web
>training interface (partly because Tim and I discussed the design of Corpus
>with that in mind).  If other people find they can scratch itches with it,
>its usage will grow, otherwise it won't.  Migrating already-working code to
>use a new library for reasons of engineering purity isn't an itch that many
>people suffer from.

Well, I'm a bit of an engineering purist, and I think there's a benefit to
having a single abstract interface for message storage.  Right now, we have
mbox stuff, Corpus stuff, and Outlook stuff.  Mark has indicated that he's not
interested in the abstraction for the Outlook stuff, and that's fine.  But I
think the mbox/msg stuff should disappear.  It doesn't do anything that Corpus
doesn't do at the moment, and it's going to get confusing down the road for
anyone who becomes interested in our code.  Not to mention that our code
reflects on us...  Let's take the plunge and make the Corpus stuff the
'standard', and where it doesn't support a current requirement, let's fix it.
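
To make the idea concrete, here is a minimal sketch of what coding to a
single abstract interface for message storage could look like.  It is not
the actual Corpus.py API: AbstractMessage, AbstractCorpus, SimpleMessage,
DictCorpus and count_containing are hypothetical names invented for
illustration, and the sketch assumes a Python recent enough to have the abc
module.

```python
from abc import ABC, abstractmethod


class AbstractMessage(ABC):
    """Hypothetical storage-neutral message: an identifying key plus text."""

    @abstractmethod
    def key(self):
        """Return a string that uniquely identifies this message."""

    @abstractmethod
    def text(self):
        """Return the full message text as a string."""


class AbstractCorpus(ABC):
    """Hypothetical storage-neutral collection of messages."""

    @abstractmethod
    def add(self, message):
        """Store a message, replacing any existing one with the same key."""

    @abstractmethod
    def remove(self, key):
        """Delete the message stored under key."""

    @abstractmethod
    def __iter__(self):
        """Iterate over the stored messages."""


class SimpleMessage(AbstractMessage):
    """Toy concrete message that just holds its key and text in memory."""

    def __init__(self, key, text):
        self._key = key
        self._text = text

    def key(self):
        return self._key

    def text(self):
        return self._text


class DictCorpus(AbstractCorpus):
    """Toy in-memory backend, standing in for an mbox or directory store."""

    def __init__(self):
        self._messages = {}

    def add(self, message):
        self._messages[message.key()] = message

    def remove(self, key):
        del self._messages[key]

    def __iter__(self):
        # Iterate over a snapshot so callers may add or remove while looping.
        return iter(list(self._messages.values()))


def count_containing(corpus, word):
    """Application-side code: it touches only the abstract interface."""
    word = word.lower()
    return sum(1 for message in corpus if word in message.text().lower())
```

Under these assumptions, an mbox-backed or Outlook-backed store would
subclass AbstractCorpus the same way, and count_containing would not need
to change.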

- TimS
>
>I have a *much* bigger problem with Corpus, which is that I find the word
>'Corpus' impossible to type.  Is it just me?
>
>> In the back of my mind, I am pondering if we need a better directory
>> structure - maybe with the core engine in a package, and some of these
>> "wrappers" used only by a few applications also going into their own?
>
>Isn't this also YAGNI?  We have a few tens of Python files in the project -
>do we really need to split it up?  And if we do, should we be doing it with
>the code this young?
>
>-- 
>Richie Hindle
>richie@entrian.com


c'est moi - TimS
www.fourstonesExpressions.com 




