[Richie Hindle, cogitates about Messages and their Corpus(ora)]

That's the ticket! Backing off to a more fundamental level looks useful to me too. We never even straightened that much out for testing purposes (msgs.py isn't general enough; for some custom test drivers (never checked in), I couldn't even reuse the MsgStream class for my *own* directory structures). I disagree with Mark's
If the process of re-training a message in the Outlook GUI becomes:
    def RetrainMessageAsSpam():
        # Outlook specific code to get an ID.
        message = message_store.GetMessage(id)
        if not classifier.IsSpam(message):
            classifier.train(message, is_spam=True)
And not a whole lot else, it doesn't seem worth it.
because it illustrates the point <wink>: it doesn't look like a correct re-training method (although it may be, depending on assumptions about where "id" comes from, and what assorted classifier methods do), and while a correct method shouldn't be hard, in the absence of a class dedicated to doing the simple common things that *can* be done in a common way, everyone will keep screwing it up in their own client code.
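For illustration only, here is a minimal sketch of the kind of small shared class that could own the common part of re-training. It assumes that "re-training" means undoing any earlier training under the other label before training under the new one, and that a message's training history is tracked by id rather than inferred from its current score. Every name in it (ToyClassifier, Trainer, learn, unlearn, trained_as) is hypothetical and is not the project's actual API.

```python
# Hypothetical names throughout: ToyClassifier, Trainer, learn and unlearn
# are illustrative stand-ins, not the project's real classes or methods.
from collections import Counter


class ToyClassifier:
    """Stand-in classifier that only keeps per-label token counts."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.nmsgs = {True: 0, False: 0}

    def learn(self, tokens, is_spam):
        self.counts[is_spam].update(tokens)
        self.nmsgs[is_spam] += 1

    def unlearn(self, tokens, is_spam):
        self.counts[is_spam].subtract(tokens)
        self.nmsgs[is_spam] -= 1


class Trainer:
    """Remembers how each message id was trained, so re-training is safe."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.trained_as = {}  # message id -> bool (True means spam)

    def train(self, msg_id, tokens, is_spam):
        previous = self.trained_as.get(msg_id)
        if previous == is_spam:
            return False  # already trained this way; do not double-count
        if previous is not None:
            # Undo the earlier training under the other label first.
            self.classifier.unlearn(tokens, previous)
        self.classifier.learn(tokens, is_spam)
        self.trained_as[msg_id] = is_spam
        return True
```

With something like this in place, a client's "retrain as spam" handler would shrink to a single call such as trainer.train(msg_id, tokens, is_spam=True), and the bookkeeping about prior training would live in one place instead of in each client.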
... You might want to run it past Tim Peters, 'cos he's *far* better at this kind of thing than I am (though he's also busy).
I have to do more Python and Zope work now, so have to guard my time on *this* project more jealously than I have. MarkH and SeanT and JeremyH all have ideas here too, and I trust you'll sort them out as a harmonious family bent on world domination. As a general strategy, the first person to check code in usually wins <wink -- but take a clue from Mark, and from the earlier days of this project, and from the pop3 proxy, and sling code more than talk about it -- refactoring in Python is easy when the need becomes apparent from real life>.
... The mark of a good framework is when you write a tiny little class (like AutoTrainer above for instance) that contains hardly any code but adds a major new feature (in this case, automatic training when moving messages around in Outlook).
The client-specific code to hook and track msg movement in Outlook is relatively massive, so everything else appears a drop in the bucket to Mark. Nevertheless, if a usable framework for capturing the *common* part of this stuff were available, removing the 5 lines of code quoted above would help (the Outlook client, and all others).