Tim Stone - Four Stones Expressions wrote:
This is why you keep a corpus. This is pre-alpha code, and anything that anyone does at any time can screw the world up. You should simply delete your database and retrain it. If you don't have a corpus, go ahead and make one now... <wink>
Alright, this triggered a feature request in me, which resulted in some hacking activity <wink>. The patch below appends every message you train on to one of two mbox files ('_pop3proxyspam.mbox' for spam, '_pop3proxyham.mbox' for ham). That makes it easy to rebuild the database from scratch later, while you can still train ad hoc through the web interface of pop3proxy.py. Good idea?

Just

```diff
Index: pop3proxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
retrieving revision 1.10
diff -c -r1.10 pop3proxy.py
*** pop3proxy.py	5 Nov 2002 22:18:56 -0000	1.10
--- pop3proxy.py	6 Nov 2002 21:37:03 -0000
***************
*** 608,615 ****
          raise SystemExit
  
      def onUpload(self, params):
!         message = params.get('file') or params.get('text')
          isSpam = (params['which'] == 'spam')
          self.bayes.learn(tokenizer.tokenize(message), isSpam, True)
          self.push("""<p>Trained on your message. Saving database...</p>""")
          self.push(" ")  # Flush... must find out how to do this properly...
--- 608,626 ----
          raise SystemExit
  
      def onUpload(self, params):
!         message = params.get('file') or params.get('text')
          isSpam = (params['which'] == 'spam')
+         # Append the message to a file, to make it easier to rebuild
+         # the database later.
+         message = message.replace('\r\n', '\n').replace('\r', '\n')
+         if isSpam:
+             f = open("_pop3proxyspam.mbox", "a")
+         else:
+             f = open("_pop3proxyham.mbox", "a")
+         f.write("From ???@???\n")  # fake From line (XXX good enough?)
+         f.write(message)
+         f.write("\n")
+         f.close()
          self.bayes.learn(tokenizer.tokenize(message), isSpam, True)
          self.push("""<p>Trained on your message. Saving database...</p>""")
          self.push(" ")  # Flush... must find out how to do this properly...
```
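On the "XXX good enough?" question: one thing the patch does not handle is a body line that itself begins with "From ", which an mbox reader will take as the start of a new message. Below is a minimal standalone sketch (modern Python 3, not the pop3proxy.py code) of the same append-to-mbox idea with mboxrd-style ">From " quoting and a conventional "From sender date" separator, plus a reader for the rebuild step built on the standard `mailbox` module. The helper names `append_training_message` and `load_training_corpus` are hypothetical, and the sketch deliberately stops short of calling any SpamBayes training API.

```python
import mailbox
import os
import re
import tempfile
import time

# File names taken from the patch; the helpers below are hypothetical.
SPAM_MBOX = "_pop3proxyspam.mbox"
HAM_MBOX = "_pop3proxyham.mbox"

# Matches "From " (optionally preceded by ">" characters) at a line start.
_FROM_LINE = re.compile(r"^(>*From )", re.MULTILINE)


def append_training_message(message, is_spam, directory="."):
    """Append one raw message to the spam or ham mbox file."""
    # Normalise line endings, as the patch does.
    message = message.replace("\r\n", "\n").replace("\r", "\n")
    # mboxrd-style quoting (an addition beyond the patch): prefix one more
    # ">" so body lines are not mistaken for message separators.
    message = _FROM_LINE.sub(r">\1", message)
    if not message.endswith("\n"):
        message += "\n"
    path = os.path.join(directory, SPAM_MBOX if is_spam else HAM_MBOX)
    # Conventional separator: "From <sender> <asctime-style date>".
    separator = "From ???@??? %s\n" % time.asctime(time.gmtime())
    with open(path, "a") as f:
        f.write(separator)
        f.write(message)
        f.write("\n")  # blank line between messages


def load_training_corpus(directory="."):
    """Yield (message, is_spam) pairs from both mbox files, e.g. to retrain."""
    for name, is_spam in ((SPAM_MBOX, True), (HAM_MBOX, False)):
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            continue
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                yield msg, is_spam
        finally:
            box.close()


if __name__ == "__main__":
    # Demo in a throwaway directory so no real corpus is touched.
    with tempfile.TemporaryDirectory() as d:
        append_training_message("Subject: win\r\n\r\nFrom here on, you win.\r\n", True, d)
        append_training_message("Subject: lunch\n\nSee you at noon.", False, d)
        for msg, is_spam in load_training_corpus(d):
            print("spam" if is_spam else "ham", msg["Subject"])
```

Note that Python's `mailbox.mbox` does not strip the ">" quoting when reading, so the body comes back as ">From here on, you win."; for retraining a classifier that extra character is unlikely to matter.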