Lisp to Python translation criticism?
morton at dennisinter.com
Tue Aug 20 09:25:57 CEST 2002
Bayesian spam filtering, blah blah blah.
All that's needed are some very simple filters.
If the incoming email is sent by someone I have sent email to, it's not spam.
If I am not the only recipient, or the list of recipients is unknown, it's spam.
If the body of an incoming email contains <my-email-address> or the
words "click here", it's spam.
That's it. I get about 40 spam a day. Perhaps once a week, the filter
fails and one gets through. I've had one false positive in the last 6
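Rules like these are simple enough to sketch directly. What follows is a minimal Python sketch under several assumptions: the `Message` class and the `is_spam`, `my_address` and `correspondents` names are hypothetical, header parsing is left out, matching is case-insensitive, and the first two rules are read as "mail from people I've written to is never spam" and "mail with other or unknown recipients is spam".

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Message:
    # Hypothetical, already-parsed view of an incoming email.
    sender: str
    recipients: Optional[List[str]]  # None means the recipient list is unknown
    body: str


def is_spam(msg: Message, my_address: str, correspondents: Set[str]) -> bool:
    """Apply the three rules in order.

    `correspondents` (hypothetical) holds lower-cased addresses of
    people I have sent mail to.
    """
    me = my_address.lower()
    # Rule 1 (assumed whitelist): mail from someone I have written to passes.
    if msg.sender.lower() in correspondents:
        return False
    # Rule 2: recipient list unknown, or anyone besides me on it.
    if msg.recipients is None or {r.lower() for r in msg.recipients} != {me}:
        return True
    # Rule 3: the body quotes my address or says "click here".
    body = msg.body.lower()
    return me in body or "click here" in body
```

Because rule 1 is checked first in this sketch, a known correspondent's message is never flagged even if it says "click here"; that ordering is itself an assumption about how the rules combine.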
"John E. Barham" <jbarham at jbarham.com> wrote in message news:<Vdh79.156660$Ag2.8265210 at news2.calgary.shaw.ca>...
> Don't know how many saw the story on Slashdot about Paul Graham's article
> (http://www.paulgraham.com/spam.html) on how he filters spam. He posted two
> snippets of code in Lisp, a language which I only have a very passing
> knowledge of. Here's my attempt at translating it into Python:
> (let ((g (* 2 (or (gethash word good) 0)))
>       (b (or (gethash word bad) 0)))
>   (unless (< (+ g b) 5)
>     (max .01
>          (min .99 (float (/ (min 1 (/ b nbad))
>                             (+ (min 1 (/ g ngood))
>                                (min 1 (/ b nbad)))))))))
>
> (let ((prod (apply #'* probs)))
>   (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
>                                      probs)))))
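> def spam_word_prob(word, good, bad, ngood, nbad):
>     g = 2 * good.get(word, 0)  # good counts are doubled, as in the Lisp
>     b = bad.get(word, 0)
>     if g + b >= 5:
>         # Convert before dividing: int / int truncates in Python 2.
>         bad_freq = min(1.0, float(b) / nbad)
>         good_freq = min(1.0, float(g) / ngood)
>         return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))
>     return None  # UNLESS yields nil: the word is too rare to rate
>
> def spam_prob(probs):
>     # Product of the probabilities, as in (apply #'* probs).
>     prod = 1.0
>     for prob in probs:
>         prod = prod * prob
>     # Product of the complements (1 - p).
>     inv_prob = 1.0
>     for prob in probs:
>         inv_prob = inv_prob * (1 - prob)
>     return prod / (prod + inv_prob)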
> Any comments on the correctness, style, efficiency etc. of my translation?
> I'd like to write a Python spam filtering system using Graham's techniques.
> Please note that this is not meant to revive the perpetual debate over the
> relative merits of Python's lambda... ;)
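For comparison with the quoted translation, here is a minimal Python 3 sketch of the same two Lisp snippets. It assumes, as the quoted code does, that `good` and `bad` are dicts mapping words to counts and that `ngood` and `nbad` are non-zero message counts. `functools.reduce` with `operator.mul` stands in for `(apply #'* ...)`, and `None` stands in for the `nil` that `unless` returns for rare words.

```python
from functools import reduce
from operator import mul
from typing import Dict, Iterable, Optional


def spam_word_prob(word: str, good: Dict[str, int], bad: Dict[str, int],
                   ngood: int, nbad: int) -> Optional[float]:
    """Port of the first snippet; None plays the role of nil."""
    g = 2 * good.get(word, 0)  # good occurrences are counted double
    b = bad.get(word, 0)
    if g + b < 5:              # (unless (< (+ g b) 5) ...)
        return None
    bad_freq = min(1.0, b / nbad)    # true division in Python 3
    good_freq = min(1.0, g / ngood)
    # g + b >= 5 here, so the denominator is positive.
    return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))


def spam_prob(probs: Iterable[float]) -> float:
    """Port of the second snippet: combine per-word probabilities."""
    probs = list(probs)  # iterated twice below
    prod = reduce(mul, probs, 1.0)                          # (apply #'* probs)
    inv_prod = reduce(mul, (1.0 - p for p in probs), 1.0)   # complements
    return prod / (prod + inv_prod)
```

How a caller treats a `None` (skipping the word or substituting a default) is left open here, as is the choice of which words to pass to `spam_prob`.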