Lisp to Python translation criticism?

Tue Aug 20 03:25:57 EDT 2002

Bayesian Spam filtering blah blah blah.

All that's needed are some very simple filters.

If the incoming email is sent by someone I have sent email to, it's
legit.

Otherwise:

If I am not the only recipient, or the list of recipients is unknown,
it's spam.

If the body of an incoming email contains <my-email-address> or the
words "click here", it's spam.

That's it. I get about 40 spam a day. Perhaps once a week, the filter
fails and one gets through. I've had one false positive in the last 6
months.
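
For anyone who wants to try it, the rules above can be sketched in Python
roughly as follows. This is only an illustration, not the filter I actually
run: the function name, its parameters, and the case-insensitive matching
are all made up for the example, and treating mail that trips none of the
rules as legit is an assumption.

```python
def classify(sender, recipients, body, my_address, known_correspondents):
    """Return "legit" or "spam" using the simple rules described above.

    recipients is a list of addresses, or None if the list is unknown.
    known_correspondents is a set of lowercase addresses I have mailed.
    (All names here are hypothetical, chosen for this sketch.)
    """
    me = my_address.lower()

    # Rule 1: mail from someone I have written to is always legit.
    if sender.lower() in known_correspondents:
        return "legit"

    # Rule 2: unknown recipient list, or I am not the only recipient.
    if recipients is None or [r.lower() for r in recipients] != [me]:
        return "spam"

    # Rule 3: the body mentions my address or says "click here".
    text = body.lower()
    if me in text or "click here" in text:
        return "spam"

    # Assumption: anything that trips no rule is treated as legit.
    return "legit"
```

Calling classify("a@example.org", ["me@example.com"], "Click here now!",
"me@example.com", set()) falls through to rule 3 and comes back as spam.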

"John E. Barham" <jbarham at jbarham.com> wrote in message news:<Vdh79.156660$Ag2.8265210 at news2.calgary.shaw.ca>...
> Don't know how many saw the story on Slashdot about Paul Graham's article
> (http://www.paulgraham.com/spam.html) on how he filters spam.  He posted two
> snippets of code in Lisp, a language which I only have a very passing
> knowledge of.  Here's my attempt at translating it into Python:
> 
> Lisp:
> 
> (let ((g (* 2 (or (gethash word good) 0)))
>       (b (or (gethash word bad) 0)))
>    (unless (< (+ g b) 5)
>      (max .01
>           (min .99 (float (/ (min 1 (/ b nbad))
>                              (+ (min 1 (/ g ngood))
>                                 (min 1 (/ b nbad)))))))))
> 
> (let ((prod (apply #'* probs)))
>   (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
>                                      probs)))))
> 
> Python:
> 
> def spam_word_prob(word, good, bad, ngood, nbad):
>     g = 2 * good.get(word, 0)
>     b = bad.get(word, 0)
>     if g + b >= 5:
>         return max(0.01, min(0.99, float(min(1, b / nbad) /
>             ((min(1, g / ngood) + min(1, b / nbad))))))
>     else:
>         return None  # the Lisp yields nil: no estimate for this word
> 
> def spam_prob(probs):
>     prod = 1.0
>     for prob in probs:
>         prod = prod * prob
>     inv_probs = [1 - x for x in probs]
>     inv_prob = 1.0
>     for prob in inv_probs:
>         inv_prob = inv_prob * prob
>     return prod / (prod + inv_prob)
> 
> Any comments on the correctness, style, efficiency etc. of my translation?
> I'd like to write a Python spam filtering system using Graham's techniques.
> 
> Please note that this is not meant to revive the perpetual debate over the
> relative merits of Python's lambda...  ;)
> 
>     John