[Spambayes] Result of a test

papaDoc papaDoc@videotron.ca
Thu, 03 Oct 2002 09:48:46 -0400


---------------------- multipart/mixed attachment
Hi,

The attachment is the result of a run on my ham and spam.

The messages come from 3 different email addresses.
The emails can be in English or French.

Most of the fp are emails from companies (Palm and APC) whose mailing
lists I subscribed to. (Even if I don't see those emails I won't miss them,
because I usually don't read them.)

Others are subscription verifications, and some of the spam (what I consider
spam) are emails forwarded to me by my boss.

I am using all the default values.


Most of the false negatives are spam in French!
This is probably because my ratio of French to English email is really low
and the ratio of French spam to French ham is very low.


I have not played with the Python code yet since I'm new to Python.

Looking at the prob of each word, I noticed something:


prob('battery"') = 0.844828
prob('battery,') = 0.844828

prob('powernews,') = 0.77651
prob('powernews.') = 0.77651

prob('outlet,') = 0.844828
prob('outlet.') = 0.844828

prob('luncheon') = 0.844828
prob('luncheon:') = 0.844828
prob('luncheons') = 0.844828


I think it could be interesting to try removing the punctuation (the . ,
? !) at the end of a word and then counting it as the same word, and to do
the same thing with the plural (luncheon and luncheons) based on a
dictionary like the one in ispell.
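
To make the idea concrete, here is a rough sketch of what I mean. It is not
the Spambayes tokenizer, just an illustration: the normalize() function and
the tiny KNOWN_WORDS set are made up for this example, and the set only
stands in for a real word list such as the one in ispell. The plural rule
(drop a final "s" when the shorter word is known) is a deliberate
simplification.

```python
import string

# Hypothetical stand-in for a real dictionary such as the ispell word list.
KNOWN_WORDS = {"battery", "outlet", "luncheon", "powernews"}


def normalize(token, known_words=KNOWN_WORDS):
    """Fold trailing punctuation and simple plurals into one token."""
    # Strip punctuation (. , ? ! : " and so on) from the end of the token.
    word = token.rstrip(string.punctuation)
    if not word:
        # The token was pure punctuation; leave it untouched.
        return token
    # Naive plural folding: only drop the final "s" when the singular
    # form is found in the dictionary.
    if word.endswith("s") and word[:-1] in known_words:
        word = word[:-1]
    return word


if __name__ == "__main__":
    for tok in ['battery"', "battery,", "outlet.", "luncheon:", "luncheons"]:
        print(tok, "->", normalize(tok))
```

With something like this, 'luncheon', 'luncheon:' and 'luncheons' would all
be counted as one word instead of three. Whether that actually improves the
fp/fn rates would have to be measured with a test run.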

papaDoc

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: run1.zip
Type: application/x-zip-compressed
Size: 41935 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021003/f71c7995/run1.bin

---------------------- multipart/mixed attachment--