[Spambayes] Result of a test

Greg Ward gward@python.net
Thu, 3 Oct 2002 17:13:51 -0400


On 03 October 2002, papaDoc said:
> Looking at the prob of each word, I noticed something:
> 
> 
> prob('battery"') = 0.844828
> prob('battery,') = 0.844828
> 
> prob('powernews,') = 0.77651
> prob('powernews.') = 0.77651
> 
> prob('outlet,') = 0.844828
> prob('outlet.') = 0.844828
> 
> prob('luncheon') = 0.844828
> prob('luncheon:') = 0.844828
> prob('luncheons') = 0.844828
> 
> 
> I think it could be interesting to strip the punctuation (. , ? !)
> from the end of a word and count the result as the same word, and to
> do the same thing with plurals (luncheon and luncheons) based on a
> dictionary like the one in ispell.

Tim played with this very early in the project.  It turned out that
keeping punctuation, preserving case, and not stemming were all wins.
A bit counter-intuitive, but there you go.  Experiment beats intuition
every time in this project.
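
To make the two options concrete, here is a minimal sketch of what is
being compared.  This is not the Spambayes tokenizer; the function names
and the crude plural rule are assumptions made up for illustration (the
proposal above would use an ispell-style dictionary rather than chopping
a trailing "s").

```python
import string


def tokenize_raw(text):
    """Split on whitespace only: case and attached punctuation are kept."""
    return text.split()


def tokenize_normalized(text):
    """Hypothetical alternative: fold case, strip edge punctuation, and
    apply a crude plural rule (an assumption, not a real stemmer)."""
    tokens = []
    for word in text.split():
        # Lowercase, then drop punctuation from both ends of the word.
        word = word.lower().strip(string.punctuation)
        # Naive plural handling: drop one trailing "s" from longer words.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word:
            tokens.append(word)
    return tokens


sample = "Luncheon: battery, outlet. Luncheons"
print(tokenize_raw(sample))         # four distinct tokens
print(tokenize_normalized(sample))  # "luncheon" now appears twice
```

The normalized version merges 'Luncheon:' and 'Luncheons' into a single
token, which is what the proposal asks for.  Whether that helps or hurts
can only be settled by running both against the same corpus and
comparing error rates.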

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
All right, you degenerates!  I want this place evacuated in 20 seconds!