[Spambayes] Result of a test
Greg Ward
gward@python.net
Thu, 3 Oct 2002 17:13:51 -0400
On 03 October 2002, papaDoc said:
> Looking at the prob of each word I saw something
>
>
> prob('battery"') = 0.844828
> prob('battery,') = 0.844828
>
> prob('powernews,') = 0.77651
> prob('powernews.') = 0.77651
>
> prob('outlet,') = 0.844828
> prob('outlet.') = 0.844828
>
> prob('luncheon') = 0.844828
> prob('luncheon:') = 0.844828
> prob('luncheons') = 0.844828
>
>
> I think it could be interesting to try removing the punctuation (the . ,
> ? !) at the end of a word
> and then counting it as the same word, and doing the same thing with the
> plural (luncheon and luncheons) based
> on a dictionary like the one in ispell.
Tim played with this very early in the project. Turned out that keeping
punctuation, preserving case, and not stemming, were all wins. A bit
counter-intuitive, but there you go. Experiment beats intuition every
time in this project.
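
For illustration only, here is a minimal sketch of the two approaches being
compared. It is not the project's actual tokenizer; the function names and
the sample string are made up for this example, and the "normalized" variant
only lowercases and strips surrounding punctuation (no dictionary-based
stemming).

```python
import string


def tokenize_raw(text):
    """Split on whitespace only, so case and punctuation are preserved."""
    return text.split()


def tokenize_normalized(text):
    """Hypothetical alternative: lowercase each word and strip any
    leading/trailing punctuation, so 'battery,' and 'Battery"' collapse
    into the single token 'battery'."""
    tokens = []
    for word in text.split():
        word = word.strip(string.punctuation).lower()
        if word:  # drop tokens that were nothing but punctuation
            tokens.append(word)
    return tokens


# Made-up sample text using words from the quoted message.
sample = 'Luncheon: free battery, outlet. Battery"'
raw = tokenize_raw(sample)
normalized = tokenize_normalized(sample)

print(raw)         # five distinct tokens
print(normalized)  # 'battery' now appears twice
```

Which variant gives a better classifier is exactly what has to be measured
on real ham and spam; the sketch only shows how the token sets differ.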
Greg
--
Greg Ward <gward@python.net> http://www.gerg.ca/
All right, you degenerates! I want this place evacuated in 20 seconds!