[spambayes-dev] effective tokenizer for wiki text
matt at matt-good.net
Mon Oct 30 22:38:05 CET 2006
The Trac project has resumed work on a SpamBayes plugin for filtering
wiki and ticket edits, after finding the current Akismet system to be
unreliable. Tony Meyer added some comments to the wiki suggesting that
we write a custom tokenizer instead of using the built-in,
email-centric tokenizer.
Are there examples from other people who have written custom tokenizers
that may be helpful, or do you have any hints on what to take into
account when writing an effective tokenizer for wiki text?
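
To make the question concrete, here is a rough sketch of the kind of
tokenizer in question. It is only an illustration under stated
assumptions: the name tokenize_wiki and the "url-count:" / "url-host:"
token prefixes are hypothetical and not part of SpamBayes, and it assumes
(as far as I understand the classifier) that any iterable of string
tokens can be fed to it. The idea is to treat link hosts and link counts
as separate clues from ordinary words.

```python
import re

# Match http(s) URLs; group 1 captures the host part only.
URL_RE = re.compile(r"https?://([^\s/:\[\]()<>\"']+)[^\s\[\]()<>\"']*",
                    re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize_wiki(text):
    """Yield tokens for a wiki edit (hypothetical helper, not a SpamBayes API)."""
    hosts = [m.group(1).lower() for m in URL_RE.finditer(text)]

    # Assumption: link-heavy edits are suspicious, so emit a capped count.
    yield "url-count:%d" % min(len(hosts), 10)

    for host in hosts:
        # Emit the host and each parent domain: a.b.com -> a.b.com, b.com, com
        parts = host.split(".")
        for i in range(len(parts)):
            yield "url-host:" + ".".join(parts[i:])

    # Remove URLs first so their pieces are not also counted as plain words.
    for word in WORD_RE.findall(URL_RE.sub(" ", text)):
        # Skip very short and very long words (thresholds are arbitrary here).
        if 3 <= len(word) <= 12:
            yield word.lower()
```

The prefixes keep link-derived tokens apart from body words, so a domain
name that appears in a link is scored separately from the same word in
running text. Whether that split actually helps on wiki spam is exactly
the sort of thing I would like advice on.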
-- Matt Good