    Matt> The Trac[1] project has resurrected work on a SpamBayes plugin for
    Matt> filtering Wiki and ticket edits after finding the current Akismet
    Matt> system to be unreliable. Tony Meyer added some comments[2] to the
    Matt> Wiki suggesting that we write a custom tokenizer instead of using
    Matt> the built-in email-centric tokenizer.

Why not just create an "email message" out of the input? If the headers
are identical in every message they won't generate any useful tokens and
the message body will be all that yields useful clues. OTOH, if you have
login or IP address information for the spammers, you might suitably
populate the From: field.

    Matt> Are there examples from other people that have written custom tokenizers
    Matt> that may be helpful, or do you have any hints on what to take into
    Matt> account for writing an effective tokenizer for Wiki text?

So far, I think most of us have bent our input to look like email. I
think that would be a lot easier than writing and debugging a new
tokenizer.

Skip
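
For what it's worth, here is a minimal sketch of the "bend the input to look like email" idea, using only the Python standard library. The function name edit_to_message, the placeholder To:/Subject: values, and the author@ip layout of the From: field are all assumptions made up for illustration; nothing here is taken from Trac or from the SpamBayes API.

```python
from email.message import Message


def _clean(value):
    """Collapse whitespace so a value cannot inject extra header lines."""
    return " ".join(str(value).split())


def edit_to_message(body, author=None, ip=None):
    """Wrap a Wiki or ticket edit in a fake email message.

    Hypothetical helper: the header values are placeholders, not
    anything Trac or SpamBayes prescribes.
    """
    msg = Message()
    # Constant headers are identical in every message, so they
    # contribute no discriminating tokens.
    msg["To"] = "wiki@example.invalid"
    msg["Subject"] = "wiki edit"
    # From: is the one header worth varying, when login or IP
    # information is available for the submitter.
    user = _clean(author) if author else "anonymous"
    host = _clean(ip) if ip else "unknown.invalid"
    msg["From"] = "%s@%s" % (user, host)
    # The edit text itself becomes the message body.
    msg.set_payload(body)
    return msg


if __name__ == "__main__":
    print(edit_to_message("Buy cheap pills now", "bob", "10.0.0.1").as_string())
```

The string returned by as_string() is ordinary RFC 2822-style text, so it can presumably be trained on and scored the same way as any other message, with the stock email tokenizer doing the work.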