[Spambayes] It gets funnier all the time....
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Wed Feb 12 11:22:47 EST 2003
2/12/2003 11:21:29 AM, Neil Schemenauer <nas at python.ca> wrote:
>Tim Stone - Four Stones Expressions wrote:
>> I doubt that the tokenizer would generate any meaningful tokens from this
>> message. Generating a token would be the right way to do it; any ideas?
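> import re
> import string
>
> BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
> # Anchored match: the whole word must consist of base64 characters.
> valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match
>
> def tokenize_word(...):
>     # ... existing code elided; n is the length of word ...
>     elif 60 <= n <= 76 and valid_base64(word):
>         # A long run of pure base64 characters: likely an undecoded body.
>         yield 'bare base64'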
That's the idea. Do we have any messages of this kind in any of our test
corpora? If not, we need to find some... -TimS
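A minimal standalone sketch of the same check, assuming the word has already been isolated by splitting on whitespace. `sketch_tokenize_word` is a hypothetical stand-in for the relevant branch of `tokenize_word`, not spambayes code, and the 60 and 76 bounds are just the values proposed in this thread. The pattern needs `+` after the character class: with a bare `[...]$`, `match` only accepts a one-character word, so the branch could never fire on a 60 to 76 character line.

```python
import base64
import re
import string

BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
# '+' after the class, anchored by match() and '$': the entire word
# must be base64 characters. Padding '=' is deliberately not included.
valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match

def sketch_tokenize_word(word, lo=60, hi=76):
    """Hypothetical stand-in for the bare-base64 branch of tokenize_word."""
    n = len(word)
    if lo <= n <= hi and valid_base64(word):
        yield 'bare base64'

# One full MIME line: 57 input bytes encode to exactly 76 characters.
line = base64.b64encode(bytes(range(57))).decode('ascii')
print(len(line), list(sketch_tokenize_word(line)))
print(list(sketch_tokenize_word('hello')))
```

Run as-is, it prints `76 ['bare base64']` followed by `[]`.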
>I don't know if 60 is reasonable as a lower bound. Does someone want to
>test Outlook? Maybe it only magically detects base64 if the line is
>exactly 76 characters long.
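Whatever Outlook does, RFC 2045 caps base64-encoded lines at 76 characters, and Python's `base64.encodebytes` wraps at exactly that width, so every line except the last comes out 76 long. A quick check with the standard library, assuming an arbitrary 200-byte payload:

```python
import base64

# Arbitrary sample payload; 200 bytes encode to 268 base64 characters.
payload = bytes(range(200))
# encodebytes() encodes 57-byte chunks, each giving a 76-character line.
lines = base64.encodebytes(payload).decode('ascii').splitlines()
print([len(line) for line in lines])
print(lines[-1].endswith('='))
```

This prints `[76, 76, 76, 40]` and `True`: here the short final line ends in `=` padding, which is outside `BASE64_CHARSET`, so it would not match either way. A lower bound like 60 only matters for encoders that wrap at some other width.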
c'est moi - TimS