[Spambayes] idea for tokenizer.crack_filename change
Tim Peters
tim.one@comcast.net
Fri, 20 Sep 2002 00:30:53 -0400
[Skip Montanaro]
> It seems to me that base64-encoded, all DOS/Windows executables start
> with (reciting from memory, since I've deleted all viruses and haven't
> received any new ones in the last 15 minutes or so) "TPqAAA" or
> something similar. Why rely on finding specific file extensions? They
> can just change.
Well, not often, and the scheme we're working on is supposed to be able to
learn when they do <wink>. Would you like to write some code to tokenize
this particular bit of Windows Lore? We currently ignore 100% of the
*content* of MIME sections that don't have text/* type, although the MIME
metadata is tokenized for all MIME sections via
    # Content-{Type, Disposition} and their params, and charsets.
    for x in msg.walk():
        for w in crack_content_xyz(x):
            yield w
in tokenize_headers().
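
As a rough illustration of what tokenizing this bit of Windows lore could look like, here is a minimal sketch in modern Python using the standard email package. It assumes the clue worth capturing is the two-byte "MZ" signature that opens DOS/Windows executables; base64 text for any payload starting with those bytes always begins with "TV". The names crack_exe_magic, tokenize_exe_magic, and the token string are hypothetical and are not part of the spambayes tokenizer.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical token name; not something the spambayes tokenizer emits.
EXE_TOKEN = "content-magic:dos-exe"


def crack_exe_magic(part):
    """Yield EXE_TOKEN if a non-text leaf part decodes to bytes starting with b'MZ'."""
    if part.is_multipart() or part.get_content_maintype() == "text":
        return
    # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable).
    payload = part.get_payload(decode=True)
    if payload and payload[:2] == b"MZ":
        yield EXE_TOKEN


def tokenize_exe_magic(msg):
    """Walk every MIME section, mirroring the msg.walk() loop quoted above."""
    for part in msg.walk():
        for token in crack_exe_magic(part):
            yield token


# Usage: a two-part message with a text body and a fake executable attachment.
msg = MIMEMultipart()
msg.attach(MIMEText("see attached"))
msg.attach(MIMEApplication(b"MZ\x90\x00\x03\x00\x00\x00", _subtype="octet-stream"))
print(list(tokenize_exe_magic(msg)))
```

Checking the decoded bytes rather than the raw base64 text means the same test works whatever transfer encoding the sender picked, and it does not depend on the file extension at all.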