[Spambayes] idea for tokenizer.crack_filename change
Skip Montanaro
skip@pobox.com
Sat, 21 Sep 2002 09:28:00 -0500
>> It seems to me that base64-encoded, all DOS/Windows executables start
>> with (reciting from memory, since I've deleted all viruses and
>> haven't received any new ones in the last 15 minutes or so) "TPqAAA"
>> or something similar. Why rely on finding specific file extensions?
>> They can just change.
Tim> Well, not often, and the scheme we're working on is supposed to be able to
Tim> learn when they do <wink>. Would you like to write some code to tokenize
Tim> this particular bit of Windows Lore?
I gave it a try, but I'm still suffering with fp/fn rates around 15%, so
anything I see is suspect. Also, I saw no change in the results. It's quite possible I
have a bug, but I've also cleaned out obvious viruses from my corpora. True
spam may have enough indicators elsewhere that this scheme won't help.
Should I just go ahead and check in my change (it is controlled by a couple
of new options and is disabled by default) and let y'all point out my bugs?
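
A minimal sketch of the kind of check being discussed, in Python. DOS/Windows
executables begin with the two bytes "MZ", and the first four base64
characters of an attachment decode to its first three bytes, so that is all
the check needs. The function name exe_tokens and the token string
"attachment:dos-exe" are hypothetical and are not taken from the Spambayes
tokenizer; this illustrates the idea, not the change described above.

```python
import base64
import binascii


def exe_tokens(payload_b64):
    """Yield one token if base64 text decodes to data starting with b'MZ'.

    Hypothetical helper for illustration; not part of Spambayes.
    """
    # Drop line breaks and other whitespace, then keep the first 4 base64
    # characters, which encode exactly the first 3 bytes of the payload.
    head = "".join(payload_b64.split())[:4]
    try:
        first = base64.b64decode(head)
    except (binascii.Error, ValueError):
        # Not valid base64 (or too short to decode); emit nothing.
        return
    if first[:2] == b"MZ":
        yield "attachment:dos-exe"
```

Keying on the decoded bytes rather than on a fixed base64 string avoids
depending on whatever happens to follow "MZ" in a particular executable.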
Skip