[Spambayes] to From_ or not to From_?
Guido van Rossum
guido@python.org
Sat, 28 Sep 2002 14:49:34 -0400
> > and the messages *do* have Unix From lines.
>
> Actually, none of mine do, because BruceG's spam didn't. I removed
> all the "From " lines from the c.l.py archive to match that (easier
> than inventing such lines for Bruce's msgs). I don't know that it
> makes any difference for the way I run the tests, but it certainly
> could make a difference if "From " lines were getting mined for
> clues. I forced all my msgs alike in this respect just to cut off
> that possibility.

Weird. I used splitndirs.py to create my normalized test data setup
and it wrote Unix From lines. In fact, looking at the code, it uses
str(msg), which forces unixfrom=1, which always writes a Unix From
line.
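
A minimal sketch of the serialization behaviour described above, assuming a current Python 3 email package rather than the 2002 one. There, str(msg) no longer implies unixfrom, so the flag is passed to as_string() explicitly. The serialize() helper and the sample message are hypothetical and are not taken from splitndirs.py.

```python
from email.message import Message


def serialize(msg, with_envelope):
    """Flatten msg to text, optionally led by a Unix From line.

    Hypothetical helper for illustration; not code from splitndirs.py.
    """
    return msg.as_string(unixfrom=with_envelope)


# Made-up message with no envelope line stored on it.
msg = Message()
msg["Subject"] = "test"
msg.set_payload("body\n")

with_from = serialize(msg, True)      # generator synthesizes a "From nobody ..." line
without_from = serialize(msg, False)  # output starts directly with the headers

print(with_from.splitlines()[0])
print(without_from.splitlines()[0])
```

When no envelope line is stored on the message, the generator writes a placeholder one beginning with "From nobody" followed by the current time, which is why a script that always requests unixfrom always emits a Unix From line.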
But it's possible that you created your data setup using a different
version of splitndirs.py.

Anyway, the email package always recognizes a Unix From line (it's
hard to mistake for an rfc822 header line) and stores it in a special
attribute of the Message object. Unless you wrote code in your
tokenizer to look at that, I'm pretty sure you're ignoring it. :-)
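
The parsing side can be sketched as follows, again assuming a current Python 3 email package. The raw message and the header_tokens() function are made up for illustration and are not the Spambayes tokenizer; the point is only that a tokenizer walking the ordinary headers never encounters the envelope line.

```python
import email

# Made-up mbox-style message: envelope "From " line, then ordinary headers.
RAW = (
    "From guido@python.org Sat Sep 28 14:49:34 2002\n"
    "From: guido@python.org\n"
    "Subject: to From_ or not to From_?\n"
    "\n"
    "body text\n"
)

msg = email.message_from_string(RAW)

# The envelope line is kept apart from the headers.
envelope = msg.get_unixfrom()
header_names = msg.keys()


def header_tokens(message):
    """Hypothetical tokenizer: yields one token per word of each header value."""
    for name, value in message.items():
        for word in value.split():
            yield f"{name.lower()}:{word}"


tokens = list(header_tokens(msg))

print(envelope)
print(header_names)
print(tokens)
```

A tokenizer would have to call get_unixfrom() explicitly to mine the envelope line for clues; iterating over the headers alone cannot pick it up.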
So Skip can stop worrying: presence or absence of Unix From lines
doesn't matter.

--Guido van Rossum (home page: http://www.python.org/~guido/)