[Spambayes] to From_ or not to From_?

Guido van Rossum guido@python.org
Sat, 28 Sep 2002 14:49:34 -0400


> > and the messages *do* have Unix From lines.
> 
> Actually, none of mine do, because BruceG's spam didn't.  I removed
> all the "From " lines from the c.l.py archive to match that (easier
> than inventing such lines for Bruce's msgs).  I don't know that it
> makes any difference for the way I run the tests, but it certainly
> could make a difference if "From " lines were getting mined for
> clues.  I forced all my msgs alike in this respect just to cut off
> that possibility.

Weird.  I used splitndirs.py to create my normalized test data setup
and it wrote Unix From lines.  In fact, looking at the code, it uses
str(msg), which forces unixfrom=1, which always writes a Unix From
line.
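
For illustration, here is a minimal sketch of that behaviour, not the splitndirs.py code itself. It assumes a current Python 3 email package, where (as far as I know) str(msg) no longer forces unixfrom, so the flag is passed explicitly; the message contents are made up.

```python
from email.message import Message

# A hypothetical message that was never given a Unix From line.
msg = Message()
msg["Subject"] = "test"
msg.set_payload("body\n")

# Asking for unixfrom makes the generator write a "From " line first;
# since none is stored on the message, it synthesizes "From nobody <date>".
with_from = msg.as_string(unixfrom=True)

# Without the flag, output starts directly with the headers.
without_from = msg.as_string()

print(with_from.splitlines()[0])
print(without_from.splitlines()[0])
```

So a script that serializes with unixfrom turned on writes a From line whether or not the original message had one.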

But it's possible that you created your data setup using a different
version of splitndirs.py.

Anyway, the email package always recognizes a Unix From line (it's
hard to mistake for an RFC 822 header line) and stores it in a special
attribute of the Message object.  Unless you wrote code in your
tokenizer to look at that, I'm pretty sure you're ignoring it. :-)
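
A small sketch of what that looks like in practice, assuming the standard email package and a made-up sample message (the raw text below is illustrative, not from anyone's corpus):

```python
import email

# Hypothetical mbox-style message: a Unix From line, then real headers.
raw = (
    "From guido@python.org Sat Sep 28 14:49:34 2002\n"
    "From: Guido van Rossum <guido@python.org>\n"
    "Subject: to From_ or not to From_?\n"
    "\n"
    "body text\n"
)

msg = email.message_from_string(raw)

# The envelope line is kept apart from the headers.
envelope = msg.get_unixfrom()

# Only the real header fields show up here; the "From " line does not.
header_names = msg.keys()

print(envelope)
print(header_names)
```

Code that only walks the headers and the body never sees the envelope line; it has to call get_unixfrom() explicitly to get at it.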

So Skip can stop worrying: presence or absence of Unix From lines
doesn't matter.

--Guido van Rossum (home page: http://www.python.org/~guido/)