[Spambayes] was no subject (where can find documentation)
mpas1342 at yahoo.de
Mon Jun 12 10:05:27 CEST 2006
From: Tony Meyer [mailto:tameyer at ihug.co.nz]
Sent: Monday, 12 June 2006 09:39
Cc: Tim Peters; spambayes at python.org
Subject: Re: [Spambayes] was no subject (where can find documentation)
> OK, I'll take a look at it later.
If you took a look at it now, you might not need to ask this <0.5 wink>.
> But there is a question regarding whitespace
> and token building.
> Consider this sample:
> I get an email with only this paragraph in the body:
> Sun is shining.
> If you say that, because of whitespace, there are only:
> to be checked,
In short: yes. In reality, we skip any tokens less than three
characters in length, and there are also many tokens from the headers.
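As a rough illustration of the rule just described, here is a minimal sketch that splits a body on whitespace and drops tokens shorter than three characters. The function name simple_tokens and its min_len parameter are hypothetical and are not taken from tokenizer.py; the real tokenizer does considerably more (header tokens, for example), so treat this only as an approximation.

```python
def simple_tokens(body, min_len=3):
    """Split on whitespace and drop tokens shorter than min_len.

    Hypothetical helper for illustration only; it is not the
    SpamBayes tokenizer and ignores headers entirely.
    """
    return [word for word in body.split() if len(word) >= min_len]


# "is" is dropped because it has fewer than three characters.
print(simple_tokens("Sun is shining."))
# The embedded spam words stay inside their whitespace-delimited tokens.
print(simple_tokens("sunBuy is shinigViagra"))
```

Under these assumptions, the sample body yields two tokens, and a word such as "shinigViagra" is treated as one token rather than being broken into substrings.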
> then I would ask what happens with the substrings in "sun" and "shining",
> and all combinations for "shining" like
> Because the spam email could contain spam words in this paragraph,
> like this:
> sunBuy is shinigViagra
> I hope the sample is understandable :-)
Look for mention of "character n-grams" in the comments in
tokenizer.py for discussion about this. In short, 'words' work
better and have the added bonus of resulting in (mostly) human-readable
tokens.
Your example (assuming there are no header tokens) would either be
spam (another spam using these embedded words has already been
trained), or unsure (they have never been seen before). Your example
is also extremely unclear - it does a very poor job at selling, which
is the whole point, after all. So a spammer gains little, and has
lost a lot.
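To make the contrast between whole-word tokens and character n-grams concrete, here is a small sketch. The function char_ngrams and the choice of n=5 are assumptions made purely for illustration; this is not how SpamBayes tokenizes, only a picture of the alternative discussed above.

```python
def char_ngrams(text, n=5):
    """Return every run of n consecutive characters in text.

    Hypothetical helper; n=5 is an arbitrary illustrative value.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


word = "shinigViagra"
grams = char_ngrams(word)

# A word-based tokenizer sees one token here; n-grams expose substrings.
print(len(grams))
print("Viagr" in grams)
```

The n-gram approach does surface the embedded substring, but at the cost of many more tokens per word, most of which are not meaningful to a human reading the token database.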
1- And if the sample is like this:
sunBuy is shinigViagrawww.xyx.com/dfdf.html
2- How many tokens will there be?
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.