spamBayes is great, thank you all!
What a great program! Many many thanks ...!

SpamBayes works just fine, removes 80%-90% SPAM automatically and learns with each manually deleted spam mail. Wow. Great. Only 4 non-spam mails (out of hundreds) were wrongly treated as spam so far - and they actually looked like Spam, I have to admit.

The SPAM catastrophe started in ~1996, when mainstream capitalism discovered the web, culminated in about 800 spam mails last week, and ended when I found SpamBayes, actually by reading a web article which compared SpamBayes with some other software.

The one drawback is that all mails are still fetched from the POP3 box, which is really annoying with these 116kB MS-Update.exe mails. But still, that only costs connection money, not my precious time...

The Outlook plugin is good, too. Seamless integration is a must.

If you have German-speaking friends, send them my signature below. It explains WHAT and HOW TO - but also shows one urgent need, because many Dummies might not be able to do all 4 steps ;-)

Publish an installer which includes all 4 programs: Python, Python-Win32, SpamBayes and SpamBayes-Outlook-Setup! Would that be a good idea? Obviously you have to ask Python.org - but they'd gain a lot by being installed on many more computers...

ciao,
Andreas

----
[Signature, translated from German]

Hey, great: I have managed the unbelievable, namely getting rid of SPAM! Finally. It had lately become absolute hell - and now it seems to be over...

There is a free Python script, "SpamBayes", that integrates seamlessly into Outlook etc. and uses a trainable Bayes routine to automatically sort all emails that look like SPAM into two folders - "probably" (15%) and "almost certain" (90%).

You do have to download a few things, though, since your computer probably does not yet speak the modern programming language Python. So first get Python ("current production version"):
http://www.python.org/download/
and the Python-Win32 routines:
http://starship.python.net/crew/mhammond/win32/Downloads.html
then, from Sourceforge (the most important open-source community),
https://sourceforge.net/project/showfiles.php?group_id=61702
the main program spambayes-1.XXX.zip and the Outlook add-in SpamBayes-Outlook-Setup-0XXX.exe.
For POP clients other than Outlook there seems to be support (Unterstützung) as well:
http://spambayes.sourceforge.net/windows.html

Well, and then simply install everything in the order given above. Ideally you already have two pre-sorted folders (e.g. 400 wanted and 1300 SPAM mails) for the initial training. If you have always deleted all SPAM right away so far, that is not so bad either: you just press the new yellow Outlook button "Delete as SPAM" for every incoming SPAM mail, and bit by bit SpamBayes will recognize more and more on its own. Great, isn't it?
[AndreasK]
What a great program! Many many thanks ...!
SpamBayes works just fine, removes 80%-90% SPAM automatically and learns with each manually deleted spam mail. Wow. Great. Only 4 non-spam mails (out of hundreds) were wrongly treated as spam so far - and they actually looked like Spam, I have to admit.
spambayes "should be" doing better than that. It's best if you train it on an approximately equal number of ham and spam. If you've done so, and have trained on at least several hundred of each, then I'd expect better performance than you report here.

If your primary language isn't English, that could explain it, as *most* developers and testers here use English. If, for example, your primary email language is German, then the Outlook addin's

[Tokenizer] replace_nonascii_chars: True

setting may be inappropriate for you, and the default skip_max_word_size value of 12 may be too small (13-character words like Unterstützung are hurt by both of those: first the ü gets replaced by a question mark due to replace_nonascii_chars, and then the whole word gets replaced by a synthesized "skip: U 10" token because 13 > 12).

We've done almost nothing here on tuning for languages other than English, so I expect the default settings work best with English. Still, I appreciate that it's better than nothing even with the poor performance you reported <wink>.
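The combined effect of the two settings can be illustrated with a small Python sketch. This is not the actual SpamBayes tokenizer: the function name `sketch_token` is made up for illustration, and the token format and the rounding of the word length down to a multiple of 10 are assumptions based only on the "skip: U 10" example above. Any other processing the real tokenizer performs is ignored.

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def sketch_token(word, replace_nonascii_chars=True, skip_max_word_size=12):
    """Hypothetical, simplified model of the two [Tokenizer] options."""
    if replace_nonascii_chars:
        # Each non-ASCII character (e.g. the u-umlaut) becomes "?".
        word = NON_ASCII.sub("?", word)
    if len(word) > skip_max_word_size:
        # Assumed format: first character plus the length rounded down to a
        # multiple of 10, mirroring the "skip: U 10" example in the thread.
        return "skip: %s %d" % (word[0], len(word) // 10 * 10)
    return word


word = "Unterst\u00fctzung"  # the 13-character German word from the thread

# Defaults: the umlaut is replaced, then the whole word is skipped.
print(sketch_token(word))
# Larger size limit only: the word survives, but with "?" for the umlaut.
print(sketch_token(word, skip_max_word_size=20))
# Both settings changed: the word is kept unchanged.
print(sketch_token(word, replace_nonascii_chars=False, skip_max_word_size=20))
```

Under these assumptions, a German word survives as a usable token only when both settings are changed.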
Hi!
> What a great program! Many many thanks ...!

That was my main reason for writing: just to say thank you ;-) Especially as it is a FREE program. How do you all make money?
>> because many Dummies might not be able to do all 4 steps ;-) Publish an installer which includes all 4 programs: Python, Python-Win32, SpamBayes and SpamBayes-Outlook-Setup !

> In the not-too-distant future, there will be an installer for Windows that installs either the Outlook plug-in, or a binary version of sb_server (covering those using POP3), or both if you really need both. Neither Python nor the win32 extensions are needed in that case (or, more accurately, the parts that are needed are included).

Good to know! Let me know when it is out, so that I can tell my Dummy friends.
>> SpamBayes works just fine, removes 80%-90% SPAM automatically

> spambayes "should be" doing better than that.

Actually, it does. Far better. The 80-90% was a quick shot from the hip.
I have retrained (1674 good and 1900 spam) - and today it sucked away 80 spam mails and missed only 2. The problem is obviously NEW FORMS of spam. And I don't blame SpamBayes for it. And then it's a quick click-and-away, that's ok for me.
> If your primary language isn't English, that could explain it, as *most* developers and testers here use English. If, for example, your primary email language is German, then the Outlook addin's [Tokenizer] replace_nonascii_chars: True setting may be inappropriate for you

OK, I have changed it to: replace_nonascii_chars: False
correct? Do I have to RETRAIN? Should I discard my "old" spambayes database before?
> and the default skip_max_word_size value of 12 may be too small (13-character words like Unterstützung are hurt by both of those: first the ü gets replaced by a question mark due to replace_nonascii_chars, and then the whole word gets replaced by a synthesized "skip: U 10" token because 13 > 12).

Do you really think I should change that, too? Any disadvantages? WHERE, in which file? (There are 2 Options.py and 2 tokenizer.py files containing skip_max_word_size.) Why is there no .INI file for the main spambayes program?
> Still, I appreciate that it's better than nothing even with the poor performance you reported <wink>.

YES.
Thanks again, Andreas
Andreas> OK, I have changed it to:
Andreas> replace_nonascii_chars: False
Andreas> correct?

Yup.

Andreas> Do I have to RETRAIN?

It would be helpful.

Andreas> Should I discard my "old" spambayes database before?

Yes.

>> and the default skip_max_word_size value of 12 may be too small
>> (13-character words like Unterstützung are hurt by both of those:
>> first the ü gets replaced by a question mark due to
>> replace_nonascii_chars, and then the whole word gets replaced by a
>> synthesized "skip: U 10" token because 13 > 12).

Andreas> Do you really think I should change that, too? Any
Andreas> disadvantages?

It's definitely worth a try. You should probably keep it below 30. If you set it longer than that, you will probably start to treat lots of binary gibberish as words. In any case, you might want to test to see how it does with various values (15, 20, 25, 30). The longer this is, the larger your database will be as well.

Andreas> WHERE, in which file (there are 2 Options.py and 2 tokenizer.py
Andreas> files containing skip_max_word_size) Why is there no .INI file
Andreas> for the main spambayes program?

There is, but you have to create it. Call it anything you want then set the BAYESCUSTOMIZE environment variable to point to it. There should be no need to modify the Options.py source file.

Skip
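Creating such a file and pointing BAYESCUSTOMIZE at it could look roughly like the following Python sketch. The file name `my_bayescustomize.ini`, the temporary directory, and the value 20 are placeholders chosen for illustration; only the section and option names come from this thread. Note that setting `os.environ` affects only the current process and programs started from it, so for everyday use the variable would normally be set through the operating system's own environment settings.

```python
import configparser
import os
import tempfile

# Options discussed in this thread; the value 20 is only an example to try.
INI_TEXT = """\
[Tokenizer]
replace_nonascii_chars: False
skip_max_word_size: 20
"""

# Hypothetical location and file name; any path should do.
ini_path = os.path.join(tempfile.mkdtemp(), "my_bayescustomize.ini")
with open(ini_path, "w", encoding="ascii") as f:
    f.write(INI_TEXT)

# Point the environment variable at the file
# (visible to this process and its child processes only).
os.environ["BAYESCUSTOMIZE"] = ini_path

# Read the file back to check that it parses as intended.
parser = configparser.ConfigParser()
parser.read(ini_path)
print(parser.getboolean("Tokenizer", "replace_nonascii_chars"))
print(parser.getint("Tokenizer", "skip_max_word_size"))
```

The read-back step is only a sanity check on the file's syntax; it does not show how SpamBayes itself interprets the options.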
Hi! Two thoughts that might make the thingy easier for beginners:
> Andreas> OK, I have changed it to:
> Andreas> replace_nonascii_chars: False
> Andreas> correct?
>
> Yup.
1) There are 2 places with a "default_bayes_customize.ini". Delete one - or rename it. Otherwise ...
> Andreas> WHERE, in which file (there are 2 Options.py and 2 tokenizer.py
> Andreas> files containing skip_max_word_size) Why is there no .INI file
> Andreas> for the main spambayes program?
>
> There is, but you have to create it. Call it anything you want then set the BAYESCUSTOMIZE environment variable to point to it. There should be no need to modify the Options.py source file.
2) Too complicated. Not for me, but for other people. Just include a *.INI file in the next distribution.

my2(euro)cents,
Andreas
[Tim]
If your primary language isn't English, that could explain it, as *most* developers and testers here use English. If, for example, your primary email language is German, then the Outlook addin's
[Tokenizer] replace_nonascii_chars: True
setting may be inappropriate for you
[AndreasK]
OK, I have changed it to: replace_nonascii_chars: False
correct?
Yup!
Do I have to RETRAIN?
For best results, yes. The option affects the tokens that get stored into your database. For example, if Rückhalt appeared in your trained email when the option was True, r?ckhalt got stored in your database, but after you change the option to False, rückhalt will get looked up in new email. That won't match the r?ckhalt stored before.
Should I discard my "old" spambayes database before?
For best results, yes, and for the same reason.
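Why the old database stops matching can be shown with a toy example. The dictionary below merely stands in for a trained database, and `normalize` is a hypothetical helper that lowercases a word and optionally replaces non-ASCII characters; neither reflects how SpamBayes is actually implemented.

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def normalize(word, replace_nonascii_chars):
    """Hypothetical helper: lowercase, optionally replace non-ASCII by '?'."""
    word = word.lower()
    if replace_nonascii_chars:
        word = NON_ASCII.sub("?", word)
    return word


# Toy "database" built while the option was True: token -> count.
trained = {normalize("R\u00fcckhalt", True): 3}  # stores the "?" variant

# After switching the option to False, new mail yields the umlaut variant.
lookup = normalize("R\u00fcckhalt", False)

# The lookup misses, so the evidence trained earlier is never used.
print(lookup in trained)
```

Retraining from scratch with the new setting avoids carrying around such unreachable tokens.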
and the default skip_max_word_size value of 12 may be too small (13-character words like Unterstützung are hurt by both of those: first the ü gets replaced by a question mark due to replace_nonascii_chars, and then the whole word gets replaced by a synthesized "skip: U 10" token because 13 > 12).
Do you really think I should change that, too?
If and only if you want to experiment. We didn't experiment with German here -- nobody has, as far as I know.
Any disadvantages?
Your database will grow larger, and it *may* work worse for you. It's impossible to guess without trying it. You don't *have* to try it! We're happy to have you here even if you "just use" the program <wink>.
WHERE, in which file (there are 2 Options.py and 2 tokenizer.py files containing skip_max_word_size) Why is there no .INI file for the main spambayes program?
If you're using the Outlook client, there's a file named default_bayes_customize.ini. That's the same .ini file you changed when you set replace_nonascii_chars to False.
>> SpamBayes works just fine, removes 80%-90% SPAM automatically and
>> learns with each manually deleted spam mail. Wow. Great. Only 4
>> non-spam mails (out of hundreds) were wrongly treated as spam so far
>> - and they actually looked like Spam, I have to admit.

Tim> spambayes "should be" doing better than that.

[lots of good stuff deleted]

I used that as the basis for a new q&a in the FAQ about using SpamBayes with non-English languages. What does "Ünterstützung" mean?

Skip
On Wed, 1 Oct 2003 09:09:32 -0500 Skip Montanaro <skip@pobox.com> wrote:
>> SpamBayes works just fine, removes 80%-90% SPAM automatically and >> learns with each manually deleted spam mail. Wow. Great. Only 4 >> non-spam mails (out of hundreds) were wrongly treated as spam so far >> - and they actually looked like Spam, I have to admit.
Tim> spambayes "should be" doing better than that. [lots of good stuff deleted]
I used that as the basis for a new q&a in the FAQ about using SpamBayes with non-English languages. What does "Ünterstützung" mean?
The first one is not an umlaut, it's a plain "U".

German <-> English
Unterstützung {f}; Begleitung {f}; Zusatz {m}; Rückhalt {m} <-> backing
Unterstützung {f}; Stütze {f}; Rückhalt {m} <-> support
Unterstützung {f}; Förderung {f} <-> funding
Erleichterung {f}; Unterstützung {f}; Hilfe {f} <-> relief
Förderung {f}; Unterstützung {f} <-> promotion
Hilfe {f}; Unterstützung {f} <-> assistance
ohne Unterstützung <-> unassisted
moralische Unterstützung {f} <-> moral support

Bye,
Alexander.

--
Give a man a fish and you feed him for a day; teach him to use the Net and he won't bother you for weeks.
http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7
participants (5)
- Alexander Leidinger
- Andreas K (PGP please)
- AndreasK
- Skip Montanaro
- Tim Peters