[Spambayes] Use of email package
lists at webcrunchers.com
Sun Mar 16 18:43:47 EST 2003
>>>>>> "TS" == Tim Stone <tim at fourstonesexpressions.com> writes:
> TS> We've got to either seriously harden our code so it knows what
> TS> to do when the email package raises an exception, or consider
> TS> not using the email package. I think I'll be reworking
> TS> pop3proxy so that it no longer uses the email package for
> TS> anything. The Corpus stuff currently has most (all?) the
> TS> function that is needed by pop3proxy anyway.
>Let me take this opportunity to elaborate on the architecture of the
>email package. There was a deliberate separation between the
>representation of email messages and the parsing of flat text to that
>object model (and in generating flat text from the object model, but
>that may not be relevant).
>Thus, it was designed with an eye toward the use of application
>specific parsers, and it may well be that the default parsers (both
>the strict and the lax parsers) may not be appropriate for an
>application that tends to see intentionally ill-formed messages. My
>suggestion would be to write a parser that can handle the really bad
>messages, then use the default lax parser for most things, and fall
>back to the "adaptive parser" for the really horrendous messages.
>Then donate that parser back to Python. <wink>
I've already spent a lot of time developing my system using the "email" package and the classic "Message" classes.
I'm also aware of the bugs in email.Parser, especially when it comes to parsing MIME-type messages, in particular the Klez.H viruses I keep getting, which in most cases gag my mail processing system.
Right now, I skip processing these messages, and leave them on the POP server, and manually deal with them.
I'm hoping we can still use these packages, because we've already spent a lot of time using them, but let's just try to fix the Parser so it works right.
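The tiered approach suggested in the quoted text (try a strict parse, fall back to a lax one, and only then give up and leave the message on the server) could look roughly like the sketch below. It assumes a current Python 3 email package, whose API differs from the 2003-era email.Parser discussed here; parse_with_fallback is a hypothetical helper, not part of SpamBayes or the email package.

```python
import email
from email import policy
from email.errors import MessageDefect, MessageError


def parse_with_fallback(raw):
    """Return (message, tier). message is None if no tier could parse it.

    Hypothetical helper for illustration only.
    """
    # Tier 1: strict. Any defect found while parsing is raised as an exception.
    strict = policy.default.clone(raise_on_defect=True)
    try:
        return email.message_from_bytes(raw, policy=strict), "strict"
    except (MessageDefect, MessageError):
        pass  # ill-formed message, try the lax parser next

    # Tier 2: lax. Defects are recorded on the message instead of raised.
    try:
        return email.message_from_bytes(raw, policy=policy.default), "lax"
    except Exception:
        # Tier 3: give up, so the caller can leave the message on the
        # POP server for manual handling.
        return None, "skipped"


good = b"From: a@example.com\nSubject: hi\n\nbody\n"
# Declares a multipart boundary that never appears in the body.
bad = b'Content-Type: multipart/mixed; boundary="XYZ"\n\nno boundary line here\n'

for raw in (good, bad):
    msg, tier = parse_with_fallback(raw)
    print(tier, [] if msg is None else [type(d).__name__ for d in msg.defects])
```

A real "adaptive parser" would slot in between the lax tier and the final skip; the sketch only shows where that hook would go.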
I'm still using a much earlier version of the SpamBayes project, and I know I need to catch up, but I was planning to hold off on doing that until I can get another OpenBSD box on our co-location rack, which we plan to earmark specifically for SpamBayes development.
On top of that, I'm also working on our SMS (Spam Management System) under Open Source, where we plan to collect spam into a SQL database, with the idea of developing a spam processing system. This involves building the database, then processing spam as it comes in so we can keep track of repeat spam, and doing really cool things like sending the spam to SpamCop, the FTC, etc. It's also going to test the opt-out mechanisms of the spam and further classify it in order to identify the really bad ones.
Each database entry allows one to take specific notes on the spam, to allow easy tracing of the spammer and locating them through "whois" lookups on the sites they hawk in the spam.
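As a rough illustration of tracking repeat spam with per-entry notes, here is a minimal sketch. Everything in it is an assumption: the table layout, the function names, the use of sqlite3 as a stand-in for the Postgres database mentioned later, and the choice to treat two spams as "the same" only when their bodies hash identically.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a hypothetical spam-tracking database."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS spam (
               digest TEXT PRIMARY KEY,            -- hash of the message body
               body   TEXT NOT NULL,
               seen   INTEGER NOT NULL DEFAULT 1,  -- how many times it arrived
               notes  TEXT NOT NULL DEFAULT ''     -- free-form tracing notes
           )"""
    )
    return db


def record_spam(db, body):
    """Store a spam body, or bump its counter if it is a repeat.

    Returns the number of times this exact body has been seen.
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    cur = db.execute("UPDATE spam SET seen = seen + 1 WHERE digest = ?", (digest,))
    if cur.rowcount == 0:
        db.execute("INSERT INTO spam (digest, body) VALUES (?, ?)", (digest, body))
    db.commit()
    row = db.execute("SELECT seen FROM spam WHERE digest = ?", (digest,)).fetchone()
    return row[0]


db = open_db()
print(record_spam(db, "Buy now!"))   # first sighting
print(record_spam(db, "Buy now!"))   # repeat of the same body
```

Exact hashing misses spam that varies by a few characters, so a real system would likely need some fuzzier notion of "repeat".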
I've already got some pretty solid code to extract URLs and opt-out addresses, and other routines to test the validity of the opt-out URLs. So the idea is to be able to instantly look up specific spams I report to the authorities to verify whether gateways are still open, or bring up notes on pending investigations against spammers, and also bring up the Whois contact info on the domains...
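For readers wondering what URL and opt-out address extraction might involve, a minimal sketch follows. It is not the code described above: the regular expressions are deliberately simple assumptions and the function names are hypothetical. Real spam obfuscates links in ways these patterns will miss.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
ADDR_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_urls(text):
    """Return http/https URLs found in text, minus trailing punctuation."""
    return [u.rstrip(".,;") for u in URL_RE.findall(text)]


def extract_addresses(text):
    """Return things that look like email addresses."""
    return ADDR_RE.findall(text)


def url_domain(url):
    """Return the host part of a URL, e.g. as input for a whois lookup."""
    return urlsplit(url).hostname


sample = (
    "To opt out visit http://example.com/remove?id=1. "
    "Or send mail to remove@example.net"
)
for url in extract_urls(sample):
    print(url, url_domain(url))
print(extract_addresses(sample))
```

The domain returned by url_domain is the piece that would feed a whois lookup or a registrar complaint form.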
I'm doing this manually right now, but want to get this automation working soon. I've already gotten about 10 spammers' domains shut down because their Whois is bogus, so it would automatically link to the domain name issuers' complaint form pages, keeping track of the "ticket numbers" so I can easily follow up on my complaints to ensure they revoke the spammer's domain name or put in accurate Whois contact info.
I have all of this almost working on my local box on my LAN, and hopefully within a few weeks I want to bring up the "spamcruncher.com" server box with a web site, Postgres, Python, and the SMS libraries and CGIs that drive the web-based GUI, and set up a few alpha and beta testers. On it would be a pop3proxy, SMTP proxy, database, Spambayes, etc. I would then be looking for anyone wanting to participate in our SMS development.
Any comments? Forward them to "crunch at shopip.com", as I use this mail address specifically for my mailing lists, and I download all my list mail every week.