[Spambayes] Good evening/morning/afternoon everyone

Josiah Carlson jcarlson@uci.edu
Sat, 28 Sep 2002 16:29:15 -0700


Richie,

> Rather than modifying the Subject line, I add an extra header.  I believe
> all email clients that have any filtering ability will allow you to filter
> on an arbitrary header.  This is also how Hammie, an alternative driver
> for the spambayes code written by Neale Pickett, works (Hammie is designed
> to fit into a Unix mail delivery system).  The header that pop3proxy.py
> adds is called X-Hammie-Disposition, so that it matches Hammie.

The popfile.sourceforge.net project does a similar thing, adding an
X-Text-Classification header (its author is also submitting it as an
RFC).  I originally had that as the default behavior of PASP, but
remembered that I have (in the past) had email software that doesn't
allow arbitrary header matching.  By modifying the Subject line instead,
I guarantee that ANY email software can filter on it.
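
To make the two tagging approaches concrete, here is a minimal sketch
using Python's standard email package.  The function name tag_message
and the "[label] " Subject prefix are assumptions for illustration, not
what PASP or popfile actually emit; only the X-Text-Classification
header name comes from the discussion above.

```python
from email import message_from_string


def tag_message(raw, label, use_subject=False):
    """Return the message text tagged with a classification label.

    use_subject=False adds an X-Text-Classification header;
    use_subject=True prefixes the Subject line instead.
    """
    msg = message_from_string(raw)
    if use_subject:
        # Hypothetical prefix format; the real tag may differ.
        tagged = "[%s] %s" % (label, msg.get("Subject", ""))
        if "Subject" in msg:
            msg.replace_header("Subject", tagged)
        else:
            msg["Subject"] = tagged
    else:
        msg["X-Text-Classification"] = label
    return msg.as_string()
```

The header variant leaves the visible message untouched; the Subject
variant trades that for working with clients that can only filter on
Subject.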

> pop3proxy.py adjusts the message sizes reported by STAT and LIST to
> include the added X-Hammie-Disposition header.  I'm not sure this makes
> any real difference, but there may be an email client out there that
> malloc()s just enough space and then reads the message into it...

I never thought of that.  Looks like I'm going to have to do some
adjustment.  Cursed C.  For those emails that turn out to not be spam,
I am thinking that it wouldn't be a big deal to just put some blank
spaces at the end of the email.  I hope that no one would really notice.
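
One way to read the padding idea: if every message grows by the same
number of bytes (a Subject tag for spam, trailing spaces otherwise),
the sizes in STAT and LIST can be bumped by a constant without
classifying anything first.  A rough sketch under that assumption
follows; the "[SPAM] " tag and all function names are hypothetical.

```python
OVERHEAD = len(b"[SPAM] ")  # hypothetical tag; every message grows by this


def adjust_stat(reply, overhead=OVERHEAD):
    """Rewrite a '+OK <count> <octets>' STAT reply for the grown messages."""
    parts = reply.split()
    count, octets = int(parts[1]), int(parts[2])
    return b"+OK %d %d\r\n" % (count, octets + count * overhead)


def adjust_list_line(line, overhead=OVERHEAD):
    """Rewrite one '<number> <octets>' line of a LIST response."""
    number, octets = line.split()[:2]
    return b"%d %d\r\n" % (int(number), int(octets) + overhead)


def pad_message(message, original_size, overhead=OVERHEAD):
    """Pad with spaces so the message is exactly original_size + overhead."""
    target = original_size + overhead
    if len(message) > target:
        raise ValueError("message already exceeds the reported size")
    return message + b" " * (target - len(message))
```

Here pad_message works on the message text before the POP3 "."
terminator is appended; a tagged spam message already has the target
length and passes through unchanged.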

> I like your system of adjusting the USER to allow proxying to multiple
> accounts - pop3proxy.py doesn't do this.

That is actually another cue I got from popfile.  To be truthful, I
started working on PASP because I wanted something that worked like
popfile, but was free.  At the time, popfile was closed source, but it
has recently gone open source.

> My email client (Forte Agent) times out after a couple of minutes of
> inactivity.  I guess others do too.  This caused a problem with large
> messages (eg. with attachments) - the proxy would take so long downloading
> the message without sending anything to the client that the client would
> time out.  For this reason, I added a timeout to pop3proxy.py for long
> messages - it downloads as much as it can in 30 seconds, then classifies
> according to the data it's received and starts sending data back to the
> client.

I also hadn't thought about timeouts.  I'm on 1 Mbit DSL and never
receive emails larger than 200k, and even those only arrive twice a week
when I have to update my GF's webcomic (www.blue-comic.com if you are
curious).  The timeout idea is good, especially for anyone on dialup, a
slow DSL connection, or a shared connection with a packet shaper or
heavy traffic.

I am going to look into adding that.
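
A sketch of what such a deadline could look like, based only on the
description quoted above (read for up to 30 seconds, then go with
whatever has arrived).  The function name, the recv callable, and the
injectable clock are assumptions for illustration; a real socket would
also need its own timeout set so that a single recv() cannot block past
the deadline.

```python
import time

TERMINATOR = b"\r\n.\r\n"  # end of a multi-line POP3 response


def read_with_deadline(recv, deadline_seconds=30.0, clock=time.monotonic):
    """Collect chunks from recv() until the message ends or time runs out.

    Returns (data, complete); complete is False if the deadline passed
    or the connection closed before the terminator arrived.
    """
    start = clock()
    data = b""
    while clock() - start < deadline_seconds:
        chunk = recv(4096)
        if not chunk:  # connection closed early
            break
        data += chunk
        if data.endswith(TERMINATOR):
            return data, True
    return data, False
```

When complete comes back False, the proxy would classify on the partial
data and start relaying, passing the rest of the message through as it
arrives.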

> I'm told that MSN uses a proprietary POP3 authentication protocol - I
> couldn't find much information about it on the web, but I think it's based
> on POP3-AUTH.  I think pop3proxy.py should support it, because it accepts
> any command and assumes that unknown commands are not multiline.  I think
> your code does the same - is that right? - your 'supported' dictionary
> seems to be unused.  Do you know anything about this MSN system, or do you
> have the means to test against it ('cos I don't...)

I have no idea.  I know that Hotmail requires you to either pay for POP3
access or use Outlook [Express].  It wouldn't be too much work to do
some network sniffing and reverse-engineer the protocol, and I've been
looking to add a Hotmail/Yahoo Mail portion in the future.

I don't suppose anyone out there is willing to take YahooPOPS! and do a
code translation into python...

Currently the code doesn't check for supported/unknown commands, but the
modification below adds that check:

elif self.lcmd in supported or self.lcmd == 'usent':
    self.client.send(data)
    self.client.send('\r\n')
else:
    self.client.send('-ERR command not supported, goodbye\r\n')
    self.client.close_when_done()
    # ...plus all the other cleanup needed to kill the connection to
    # the server.

> A question: could you explain 'usent'?  I see self.lcmd being set to
> 'usent', but that seems to be the only place it gets used.  Is it trying
> to cope with POP3 servers that print an extra line after the response to
> the USER command?

Ahhh.  It is mostly just a placeholder.  Because I wanted the mail
client to see the pop3 proxy greeting, the pop3 server greeting, and the
pop3 server USER response I had to do a couple things.  I also didn't
want to send extra +OK messages on different lines, in case the client
had problems with such things.

The following is the type of thing you would capture in your client logs.

+OK pop3 proxy ready
USER randomuser:pop3.rs.com
+OK pop3 server at pop3.rs.com +OK password required for randomuser
PASS ******
+OK Congratulations.
STAT

That combined line with two +OK responses is what 'usent' is for.  The
code doesn't actually need it until I add in the lines from earlier in
this email.

You don't have to deal with this because your pop3proxy KNOWS which
server to connect to before the client connects.  PASP doesn't know
until it gets the USER user:server command.
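
For illustration, the USER user:server handling and the merged reply
line from the log above could be sketched like this.  Both function
names are hypothetical and this is not PASP's actual code; it only
mirrors the behavior described here.

```python
def parse_user_command(line):
    """Split 'USER name:server' into (name, server)."""
    command, _, argument = line.strip().partition(" ")
    # Split on the last colon so everything before it is the user name.
    user, _, server = argument.rpartition(":")
    if command.upper() != "USER" or not user or not server:
        raise ValueError("expected 'USER user:server'")
    return user, server


def merge_replies(server_greeting, user_reply):
    """Join the server greeting and its USER reply into one response line."""
    return server_greeting.rstrip("\r\n") + " " + user_reply.rstrip("\r\n") + "\r\n"
```

With the example above, parse_user_command("USER randomuser:pop3.rs.com")
gives ("randomuser", "pop3.rs.com"), and merge_replies produces the
single line carrying both +OK responses.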

> Another question: "pop_proxy.callback(self.parent, ...)" seems like an
> unusual way of writing "self.parent.callback(...)" - is this just a
> stylistic difference, or is there a special reason you do it this way?

I started with self.parent.callback(...), but I kept getting errors with
the parent not being able to access self.server and self.client at all. 
pop_proxy.callback(self.parent,...) works, and I haven't bothered to see
if I can change it back and still have it work.  I'm lazy like that.
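
For what it's worth, on a plain instance the two spellings call the same
function; they only diverge when attribute lookup on the instance finds
something else first.  The sketch below shows that with made-up classes
(PopProxy and Child are stand-ins, not the PASP classes), and whether
this is what actually went wrong in PASP is a guess.

```python
class PopProxy:
    def callback(self, data):
        return "proxy saw %r" % (data,)


class Child:
    def __init__(self, parent):
        self.parent = parent

    def via_instance(self, data):
        # Normal attribute lookup: instance first, then the class.
        return self.parent.callback(data)

    def via_class(self, data):
        # Goes straight to the function defined on PopProxy.
        return PopProxy.callback(self.parent, data)


child = Child(PopProxy())
same = child.via_instance("x") == child.via_class("x")

# An attribute set on the instance shadows the method for normal lookup.
child.parent.callback = lambda data: "shadowed"
differs = child.via_instance("x") != child.via_class("x")
```

Here same and differs both end up True: the calls agree until something
on the instance shadows the method.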

 - Josiah