[Email-SIG] Handling large emails: DiskMessage and DiskFeedParser

Barry Warsaw barry at python.org
Mon Oct 4 02:33:11 CEST 2004


On Mon, 2004-05-24 at 21:15, Menno Smits wrote:

Yes, this was months ago. ;)

> FeedParser is great because it doesn't load the entire message into 
> memory during parsing (yes, I realise there are other reasons for 
> FeedParser existing too). However, once the message is parsed the
> attachment bodies are still loaded entirely into memory when Message
> instances are created and populated. This is a big problem for real
> world environments where large messages are possible. All available
> memory is consumed and the machine grinds to a halt. We see large
> (40MB+) emails all the time and problems start to occur when several of
> these are being processed simultaneously.
> 
> To cope with this problem I've created two classes, DiskMessage and
> DiskFeedParser (see http://oss.netboxblue.com).

I've prototyped a different approach; see if you like it.  If you do,
there's still time to get it into Python 2.4.

We define a new protocol: if the message object returned by the
factory has the following three methods, we use those when capturing the
payload of non-MIME messages.  If not, then we capture those lines in an
internal list object as usual, calling set_payload() at the
end.  The methods are:

def storage_open(self)
def storage_write(self, data)
def storage_close(self)
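
To make the dispatch concrete, here is a rough sketch of what the
parser side could look like.  This is not the prototype code:
capture_payload() and RecordingMessage are hypothetical names invented
for illustration, and the sketch assumes the protocol exactly as
described above.

```python
from email.message import Message


def capture_payload(msg, lines):
    """Hypothetical helper: hand body lines to msg via the storage protocol.

    Falls back to collecting the lines in a list and calling
    set_payload() once when msg lacks any of the three methods.
    """
    methods = ('storage_open', 'storage_write', 'storage_close')
    if all(callable(getattr(msg, name, None)) for name in methods):
        msg.storage_open()
        for line in lines:
            msg.storage_write(line)
        msg.storage_close()
    else:
        collected = []
        for line in lines:
            collected.append(line)
        msg.set_payload(''.join(collected))


class RecordingMessage(Message):
    """Hypothetical class that records protocol calls, for illustration only."""

    def __init__(self):
        Message.__init__(self)
        self.calls = []
        self._chunks = []

    def storage_open(self):
        self.calls.append('open')

    def storage_write(self, data):
        self.calls.append('write')
        self._chunks.append(data)

    def storage_close(self):
        self.calls.append('close')
        self.set_payload(''.join(self._chunks))
```

A plain Message takes the fallback path, while RecordingMessage sees
one open, one write per line, and one close.  Checking for all three
methods up front means a class that implements only part of the
protocol is treated as an ordinary message.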

So you could use something like this (from the unit test):

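import os
import tempfile
from email.message import Message  # spelled email.Message in Python 2.4

class ExternalStorageMessage(Message):
    def storage_open(self):
        # Spool the payload to a temporary file as lines arrive.
        fd, self._path = tempfile.mkstemp()
        self._fp = os.fdopen(fd, 'w')

    def storage_write(self, data):
        self._fp.write(data)

    def storage_close(self):
        self._fp.close()
        # Read the spooled data back and install it as the payload.
        fp = open(self._path)
        try:
            payload = fp.read()
        finally:
            fp.close()
        self.set_payload(payload)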

Season to taste.

Thoughts?
-Barry
