[Email-SIG] Handling large emails: DiskMessage and DiskFeedParser
barry at python.org
Mon Oct 4 02:33:11 CEST 2004
On Mon, 2004-05-24 at 21:15, Menno Smits wrote:
Yes, this was months ago. ;)
> FeedParser is great because it doesn't load the entire message into
> memory during parsing (yes, I realise there are other reasons for
> FeedParser existing too). However, once the message is parsed the
> attachment bodies are still loaded entirely into memory when Message
> instances are created and populated. This is a big problem for real
> world environments where large messages are possible. All available
> memory is consumed and the machine grinds to a halt. We see large
> (40MB+) emails all the time and problems start to occur when several of
> these are being processed simultaneously.
> To cope with this problem I've created two classes, DiskMessage and
> DiskFeedParser (see http://oss.netboxblue.com).
I've prototyped a different approach, see if you like it. If you do,
there's still time to get it into Python 2.4.
We define a new protocol whereby if the message object returned by the
factory has the following three methods, we use those when capturing the
payload of non-MIME messages. If not, then we capture those lines in an
internal list object just like normal, calling set_payload() at the
end. The methods are:
def storage_write(self, data)
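A minimal sketch of that dispatch, written for current Python rather than 2.4. Only storage_write() is named above; storage_open() and storage_close() are assumed names for the other two methods, and capture_payload() is a hypothetical helper, not the actual FeedParser code.

```python
def capture_payload(msg, lines):
    """Store body lines on msg, using its storage hooks if it has them."""
    # storage_write is named in the post; storage_open and storage_close
    # are ASSUMED names for the other two methods of the protocol.
    hooks = ("storage_open", "storage_write", "storage_close")
    if all(callable(getattr(msg, name, None)) for name in hooks):
        # The message object decides where the data goes (e.g. a file).
        msg.storage_open()
        for line in lines:
            msg.storage_write(line)
        msg.storage_close()
    else:
        # Normal behaviour: collect the lines in a list, then call
        # set_payload() once at the end.
        collected = list(lines)
        msg.set_payload("".join(collected))
```

A plain email.message.Message takes the fallback branch, so existing callers would see no change in behaviour.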
So you could use something like this (from the unit test):
    [...]
        fd, self._path = tempfile.mkstemp()
        self._fp = os.fdopen(fd, 'w')

    def storage_write(self, data):
        self._fp.write(data)

    [...]
        fp = open(self._path)
        payload = fp.read()
Season to taste.
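For reference, here is a self-contained sketch of a disk-backed message along those lines, runnable on current Python. It is an illustration, not the unit test itself: the class name DiskMessage is borrowed from Menno's post, storage_open() and storage_close() are assumed method names (only storage_write() is named above), and remove_storage() is a hypothetical cleanup helper.

```python
import os
import tempfile
from email.message import Message


class DiskMessage(Message):
    """Message that keeps its body in a temporary file, not in memory."""

    # ASSUMPTION: storage_open/storage_close are illustrative names;
    # only storage_write appears in the post.
    def storage_open(self):
        fd, self._path = tempfile.mkstemp()
        self._fp = os.fdopen(fd, "w")

    def storage_write(self, data):
        self._fp.write(data)

    def storage_close(self):
        self._fp.close()

    def get_payload(self, i=None, decode=False):
        # Reads the whole file back; i and decode are ignored here.
        with open(self._path) as fp:
            return fp.read()

    def remove_storage(self):
        # Hypothetical helper: delete the temporary file when done.
        os.remove(self._path)
```

Note that get_payload() here still pulls the whole body into memory, so a caller handling 40MB+ attachments would more likely read from the file in chunks.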