[issue6942] email.generator.Generator memory consumption
Ross Patterson
report at bugs.python.org
Sat Sep 19 01:53:52 CEST 2009
New submission from Ross Patterson <me at rpatterson.net>:
Due to repeated use of StringIO as a way to "look ahead" into subparts
while checking that multipart boundaries are unique, memory consumption
during email.generator.Generator.flatten() can be up to 3 times the
original message size.
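One rough way to observe this is to trace peak allocations while flatten() runs. The sketch below is an illustration only, not part of the original report: it is written for Python 3 (the report concerns Python 2.6), the helper names peak_flatten_bytes and build_message are hypothetical, and the numbers it yields will vary by interpreter version and message shape.

```python
import io
import tracemalloc
from email.generator import Generator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(line_count):
    """Build a small multipart message (hypothetical helper for this sketch)."""
    msg = MIMEMultipart()
    body = ("x" * 70 + "\n") * line_count
    msg.attach(MIMEText(body))
    return msg


def peak_flatten_bytes(msg):
    """Return the peak bytes allocated (as seen by tracemalloc) during flatten()."""
    out = io.StringIO()
    tracemalloc.start()
    try:
        Generator(out).flatten(msg)
        # get_traced_memory() returns (current, peak) since start().
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

Comparing peak_flatten_bytes(build_message(n)) against the size of the flattened output for a few values of n gives a rough ratio; it includes the output buffer itself, so it is an upper-bound style estimate rather than a precise measure of the look-ahead copies.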
I implemented a subclass of email.generator.Generator that works around
this using email.message.Message.walk() to check message headers and
string (final) payloads for the boundary without duplicating their
contents into a StringIO.
It assumes that the boundary can only ever be duplicated within a single
part's headers, or within a single part's payload when that payload is
a string. IOW, it assumes that the boundary will not be produced by
some combination of the headers and string payloads of all the parts
and their recursive subparts.
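As a minimal sketch of the idea under that assumption (this is not the implementation linked from this report; the function name boundary_collides is hypothetical, and the code is written for Python 3), a candidate boundary can be checked against each part individually:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def boundary_collides(msg, boundary):
    """Return True if `boundary` occurs in any one header value or string payload.

    Assumption: `boundary` is a candidate that has not yet been set on `msg`;
    a boundary that is already set appears in the Content-Type header and
    would always be reported as a collision.
    """
    for part in msg.walk():
        # Check each header value on its own, without concatenating them.
        for _name, value in part.items():
            if boundary in str(value):
                return True
        # Only leaf payloads are strings; multipart payloads are lists
        # whose items walk() visits separately.
        payload = part.get_payload()
        if isinstance(payload, str) and boundary in payload:
            return True
    return False


# Example message with one part that contains a would-be boundary string.
outer = MIMEMultipart()
outer.attach(MIMEText("hello ===abc=== world"))
outer.attach(MIMEText("plain text"))
```

A plain substring test is stricter than necessary, since a real delimiter is the boundary prefixed with "--" at the start of a line, but it avoids copying any part's contents into a temporary buffer.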
If this assumption is safe, then this implementation should work. If
this assumption is not safe, then perhaps a different boundary format
can be used which will make this assumption safe?
You can find my implementation at
http://gitorious.org/rpatterson-imappipe/rpatterson-imappipe/blobs/master/rpatterson/imappipe/generator.py
----------
components: Library (Lib)
messages: 92853
nosy: rpatterson
severity: normal
status: open
title: email.generator.Generator memory consumption
type: resource usage
versions: Python 2.6
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6942>
_______________________________________