[Email-SIG] Parsing email with large attachment
Vijay Rao
vijay at accellion.com
Tue Sep 4 06:27:04 CEST 2007
Hi,
I want to use the email package to parse emails with attachments of up to
1 GB. However, Python fails with a MemoryError traceback while parsing an
email with even a 300 MB attachment, at this line in feedparser.py:
    self._cur.set_payload(EMPTYSTRING.join(lines))
I have the email contents in a file, and the code looks like this
(Python 2.5, Windows XP):
    import email
    import os

    self.msg = email.message_from_file(self.stream)
    ...
    ...
    # Check if there are any attachments at all
    if self.msg.get_content_maintype() != 'multipart':
        print 'No attachments in message'
        return
    counter = 1  # numbers unnamed parts; kept outside the loop so it is not reset
    for part in self.msg.walk():
        # multipart/* are just containers
        if part.get_content_maintype() == 'multipart':
            continue
        is_attachment = part.get('Content-Disposition')
        if is_attachment is None:
            #body = part.get_payload(decode=True)
            #print 'Body', body
            continue
        filename = part.get_filename()
        print 'Filename', filename
        if not filename:
            # fall back to a generated name such as part-001.bin
            filename = 'part-%03d%s' % (counter, '.bin')
            counter += 1
        att_path = os.path.join(detach_dir, filename)
        # Check if it's already there
        if not os.path.isfile(att_path):
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()
My machine has 2 GB of RAM, so physical memory should not be the problem.
It seems Python tries to allocate one large chunk of memory while joining
the list of lines into a single string.
Also, peak memory used for parsing and extracting the attachment seems to
be three times the attachment size:
1) 2x used for parsing
2) 1x used for extracting it
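One rough way to check the 2x/1x estimate above is to parse a small synthetic message and look at the peak allocation. This is only a sketch under some assumptions: it uses tracemalloc, which exists in current Python 3 and not in 2.5, so the exact ratio will differ from what I see on my setup. The function name, the boundary string and the filename are made up for the example.

```python
import base64
import email
import os
import tracemalloc


def peak_parse_ratio(attachment_bytes=2 * 1024 * 1024):
    """Return peak traced memory while parsing, divided by the encoded attachment size."""
    # Build a synthetic multipart message with one base64 attachment.
    # encodebytes() ends its output with a newline, so the closing
    # boundary below starts on its own line.
    encoded = base64.encodebytes(os.urandom(attachment_bytes)).decode("ascii")
    raw = (
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="BOUND"\n'
        "\n"
        "--BOUND\n"
        "Content-Type: application/octet-stream\n"
        "Content-Transfer-Encoding: base64\n"
        'Content-Disposition: attachment; filename="blob.bin"\n'
        "\n"
        + encoded +
        "--BOUND--\n"
    )
    # Trace only the allocations made by the parser, not the raw text itself.
    tracemalloc.start()
    msg = email.message_from_string(raw)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert msg.is_multipart()
    return peak / len(encoded)


if __name__ == "__main__":
    print("peak memory / encoded size: %.2f" % peak_parse_ratio())
```

A ratio above 1 means the parser held more than one copy's worth of the encoded attachment at its peak. Extracting with get_payload(decode=True) then adds the decoded bytes on top of that.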
The only fix I can see is to rewrite the parser so that it never loads
the attachment into memory: write it to a file instead, pass the file
pointer to set_payload, and have get_payload decode the attachment in
small chunks rather than loading the entire file. That would mean
subclassing Message to accept a file pointer in set_payload, etc.
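The chunked decoding part of that idea could look roughly like the sketch below. It assumes the still-encoded base64 body of the attachment has already been written to a file by some modified parser, which is the part that does not exist yet. decode_base64_stream and its parameters are hypothetical names, not part of the email package, and the code is written for current Python 3.

```python
import base64
import io
import os


def decode_base64_stream(src, dst, chunk_size=64 * 1024):
    """Decode base64 text from binary file `src` into binary file `dst`, chunk by chunk."""
    leftover = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Drop line breaks and other whitespace, then prepend what was
        # left over from the previous chunk.
        data = leftover + b"".join(chunk.split())
        # Base64 decodes in groups of 4 characters, so only decode a
        # multiple of 4 and carry the remainder into the next round.
        usable = len(data) - len(data) % 4
        dst.write(base64.b64decode(data[:usable]))
        leftover = data[usable:]
    if leftover:
        # Only reached for truncated input; b64decode raises binascii.Error here.
        dst.write(base64.b64decode(leftover))


if __name__ == "__main__":
    payload = os.urandom(1000000)
    src = io.BytesIO(base64.encodebytes(payload))
    dst = io.BytesIO()
    decode_base64_stream(src, dst)
    print(dst.getvalue() == payload)
```

With this approach memory use is bounded by chunk_size rather than by the attachment size. In real use src and dst would be files opened in 'rb' and 'wb' mode instead of BytesIO objects.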
Is there any other way to fix this, perhaps by compiling Python with some
flags that allow the join to use a larger amount of memory?
Thanks,
Vijay