Parsing email attachments: get_payload() produces unsaveable data
denis.papathanasiou at gmail.com
Wed Oct 14 19:59:25 CEST 2009
On Oct 4, 10:27 am, dpapathanasiou <denis.papathanas... at gmail.com> wrote:
> I'm using python to access an email account via POP, then for each
> incoming message, save any attachments.
> This is the function which scans the message for attachments:
> def save_attachments (local_folder, msg_text):
>     """Scan the email message text and save the attachments (if any)
>     in the local_folder"""
>     if msg_text:
>         for part in email.message_from_string(msg_text).walk():
>             if part.is_multipart() or part.get_content_maintype() ==
>             filename = part.get_filename(None)
>             if filename:
>                 filedata = part.get_payload(decode=True)
>                 if filedata:
>                     write_file(local_folder, filename, filedata)
> All the way up to write_file(), it's working correctly.
> The filename variable matches the name of the attached file, and the
> filedata variable contains binary data corresponding to the file's
> contents.
> When I try to write the filedata to a file system folder, though, I
> get an AttributeError in the stack trace.
> Here is my write_file() function:
> def write_file (folder, filename, f, chunk_size=4096):
>     """Write the file data f to the folder and filename"""
>     result = False
>     if confirm_folder(folder):
>         try:
>             file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb', chunk_size)
>             for file_chunk in read_buffer(f, chunk_size):
>                 file_obj.write(file_chunk)
>             file_obj.close()
>             result = True
>         except (IOError):
>             print "file_utils.write_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
>     return result
> I also tried applying this regex:
> filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n
> after reading this post
> (http://stackoverflow.com/questions/787739/python-email-getpayload-decode-fails-when-hitting-equal-sign),
> but it hasn't resolved the problem.
> Is there any way of correcting the output of get_payload() so I can
> save it to a file?
An update for the record (and in case anyone else also has this
problem):
The regex suggested in the StackOverflow post, i.e.,
filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n
is necessary but not sufficient.
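
One thing worth checking before relying on that substitution: as written, (?!=\n) is a negative lookahead for the two characters "=" and "\n", not for "\n" alone, so a \r that already belongs to a \r\n pair still matches. The sketch below (Python 3, bytes patterns; the sample data is made up for illustration) compares the quoted pattern with the plain (?!\n) form. It only shows what each pattern does, it is not a claim about which one a given attachment needs.

```python
import re

# Made-up sample: one proper CRLF and one bare CR.
data = b"line one\r\nline two\rline three"

# Pattern as quoted in the post: the lookahead rejects only "=\n",
# so the \r in front of the existing \n is rewritten as well.
as_quoted = re.sub(rb"\r(?!=\n)", b"\r\n", data)

# Plain negative lookahead for \n: only the bare \r is rewritten.
bare_only = re.sub(rb"\r(?!\n)", b"\r\n", data)

print(repr(as_quoted))
print(repr(bare_only))
```

With the quoted pattern the existing CRLF gains an extra \n, which is worth keeping in mind for anything that is not plain text.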
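It turns out that because get_payload(decode=True) returns the whole
decoded attachment as a single in-memory string of bytes, not a
file-like object to read from, the right way to save those bytes to a
file is to use a function like this: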
def write_binary_file (folder, filename, filedata):
    """Write the binary file data to the folder and filename"""
    result = False
    try:
        file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb')
        file_obj.write(filedata)
        file_obj.close()
        result = True
    except (IOError):
        print "file_utils.write_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
    return result
I.e., filedata, the output of get_payload(), can be written all at
once, without reading and writing in 4k chunks.
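
For anyone landing here on Python 3, the same idea can be sketched roughly as follows. This is a minimal sketch, not the code from this thread: it assumes the raw message is available as bytes (msg_bytes), uses os.path.basename as a stand-in for the file_base_name() helper, and assumes local_folder already exists. The function returns the list of paths it wrote, which is my own addition.

```python
import email
import os


def save_attachments(local_folder, msg_bytes):
    """Save every named attachment in msg_bytes to local_folder.

    Returns the list of paths written. Assumes local_folder exists.
    """
    saved = []
    msg = email.message_from_bytes(msg_bytes)
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts have no file payload of their own
        filename = part.get_filename(None)
        if not filename:
            continue  # not a named attachment
        # decode=True undoes base64 / quoted-printable and returns bytes
        filedata = part.get_payload(decode=True)
        if not filedata:
            continue
        # basename() drops directory components as understood by the local OS
        path = os.path.join(local_folder, os.path.basename(filename))
        with open(path, "wb") as file_obj:
            file_obj.write(filedata)  # written in one call, no chunking
        saved.append(path)
    return saved
```

The payload is already fully in memory by the time get_payload() returns, so a single write() in binary mode is enough; a chunked reader only makes sense for something that has a read() method.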