Parsing email attachments: get_payload() produces unsaveable data

dpapathanasiou denis.papathanasiou at gmail.com
Wed Oct 14 13:59:25 EDT 2009


On Oct 4, 10:27 am, dpapathanasiou <denis.papathanas... at gmail.com>
wrote:
> I'm using python to access an email account via POP, then for each
> incoming message, save any attachments.
>
> This is the function which scans the message for attachments:
>
> def save_attachments (local_folder, msg_text):
>     """Scan the email message text and save the attachments (if any) in the local_folder"""
>     if msg_text:
>         for part in email.message_from_string(msg_text).walk():
>             if part.is_multipart() or part.get_content_maintype() == 'text':
>                 continue
>             filename = part.get_filename(None)
>             if filename:
>                 filedata = part.get_payload(decode=True)
>                 if filedata:
>                     write_file(local_folder, filename, filedata)
>
> All the way up to write_file(), it's working correctly.
>
> The filename variable matches the name of the attached file, and the
> filedata variable contains binary data corresponding to the file's
> contents.
>
> When I try to write the filedata to a file system folder, though, I
> get an AttributeError in the stack trace.
>
> Here is my write_file() function:
>
> def write_file (folder, filename, f, chunk_size=4096):
>     """Write the file data f to the folder and filename combination"""
>     result = False
>     if confirm_folder(folder):
>         try:
>             file_obj = open(os.path.join(folder, file_base_name(filename)), 'wb', chunk_size)
>             for file_chunk in read_buffer(f, chunk_size):
>                 file_obj.write(file_chunk)
>             file_obj.close()
>             result = True
>         except (IOError):
>             print "file_utils.write_file: could not write '%s' to '%s'" % (file_base_name(filename), folder)
>     return result
>
> I also tried applying this regex:
>
> filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n
>
> after reading this post
> (http://stackoverflow.com/questions/787739/python-email-getpayload-decode-fails-when-hitting-equal-sign),
> but it hasn't resolved the problem.
>
> Is there any way of correcting the output of get_payload() so I can
> save it to a file?

An update for the record (and in case anyone else also has this
problem):

The regex suggested in the StackOverflow post, i.e.,

filedata = re.sub(r'\r(?!=\n)', '\r\n', filedata) # Bare \r becomes \r\n

is necessary but not sufficient.
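
A side note on the pattern as quoted: (?!=\n) is a negative lookahead for the two characters "=" and newline, not for a newline alone, so it also matches the \r of an existing \r\n pair. The sketch below (Python 3 syntax; the sample string is made up for illustration) compares it with (?!\n), which is presumably what was intended. Treat it as an observation about the regex syntax, not as a statement about what the StackOverflow post meant.

```python
import re

sample = 'a\r\nb\rc'  # one proper CRLF and one bare CR (made-up input)

# Pattern as quoted: the lookahead only rejects a following '=\n'
as_quoted = re.sub(r'\r(?!=\n)', '\r\n', sample)

# Assumed intent: match a CR that is not followed by LF
bare_cr_only = re.sub(r'\r(?!\n)', '\r\n', sample)

print(repr(as_quoted))
print(repr(bare_cr_only))
```

With this input the quoted pattern also rewrites the existing \r\n pair (adding an extra \n), while the (?!\n) version touches only the bare \r.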

It turns out that get_payload(decode=True) returns the decoded
attachment as a single byte string, not a file-like object, so the
right way to save those bytes to a file is to use a function like
this:

import os

def write_binary_file(folder, filename, filedata):
    """Write the binary file data to the folder and filename combination"""
    result = False
    # confirm_folder() and file_base_name() are helpers not shown in this post
    if confirm_folder(folder):
        try:
            path = os.path.join(folder, file_base_name(filename))
            # filedata is already a complete byte string: write it in one call
            with open(path, 'wb') as file_obj:
                file_obj.write(filedata)
            result = True
        except IOError:
            print("file_utils.write_binary_file: could not write '%s' to '%s'" % (file_base_name(filename), folder))
    return result

I.e., filedata, the output of get_payload(), can be written in a
single call, without reading and writing it in 4 KB chunks.
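
For readers on Python 3, here is a minimal end-to-end sketch of the same idea. It is not the code from this thread: the name save_attachments_py3, the use of os.makedirs() in place of confirm_folder(), os.path.basename() in place of file_base_name(), and the sample message are all assumptions made for illustration. It relies on get_payload(decode=True) returning a bytes object, which is written in one call.

```python
import email
import os
import tempfile
from email.message import EmailMessage


def save_attachments_py3(local_folder, msg_bytes):
    """Save every named, non-text part of the message into local_folder.

    Returns the list of paths written. Hypothetical helper, not from the thread.
    """
    saved = []
    os.makedirs(local_folder, exist_ok=True)
    for part in email.message_from_bytes(msg_bytes).walk():
        if part.is_multipart() or part.get_content_maintype() == 'text':
            continue
        filename = part.get_filename(None)
        if not filename:
            continue
        filedata = part.get_payload(decode=True)  # bytes, not a file-like object
        if filedata:
            # basename() drops any directory part of the attachment name
            path = os.path.join(local_folder, os.path.basename(filename))
            with open(path, 'wb') as file_obj:
                file_obj.write(filedata)  # one write, no chunking
            saved.append(path)
    return saved


# Build a made-up message with one binary attachment to exercise the function
msg = EmailMessage()
msg['Subject'] = 'test'
msg.set_content('body text')
original = bytes(range(256))
msg.add_attachment(original, maintype='application',
                   subtype='octet-stream', filename='sample.bin')

out_dir = tempfile.mkdtemp()
paths = save_attachments_py3(out_dir, msg.as_bytes())
with open(paths[0], 'rb') as f:
    round_trip = f.read()
print(round_trip == original)
```

The round-trip comparison at the end checks that the bytes on disk match the bytes that were attached, which is the property the original chunked version failed to deliver.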


