[Python-Dev] python hangs when parsing a bad-formed email

Tue Apr 22 09:43:02 CEST 2008

Hi all,
First of all, sorry if this isn't the right list for this question,
and sorry for my English.

As the subject says, I'm having problems with the attached email. When
I try to build an email object by reading the attached file, the Python
process hangs and uses all of the CPU.

I have debugged my code to find where it happens, and found that it is in
the _parsegen method of the FeedParser class. I know the email is
malformed, but I don't know why Python hangs.

Below I paste the code, marking where it hangs.

def _parsegen(self):
        # Create a new message and start by parsing headers.
        self._new_message()
        headers = []
        # Collect the headers, searching for a line that doesn't match the RFC
        # 2822 header or continuation pattern (including an empty line).
        for line in self._input:
            if line is NeedMoreData:
                yield NeedMoreData
                continue
            if not headerRE.match(line):
                # If we saw the RFC defined header/body separator
                # (i.e. newline), just throw it away. Otherwise the line is
                # part of the body so push it back.
                if not NLCRE.match(line):
                    self._input.unreadline(line)
                break
            headers.append(line)
        # Done with the headers, so parse them and figure out what we're
        # supposed to see in the body of the message.
        self._parse_headers(headers)
        # Headers-only parsing is a backwards compatibility hack, which was
        # necessary in the older parser, which could throw errors.  All
        # remaining lines in the input are thrown into the message body.
        if self._headersonly:
            lines = []
            while True:
                line = self._input.readline()
                if line is NeedMoreData:
                    yield NeedMoreData
                    continue
                if line == '':
                    break
                lines.append(line)
            self._cur.set_payload(EMPTYSTRING.join(lines))
            return
        if self._cur.get_content_type() == 'message/delivery-status':
            # !!! IT HANGS AT THIS POINT AND STARTS TO USE ALL THE CPU
            # FOR THE PROCESS
            # message/delivery-status contains blocks of headers separated by
            # a blank line.  We'll represent each header block as a separate
            # nested message object, but the processing is a bit different
            # than standard message/* types because there is no body for the
            # nested messages.  A blank line separates the subparts.
  ...
  ...
  ...

I have worked around the problem by adding the marked line to the
_parse_headers method:

def _parse_headers(self, lines):
        # Passed a list of lines that make up the headers for the current msg
        lastheader = ''
        lastvalue = []
        for lineno, line in enumerate(lines):
            # Check for continuation
            if line[0] in ' \t':
                if not lastheader:
                    # The first line of the headers was a continuation.  This
                    # is illegal, so let's note the defect, store the illegal
                    # line, and ignore it for purposes of headers.
                    defect = errors.FirstHeaderLineIsContinuationDefect(line)
                    self._cur.defects.append(defect)
                    continue
                # !!! ADDED LINE: IF THE CONTINUATION LINE IS NOT EMPTY,
                # ADD THE LINE TO THE HEADER.
                if line.strip() != '':
                    lastvalue.append(line)
                continue
            if lastheader:
    ...
    ...
    ...
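
The idea behind the workaround can be sketched as a standalone function,
only for illustration. The name drop_blank_continuations and the sample
header lines are made up for this example; they are not part of the email
package, and this is not how the package itself is written:

```python
def drop_blank_continuations(lines):
    """Return the header lines without whitespace-only continuation lines."""
    kept = []
    for line in lines:
        # A continuation line starts with a space or a tab.
        is_continuation = line[:1] in (' ', '\t')
        if is_continuation and line.strip() == '':
            # Whitespace-only continuation: it adds nothing to the value.
            continue
        kept.append(line)
    return kept


# Hypothetical header lines, not taken from the attached email.
headers = [
    'Content-Type: message/delivery-status\n',
    ' \t \n',              # blank continuation, dropped
    '\tsome more text\n',  # non-empty continuation, kept
    'Subject: test\n',
]
print(drop_blank_continuations(headers))
```

With the added line, _parse_headers skips the same kind of lines, so the
stored header value no longer contains the whitespace-only continuations.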

I don't know why it hangs, and I'm not sure why it works with this line
added.

I have tried to parse this email with Python 2.3.3 SunOS, Python 2.3.3 gcc,
Python 2.5.1 SunOS, gcc, Windows XP, and Linux SUSE 10, and I always get
the same result.

bash-3.00$ python
Python 2.5.1 (r251:54863, Feb 28 2008, 07:48:25)
[GCC 3.4.6] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> fp = open('raro.txt')
>>> mail = email.message_from_file(fp)
(the call never returns)
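
To reproduce this without losing the shell, the parse can be run in a child
process with a timeout. This is only a sketch, and it assumes Python 3.5 or
later (for subprocess.run), which is newer than the versions listed above.
The helper name parses_within and the 10 second default are made up for
this example:

```python
import subprocess
import sys

# Child program: parse the file given as argv[1] with the email package.
CHILD = (
    "import email, sys\n"
    "with open(sys.argv[1]) as fp:\n"
    "    email.message_from_file(fp)\n"
)


def parses_within(path, seconds=10):
    """Return True if the file parses in a child process before the timeout."""
    try:
        # subprocess.run kills the child if the timeout expires.
        subprocess.run([sys.executable, "-c", CHILD, path],
                       timeout=seconds, check=True)
    except subprocess.TimeoutExpired:
        return False
    return True
```

Calling parses_within('raro.txt') should then return False if the parse
does not finish in time, instead of blocking forever. If the child fails
for another reason (for example, the file does not exist), check=True makes
subprocess.run raise CalledProcessError.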

I don't know if someone can tell me what is happening.

Best Regards.

Alberto Casado.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: raro.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20080422/46c9e000/attachment-0001.txt>