[Python-checkins] python/dist/src/Lib/email Parser.py,1.18,1.19

bwarsaw@users.sourceforge.net
Tue, 05 Nov 2002 13:44:08 -0800


Update of /cvsroot/python/python/dist/src/Lib/email
In directory usw-pr-cvs1:/tmp/cvs-serv7523

Modified Files:
	Parser.py 
Log Message:
parse(), _parseheaders(), _parsebody(): A fix for SF bug #633527.
In lax parsing, the first non-header line after a header block
(i.e. the first line that contains no colon and is not a continuation)
can now be treated as the first body line, even without the
RFC-mandated blank line separator.

rfc822 had this behavior, and I vaguely remember problems with this,
but can't remember details.  In any event, all the tests still pass,
so I guess we'll find out. ;/

This patch works by returning the non-header, non-continuation line
from _parseheaders() and, if one was returned, prepending it as the
first body line to the result of fp.read().  It's usually None.

We use this approach instead of trying to seek/tell the file-like
object.
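
As a rough illustration of the approach described above, here is a
minimal standalone sketch.  It is not the Parser.py code: the names
parse_headers(), parse_body() and HEADER_RE are hypothetical, the
header matching is deliberately simplified, and it only models the
non-strict path.  The point is that the header reader hands back the
line it could not consume, so the file-like object never needs to
support seek/tell.

```python
import io
import re

# Hypothetical, simplified header test: a name with no whitespace or
# colon, followed by a colon.  The real parser's checks differ.
HEADER_RE = re.compile(r'^[^\s:]+:')


def parse_headers(fp):
    """Read headers from fp and return (headers, firstbodyline).

    firstbodyline is None unless the header block ended on a line that
    was neither a header nor a continuation (no blank separator).
    """
    headers = []
    firstbodyline = None
    while True:
        line = fp.readline()
        if not line:
            break                      # EOF inside the header block
        line = line.rstrip('\r\n')
        if line == '':
            break                      # the blank separator line
        if line[0] in ' \t' and headers:
            # Continuation line: fold it into the previous header value.
            name, value = headers[-1]
            headers[-1] = (name, value + '\n' + line)
            continue
        if HEADER_RE.match(line):
            name, value = line.split(':', 1)
            headers.append((name, value.strip()))
            continue
        # Not a header, not a continuation: offer it up as the first
        # body line instead of raising an error.
        firstbodyline = line
        break
    return headers, firstbodyline


def parse_body(fp, firstbodyline=None):
    """Return the body text, prepending the already-consumed line."""
    text = fp.read()
    if firstbodyline is not None:
        text = firstbodyline + '\n' + text
    return text


def parse(text):
    fp = io.StringIO(text)
    headers, firstbodyline = parse_headers(fp)
    return headers, parse_body(fp, firstbodyline)
```

Calling parse() on a message with no blank separator, such as
"Subject: hi\nHello there\n", should yield the single Subject header
and a body that starts with "Hello there".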


Index: Parser.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/email/Parser.py,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** Parser.py	5 Nov 2002 20:54:37 -0000	1.18
--- Parser.py	5 Nov 2002 21:44:06 -0000	1.19
***************
*** 60,66 ****
          """
          root = self._class()
!         self._parseheaders(root, fp)
          if not headersonly:
!             self._parsebody(root, fp)
          return root
  
--- 60,66 ----
          """
          root = self._class()
!         firstbodyline = self._parseheaders(root, fp)
          if not headersonly:
!             self._parsebody(root, fp, firstbodyline)
          return root
  
***************
*** 81,84 ****
--- 81,85 ----
          lastvalue = []
          lineno = 0
+         firstbodyline = None
          while True:
              # Don't strip the line before we test for the end condition,
***************
*** 121,131 ****
                  if self._strict:
                      raise Errors.HeaderParseError(
!                         "Not a header, not a continuation: ``%s''"%line)
                  elif lineno == 1 and line.startswith('--'):
                      # allow through duplicate boundary tags.
                      continue
                  else:
!                     raise Errors.HeaderParseError(
!                         "Not a header, not a continuation: ``%s''"%line)
              if lastheader:
                  container[lastheader] = NL.join(lastvalue)
--- 122,135 ----
                  if self._strict:
                      raise Errors.HeaderParseError(
!                         "Not a header, not a continuation: ``%s''" % line)
                  elif lineno == 1 and line.startswith('--'):
                      # allow through duplicate boundary tags.
                      continue
                  else:
!                     # There was no separating blank line as mandated by RFC
!                     # 2822, but we're in non-strict mode.  So just offer up
!                     # this current line as the first body line.
!                     firstbodyline = line
!                     break
              if lastheader:
                  container[lastheader] = NL.join(lastvalue)
***************
*** 135,140 ****
          if lastheader:
              container[lastheader] = NL.join(lastvalue)
  
!     def _parsebody(self, container, fp):
          # Parse the body, but first split the payload on the content-type
          # boundary if present.
--- 139,145 ----
          if lastheader:
              container[lastheader] = NL.join(lastvalue)
+         return firstbodyline
  
!     def _parsebody(self, container, fp, firstbodyline=None):
          # Parse the body, but first split the payload on the content-type
          # boundary if present.
***************
*** 153,156 ****
--- 158,163 ----
              separator = '--' + boundary
              payload = fp.read()
+             if firstbodyline is not None:
+                 payload = firstbodyline + '\n' + payload
              # We use an RE here because boundaries can have trailing
              # whitespace.
***************
*** 261,265 ****
              container.attach(msg)
          else:
!             container.set_payload(fp.read())
  
  
--- 268,275 ----
              container.attach(msg)
          else:
!             text = fp.read()
!             if firstbodyline is not None:
!                 text = firstbodyline + '\n' + text
!             container.set_payload(text)
  
  
***************
*** 275,279 ****
      interested in is the message headers.
      """
!     def _parsebody(self, container, fp):
          # Consume but do not parse, the body
!         container.set_payload(fp.read())
--- 285,292 ----
      interested in is the message headers.
      """
!     def _parsebody(self, container, fp, firstbodyline=None):
          # Consume but do not parse, the body
!         text = fp.read()
!         if firstbodyline is not None:
!             text = firstbodyline + '\n' + text
!         container.set_payload(text)