[Python-checkins] python/dist/src/Doc/lib email.tex, 1.17, 1.18 emailencoders.tex, 1.4, 1.5 emailexc.tex, 1.3, 1.4 emailmessage.tex, 1.15, 1.16 emailmimebase.tex, 1.3, 1.4 emailparser.tex, 1.9, 1.10 emailutil.tex, 1.8, 1.9

bwarsaw at users.sourceforge.net
Sun Oct 3 05:16:21 CEST 2004

Update of /cvsroot/python/python/dist/src/Doc/lib
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32603/Doc/lib

Modified Files:
	email.tex emailencoders.tex emailexc.tex emailmessage.tex 
	emailmimebase.tex emailparser.tex emailutil.tex 
Log Message:
Big email 3.0 API changes, with updated unit tests and documentation.
Briefly (from the NEWS file):

- Updates for the email package:
  + All deprecated APIs that in email 2.x issued warnings have been removed:
    _encoder argument to the MIMEText constructor, Message.add_payload(),
    Utils.dump_address_pair(), Utils.decode(), Utils.encode()
  + New deprecations: Generator.__call__(), Message.get_type(),
    Message.get_main_type(), Message.get_subtype(), the 'strict' argument to
    the Parser constructor.  These will be removed in email 3.1.
  + Support for Python earlier than 2.3 has been removed (see PEP 291).
  + All defect classes have been renamed to end in 'Defect'.
  + Some FeedParser fixes; also a MultipartInvariantViolationDefect will be
    added to messages that claim to be multipart but really aren't.
  + Updates to documentation.

Index: email.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/email.tex,v
retrieving revision 1.17
retrieving revision 1.18
diff -u -d -r1.17 -r1.18
--- email.tex	29 Jan 2003 05:10:27 -0000	1.17
+++ email.tex	3 Oct 2004 03:16:17 -0000	1.18
@@ -1,5 +1,5 @@
-% Copyright (C) 2001,2002 Python Software Foundation
-% Author: barry at zope.com (Barry Warsaw)
+% Copyright (C) 2001-2004 Python Software Foundation
+% Author: barry at python.org (Barry Warsaw)
 \section{\module{email} ---
 	 An email and MIME handling package}
@@ -7,8 +7,8 @@
 \modulesynopsis{Package supporting the parsing, manipulating, and
     generating email messages, including MIME documents.}
-\moduleauthor{Barry A. Warsaw}{barry at zope.com}
-\sectionauthor{Barry A. Warsaw}{barry at zope.com}
+\moduleauthor{Barry A. Warsaw}{barry at python.org}
+\sectionauthor{Barry A. Warsaw}{barry at python.org}
@@ -22,7 +22,7 @@
 function of the \refmodule{smtplib} module.  The \module{email}
 package attempts to be as RFC-compliant as possible, supporting in
 addition to \rfc{2822}, such MIME-related RFCs as
-\rfc{2045}-\rfc{2047}, and \rfc{2231}.
+\rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
 The primary distinguishing feature of the \module{email} package is
 that it splits the parsing and generating of email messages from the
@@ -79,7 +79,7 @@
-\subsection{Exception classes}
+\subsection{Exception and Defect classes}
 \subsection{Miscellaneous utilities}
@@ -88,14 +88,41 @@
-\subsection{Differences from \module{email} v1 (up to Python 2.2.1)}
+\subsection{Package History}
 Version 1 of the \module{email} package was bundled with Python
 releases up to Python 2.2.1.  Version 2 was developed for the Python
 2.3 release, and backported to Python 2.2.2.  It was also available as
-a separate distutils based package.  \module{email} version 2 is
-almost entirely backward compatible with version 1, with the
-following differences:
+a separate distutils-based package, and is compatible back to Python 2.1.
+\module{email} version 3.0 was released with Python 2.4 and as a separate
+distutils-based package.  It is compatible back to Python 2.3.
+Here are the differences between \module{email} version 3 and version 2:
+\item The \class{FeedParser} class was introduced, and the \class{Parser}
+      class was implemented in terms of the \class{FeedParser}.  All parsing
+      is therefore non-strict, and parsing will make a best effort never to
+      raise an exception.  Problems found while parsing messages are stored in
+      the message's \var{defects} attribute.
+\item All aspects of the API which raised \exception{DeprecationWarning}s in
+      version 2 have been removed.  These include the \var{_encoder} argument
+      to the \class{MIMEText} constructor, the \method{Message.add_payload()}
+      method, the \function{Utils.dump_address_pair()} function, and the
+      functions \function{Utils.decode()} and \function{Utils.encode()}.
+\item New \exception{DeprecationWarning}s have been added to:
+      \method{Generator.__call__()}, \method{Message.get_type()},
+      \method{Message.get_main_type()}, \method{Message.get_subtype()}, and
+      the \var{strict} argument to the \class{Parser} class.  These are
+      expected to be removed in email 3.1.
+\item Support for Python versions earlier than 2.3 has been removed.
+Here are the differences between \module{email} version 2 and version 1:
 \item The \module{email.Header} and \module{email.Charset} modules

Index: emailencoders.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/emailencoders.tex,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -d -r1.4 -r1.5
--- emailencoders.tex	1 Oct 2002 04:33:14 -0000	1.4
+++ emailencoders.tex	3 Oct 2004 03:16:17 -0000	1.5
@@ -8,11 +8,11 @@
 The \module{email} package provides some convenient encodings in its
 \module{Encoders} module.  These encoders are actually used by the
-\class{MIMEImage} and \class{MIMEText} class constructors to provide default
-encodings.  All encoder functions take exactly one argument, the
-message object to encode.  They usually extract the payload, encode
-it, and reset the payload to this newly encoded value.  They should also
-set the \mailheader{Content-Transfer-Encoding} header as appropriate.
+\class{MIMEAudio} and \class{MIMEImage} class constructors to provide default
+encodings.  All encoder functions take exactly one argument, the message
+object to encode.  They usually extract the payload, encode it, and reset the
+payload to this newly encoded value.  They should also set the
+\mailheader{Content-Transfer-Encoding} header as appropriate.
 Here are the encoding functions provided:
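
The one-argument encoder contract described above can be sketched as follows. This is an illustration only, and it assumes a current Python 3 interpreter, where the module is spelled `email.encoders` rather than the `email.Encoders` named in this revision. The payload text is made up for the example.

```python
from email import encoders
from email.mime.base import MIMEBase

# Build a bare non-multipart part and give it a payload.
part = MIMEBase("application", "octet-stream")
part.set_payload("hello world")

# An encoder takes exactly one argument, the message object. It replaces
# the payload with the encoded form and sets Content-Transfer-Encoding.
encoders.encode_base64(part)

cte = part["Content-Transfer-Encoding"]
# decode=True reverses the transfer encoding and returns bytes.
decoded = part.get_payload(decode=True)
```

The encoder mutates the message in place instead of returning a new value, which is why the result is read back from the message object.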

Index: emailexc.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/emailexc.tex,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -d -r1.3 -r1.4
--- emailexc.tex	1 Oct 2002 01:05:52 -0000	1.3
+++ emailexc.tex	3 Oct 2004 03:16:17 -0000	1.4
@@ -52,3 +52,36 @@
 if the \method{attach()} method is called on an instance of a class
 derived from \class{MIMENonMultipart} (e.g. \class{MIMEImage}).
+Here's the list of the defects that the \class{FeedParser} can find while
+parsing messages.  Note that the defects are added to the message where the
+problem was found, so for example, if a message nested inside a
+\mimetype{multipart/alternative} had a malformed header, that nested message
+object would have a defect, but the containing messages would not.
+All defect classes are subclassed from \class{email.Errors.MessageDefect}, but
+this class is \emph{not} an exception!
+\versionadded[All the defect classes were added]{2.4}
+\item \class{NoBoundaryInMultipartDefect} -- A message claimed to be a
+      multipart, but had no \mimetype{boundary} parameter.
+\item \class{StartBoundaryNotFoundDefect} -- The start boundary claimed in the
+      \mailheader{Content-Type} header was never found.
+\item \class{FirstHeaderLineIsContinuationDefect} -- The message had a
+      continuation line as its first header line.
+\item \class{MisplacedEnvelopeHeaderDefect} -- A ``Unix From'' header was found
+      in the middle of a header block.
+\item \class{MalformedHeaderDefect} -- A header was found that was missing a
+      colon, or was otherwise malformed.
+\item \class{MultipartInvariantViolationDefect} -- A message claimed to be a
+      \mimetype{multipart}, but no subparts were found.  Note that when a
+      message has this defect, its \method{is_multipart()} method may return
+      false even though its content type claims to be \mimetype{multipart}.
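
A minimal sketch of how these defects surface in practice, assuming a current Python 3 interpreter (where the modules are spelled `email.feedparser` and `email.errors`). The message text is invented for the example: it claims to be multipart but supplies no `boundary` parameter.

```python
from email import errors
from email.feedparser import FeedParser

parser = FeedParser()
# Claims multipart/mixed but has no boundary parameter and no subparts.
parser.feed("Content-Type: multipart/mixed\n\nThis body has no boundary.\n")
msg = parser.close()

# Parsing does not raise; problems are recorded on the message itself.
defect_names = [type(d).__name__ for d in msg.defects]
has_no_boundary = any(
    isinstance(d, errors.NoBoundaryInMultipartDefect) for d in msg.defects
)
claims_multipart = msg.get_content_maintype() == "multipart"
really_multipart = msg.is_multipart()
```

Checking `msg.defects` after parsing is the non-strict replacement for catching a parse exception. Here `claims_multipart` and `really_multipart` disagree, which is the inconsistency the last item in the list describes.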

Index: emailmessage.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/emailmessage.tex,v
retrieving revision 1.15
retrieving revision 1.16
diff -u -d -r1.15 -r1.16
--- emailmessage.tex	28 Sep 2004 02:56:45 -0000	1.15
+++ emailmessage.tex	3 Oct 2004 03:16:17 -0000	1.16
@@ -359,13 +359,16 @@
 \code{VALUE} to be encoded in the \code{us-ascii} charset.  You can
 usually ignore \code{LANGUAGE}.
-Your application should be prepared to deal with 3-tuple return
-values, and can convert the parameter to a Unicode string like so:
+If your application doesn't care whether the parameter was encoded as in
+\rfc{2231}, you can collapse the parameter value by calling
+\function{email.Utils.collapse_rfc2231_value()}, passing in the return value
+from \method{get_param()}.  This will return a suitably decoded Unicode string
+when the value is a tuple, or the original string unquoted if it isn't.  For
-param = msg.get_param('foo')
-if isinstance(param, tuple):
-    param = unicode(param[2], param[0] or 'us-ascii')
+rawparam = msg.get_param('foo')
+param = email.Utils.collapse_rfc2231_value(rawparam)
 In any case, the parameter value (either the returned string, or the
@@ -549,32 +552,21 @@
 set the \var{epilogue} to the empty string.
-\subsubsection{Deprecated methods}
-The following methods are deprecated in \module{email} version 2.
-They are documented here for completeness.
+The \var{defects} attribute contains a list of all the problems found when
+parsing this message.  See \refmodule{email.Errors} for a detailed description
+of the possible parsing defects.
-Add \var{payload} to the message object's existing payload.  If, prior
-to calling this method, the object's payload was \code{None}
-(i.e. never before set), then after this method is called, the payload
-will be the argument \var{payload}.
-If the object's payload was already a list
-(i.e. \method{is_multipart()} returns \code{True}), then \var{payload} is
-appended to the end of the existing payload list.
+\subsubsection{Deprecated methods}
-For any other type of existing payload, \method{add_payload()} will
-transform the new payload into a list consisting of the old payload
-and \var{payload}, but only if the document is already a MIME
-multipart document.  This condition is satisfied if the message's
-\mailheader{Content-Type} header's main type is either
-\mimetype{multipart}, or there is no \mailheader{Content-Type}
-header.  In any other situation,
-\exception{MultipartConversionError} is raised.
+\versionchanged[The \method{add_payload()} method was removed; use the
+\method{attach()} method instead]{2.4}
-\deprecated{2.2.2}{Use the \method{attach()} method instead.}
+The following methods are deprecated.  They are documented here for
 Return the message's content type, as a string of the form
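
The `get_param()` and `collapse_rfc2231_value()` pairing shown in this diff can be tried end to end with the sketch below. It assumes a current Python 3 interpreter, where the function lives in `email.utils` rather than `email.Utils`. The header and filename are invented for the example.

```python
import email
from email.utils import collapse_rfc2231_value

raw = (
    "Content-Type: text/plain\n"
    "Content-Disposition: attachment; filename*=utf-8''na%C3%AFve.txt\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)

# An RFC 2231 encoded parameter comes back as (charset, language, value).
rawparam = msg.get_param("filename", header="content-disposition")
# Collapse the 3-tuple into a single decoded string.
param = collapse_rfc2231_value(rawparam)

# A plain (non-tuple) value is simply returned unquoted.
plain = collapse_rfc2231_value('"plain.txt"')
```

Calling the helper unconditionally avoids the `isinstance(param, tuple)` branch that the removed example needed.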

Index: emailmimebase.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/emailmimebase.tex,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -d -r1.3 -r1.4
--- emailmimebase.tex	11 Mar 2003 05:03:25 -0000	1.3
+++ emailmimebase.tex	3 Oct 2004 03:16:17 -0000	1.4
@@ -142,9 +142,7 @@
 to \mimetype{rfc822}.
-\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{,
-    _charset\optional{, _encoder}}}}
+\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{, _charset}}}
 A subclass of \class{MIMENonMultipart}, the \class{MIMEText} class is
 used to create MIME objects of major type \mimetype{text}.
 \var{_text} is the string for the payload.  \var{_subtype} is the
@@ -153,6 +151,7 @@
 \class{MIMENonMultipart} constructor; it defaults to \code{us-ascii}.  No
 guessing or encoding is performed on the text data.
-\deprecated{2.2.2}{The \var{_encoding} argument has been deprecated.
-Encoding now happens implicitly based on the \var{_charset} argument.}
+\versionchanged[The previously deprecated \var{_encoder} argument has
+been removed.  Encoding happens implicitly based on the \var{_charset}
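
To illustrate how encoding follows from the `_charset` argument now that `_encoder` is gone, here is a small sketch. It assumes a current Python 3 interpreter, where the class is imported from `email.mime.text`. The sample strings are made up.

```python
from email.mime.text import MIMEText

# No charset given and pure ASCII text: us-ascii is used.
ascii_part = MIMEText("plain ASCII text")
# An explicit charset selects the transfer encoding implicitly.
utf8_part = MIMEText("caf\u00e9", "plain", "utf-8")

ascii_cte = ascii_part["Content-Transfer-Encoding"]
utf8_cte = utf8_part["Content-Transfer-Encoding"]
# Decoding the payload recovers the original text.
round_trip = utf8_part.get_payload(decode=True).decode("utf-8")
```

The caller never names an encoder; the charset alone decides which transfer encoding is applied.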

Index: emailparser.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/emailparser.tex,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -d -r1.9 -r1.10
--- emailparser.tex	24 Feb 2004 20:58:10 -0000	1.9
+++ emailparser.tex	3 Oct 2004 03:16:17 -0000	1.10
@@ -18,29 +18,79 @@
 \method{is_multipart()} method, and the subparts can be accessed via
 the \method{get_payload()} and \method{walk()} methods.
+There are actually two parser interfaces available for use, the classic
+\class{Parser} API and the incremental \class{FeedParser} API.  The classic
+\class{Parser} API is fine if you have the entire text of the message in
+memory as a string, or if the entire message lives in a file on the file
+system.  \class{FeedParser} is more appropriate for when you're reading the
+message from a stream which might block waiting for more input (e.g. reading
+an email message from a socket).  The \class{FeedParser} can consume and parse
+the message incrementally, and only returns the root object when you close the
+parser\footnote{As of email package version 3.0, introduced in
+Python 2.4, the classic \class{Parser} was re-implemented in terms of the
+\class{FeedParser}, so the semantics and results are identical between the two
 Note that the parser can be extended in limited ways, and of course
 you can implement your own parser completely from scratch.  There is
 no magical connection between the \module{email} package's bundled
 parser and the \class{Message} class, so your custom parser can create
 message object trees any way it finds necessary.
-The primary parser class is \class{Parser} which parses both the
-headers and the payload of the message.  In the case of
-\mimetype{multipart} messages, it will recursively parse the body of
-the container message.  Two modes of parsing are supported,
-\emph{strict} parsing, which will usually reject any non-RFC compliant
-message, and \emph{lax} parsing, which attempts to adjust for common
-MIME formatting problems.
+\subsubsection{FeedParser API}
-The \module{email.Parser} module also provides a second class, called
+The \class{FeedParser} provides an API that is conducive to incremental
+parsing of email messages, such as would be necessary when reading the text of
+an email message from a source that can block (e.g. a socket).  The
+\class{FeedParser} can of course be used to parse an email message fully
+contained in a string or a file, but the classic \class{Parser} API may be
+more convenient for such use cases.  The semantics and results of the two
+parser APIs are identical.
+The \class{FeedParser}'s API is simple; you create an instance, feed it a
+bunch of text until there's no more to feed it, then close the parser to
+retrieve the root message object.  The \class{FeedParser} is extremely
+accurate when parsing standards-compliant messages, and it does a very good
+job of parsing non-compliant messages, providing information about how a
+message was deemed broken.  It will populate a message object's \var{defects}
+attribute with a list of any problems it found in a message.  See the
+\refmodule{email.Errors} module for the list of defects that it can find.
+Here is the API for the \class{FeedParser}:
+Create a \class{FeedParser} instance.  Optional \var{_factory} is a
+no-argument callable that will be called whenever a new message object is
+needed.  It defaults to the \class{email.Message.Message} class.
+Feed the \class{FeedParser} some more data.  \var{data} should be a
+string containing one or more lines.  The lines can be partial and the
+\class{FeedParser} will stitch such partial lines together properly.  The
+lines in the string can have any of the common three line endings, carriage
+return, newline, or carriage return and newline (they can even be mixed).
+Closing a \class{FeedParser} completes the parsing of all previously fed data,
+and returns the root message object.  It is undefined what happens if you feed
+more data to a closed \class{FeedParser}.
+\subsubsection{Parser class API}
+The \class{Parser} provides an API that can be used to parse a message when
+the complete contents of the message are available in a string or file.  The
+\module{email.Parser} module also provides a second class, called
 \class{HeaderParser} which can be used if you're only interested in
 the headers of the message. \class{HeaderParser} can be much faster in
 these situations, since it does not attempt to parse the message body,
 instead setting the payload to the raw body as a string.
 \class{HeaderParser} has the same API as the \class{Parser} class.
-\subsubsection{Parser class API}
 \begin{classdesc}{Parser}{\optional{_class\optional{, strict}}}
 The constructor for the \class{Parser} class takes an optional
 argument \var{_class}.  This must be a callable factory (such as a
@@ -49,19 +99,14 @@
 \refmodule{email.Message}).  The factory will be called without
-The optional \var{strict} flag specifies whether strict or lax parsing
-should be performed.  Normally, when things like MIME terminating
-boundaries are missing, or when messages contain other formatting
-problems, the \class{Parser} will raise a
-\exception{MessageParseError}.  However, when lax parsing is enabled,
-the \class{Parser} will attempt to work around such broken formatting
-to produce a usable message structure (this doesn't mean
-\exception{MessageParseError}s are never raised; some ill-formatted
-messages just can't be parsed).  The \var{strict} flag defaults to
-\code{False} since lax parsing usually provides the most convenient
+The optional \var{strict} flag is ignored.  \deprecated{2.4}{Because the
+\class{Parser} class is a backward compatible API wrapper around the
+new-in-Python 2.4 \class{FeedParser}, \emph{all} parsing is effectively
+non-strict.  You should simply stop passing a \var{strict} flag to the
+\class{Parser} constructor.}
 \versionchanged[The \var{strict} flag was added]{2.2.2}
+\versionchanged[The \var{strict} flag was deprecated]{2.4}
 The other public \class{Parser} methods are:
@@ -149,4 +194,13 @@
       object containing a list payload of length 1.  Their
       \method{is_multipart()} method will return \code{True}.  The
       single element in the list payload will be a sub-message object.
+\item Some non-standards compliant messages may not be internally consistent
+      about their \mimetype{multipart}-edness.  Such messages may have a
+      \mailheader{Content-Type} header of type \mimetype{multipart}, but their
+      \method{is_multipart()} method may return \code{False}.  If such
+      messages were parsed with the \class{FeedParser}, they will have an
+      instance of the \class{MultipartInvariantViolationDefect} class in their
+      \var{defects} attribute list.  See \refmodule{email.Errors} for
+      details.
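
The create, feed, close cycle of the \class{FeedParser} described earlier in this file can be sketched as below. It assumes a current Python 3 interpreter, where the class is imported from `email.feedparser`. The chunk boundaries are chosen arbitrarily to split a header line and a body line in the middle, standing in for data arriving from a socket.

```python
from email.feedparser import FeedParser

# Chunks as they might arrive from a blocking source; lines are split
# mid-way on purpose to show that partial lines are stitched together.
chunks = [
    "From: sender@example.com\nSub",
    "ject: hello\n\nbody ",
    "line\n",
]

parser = FeedParser()
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root message object.
msg = parser.close()
subject = msg["Subject"]
body = msg.get_payload()
```

The parser buffers any incomplete trailing line, so the caller does not need to split the input on line boundaries before feeding it.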

Index: emailutil.tex
RCS file: /cvsroot/python/python/dist/src/Doc/lib/emailutil.tex,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -d -r1.8 -r1.9
--- emailutil.tex	1 Oct 2002 04:33:16 -0000	1.8
+++ emailutil.tex	3 Oct 2004 03:16:17 -0000	1.9
@@ -119,24 +119,33 @@
 string is encoded using the empty string for \var{language}.
+\begin{funcdesc}{collapse_rfc2231_value}{value\optional{, errors\optional{,
+    fallback_charset}}}
+When a header parameter is encoded in \rfc{2231} format,
+\method{Message.get_param()} may return a 3-tuple containing the character
+set, language, and value.  \function{collapse_rfc2231_value()} turns this into
+a Unicode string.  Optional \var{errors} is passed to the \var{errors}
+argument of the built-in \function{unicode()} function; it defaults to
+\code{replace}.  Optional \var{fallback_charset} specifies the character set
+to use if the one in the \rfc{2231} header is not known by Python; it defaults
+to \code{us-ascii}.
+For convenience, if the \var{value} passed to
+\function{collapse_rfc2231_value()} is not a tuple, it should be a string and
+it is returned unquoted.
 Decode parameters list according to \rfc{2231}.  \var{params} is a
 sequence of 2-tuples containing elements of the form
 \code{(content-type, string-value)}.
-The following functions have been deprecated:
-\deprecated{2.2.2}{Use \function{formataddr()} instead.}
-\deprecated{2.2.2}{Use \method{Header.decode_header()} instead.}
+\versionchanged[The \function{dump_address_pair()} function has been removed;
+use \function{formataddr()} instead.]{2.4}
-\begin{funcdesc}{encode}{s\optional{, charset\optional{, encoding}}}
-\deprecated{2.2.2}{Use \method{Header.encode()} instead.}
+\versionchanged[The \function{decode()} function has been removed; use the
+\method{Header.decode_header()} method instead.]{2.4}
+\versionchanged[The \function{encode()} function has been removed; use the
+\method{Header.encode()} method instead.]{2.4}
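
The replacements named in these notes can be exercised as in the sketch below. It assumes a current Python 3 interpreter, where the modules are spelled `email.utils` and `email.header`, and where `decode_header()` is a module-level function of `email.header`. The name, address, and header text are invented for the example.

```python
from email.header import Header, decode_header
from email.utils import formataddr

# formataddr() replaces the removed dump_address_pair().
address = formataddr(("Jane Doe", "jane@example.com"))

# Header.encode() replaces the removed Utils.encode().
encoded = Header("p\u00f6stal", "iso-8859-1").encode()

# decode_header() replaces the removed Utils.decode(); it returns a list
# of (decoded_bytes, charset) pairs.
decoded = decode_header(encoded)
```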
