[Python-Dev] cpython: #11731: simplify/enhance parser/generator API by introducing policy objects.

Georg Brandl g.brandl at gmx.net
Mon Apr 18 20:26:36 CEST 2011


On 18.04.2011 20:00, r.david.murray wrote:

> diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst
> --- a/Doc/library/email.parser.rst
> +++ b/Doc/library/email.parser.rst
> @@ -112,8 +118,13 @@
>     :class:`~email.message.Message` (see :mod:`email.message`).  The factory will
>     be called without arguments.
>  
> -   .. versionchanged:: 3.2
> -      Removed the *strict* argument that was deprecated in 2.4.
> +   The *policy* keyword specifies a :mod:`~email.policy` object that controls a
> +   number of aspects of the parser's operation.  The default policy maintains
> +   backward compatibility.
> +
> +   .. versionchanged:: 3.3
> +      Removed the *strict* argument that was deprecated in 2.4.  Added the
> +      *policy* keyword.

Hmm, so *strict* wasn't actually removed in 3.2?

> @@ -187,12 +204,15 @@
>  
>  .. currentmodule:: email
>  
> -.. function:: message_from_string(s, _class=email.message.Message, strict=None)
> +.. function:: message_from_string(s, _class=email.message.Message, *, \
> +                                  policy=policy.default)
>  
>     Return a message object structure from a string.  This is exactly equivalent to
> -   ``Parser().parsestr(s)``.  Optional *_class* and *strict* are interpreted as
> +   ``Parser().parsestr(s)``.  *_class* and *policy* are interpreted as
>     with the :class:`Parser` class constructor.
>  
> +   .. versionchanged:: removed *strict*, added *policy*
> +

The version number (3.3) is missing from this directive.  Also, please always
end version directive text with a period.

>  .. function:: message_from_bytes(s, _class=email.message.Message, strict=None)
>  
>     Return a message object structure from a byte string.  This is exactly
> @@ -200,21 +220,27 @@
>     *strict* are interpreted as with the :class:`Parser` class constructor.
>  
>     .. versionadded:: 3.2
> +   .. versionchanged:: 3.3 removed *strict*, added *policy*

See above.

> -.. function:: message_from_file(fp, _class=email.message.Message, strict=None)
> +.. function:: message_from_file(fp, _class=email.message.Message, *, \
> +                                policy=policy.default)
>  
>     Return a message object structure tree from an open :term:`file object`.
> -   This is exactly equivalent to ``Parser().parse(fp)``.  Optional *_class*
> -   and *strict* are interpreted as with the :class:`Parser` class constructor.
> +   This is exactly equivalent to ``Parser().parse(fp)``.  *_class*
> +   and *policy* are interpreted as with the :class:`Parser` class constructor.
>  
> -.. function:: message_from_binary_file(fp, _class=email.message.Message, strict=None)
> +   .. versionchanged:: 3.3 removed *strict*, added *policy*

See above.

> +.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
> +                                       policy=policy.default)
>  
>     Return a message object structure tree from an open binary :term:`file
>     object`.  This is exactly equivalent to ``BytesParser().parse(fp)``.
> -   Optional *_class* and *strict* are interpreted as with the :class:`Parser`
> +   *_class* and *policy* are interpreted as with the :class:`Parser`
>     class constructor.
>  
>     .. versionadded:: 3.2
> +   .. versionchanged:: 3.3 removed *strict*, added *policy*

See above.

> --- /dev/null
> +++ b/Doc/library/email.policy.rst
> @@ -0,0 +1,179 @@
> +:mod:`email`: Policy Objects
> +----------------------------
> +
> +.. module:: email.policy
> +   :synopsis: Controlling the parsing and generating of messages

This file should have a ".. versionadded:: 3.3" (without further content) here.

> +The :mod:`email` package's prime focus is the handling of email messages as
> +described by the various email and MIME RFCs.  However, the general format of
> +email messages (a block of header fields each consisting of a name followed by
> +a colon followed by a value, the whole block followed by a blank line and an
> +arbitrary 'body'), is a format that has found utility outside of the realm of
> +email.  Some of these uses conform fairly closely to the main RFCs, some do
> +not.  And even when working with email, there are times when it is desirable to
> +break strict compliance with the RFCs.
> +
> +Policy objects are the mechanism used to provide the email package with the
> +flexibility to handle all these disparate use cases,

Looks like something is missing from this sentence :)

[...]

> +As an example, the following code could be used to read an email message from a
> +file on disk and pass it to the system ``sendmail`` program on a ``unix``
> +system::

Should be Unix, not ``unix``.

> +   >>> from email import msg_from_binary_file
> +   >>> from email.generator import BytesGenerator
> +   >>> import email.policy
> +   >>> from subprocess import Popen, PIPE
> +   >>> with open('mymsg.txt', 'b') as f:
> +   >>>     msg = msg_from_binary_file(f, policy=email.policy.mbox)
> +   >>> p = Popen(['sendmail', msg['To'][0].address], stdin=PIPE)
> +   >>> g = BytesGenerator(p.stdin, email.policy.policy=SMTP)

That keyword arg doesn't look right; presumably it should be
``policy=email.policy.SMTP``.

> +   >>> g.flatten(msg)
> +   >>> p.stdin.close()
> +   >>> rc = p.wait()

Also, if you use interactive prompts, please use them correctly ("..." prompt
and one blank line for the with block).
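
For comparison, here is a minimal sketch of what I assume the example is meant
to do, written as a plain script.  The assumptions: the function is
``message_from_binary_file``, the parse policy is ``email.policy.default``, the
generator keyword is ``policy=email.policy.SMTP``, and ``io.BytesIO`` objects
stand in for both the file on disk and the pipe to ``sendmail`` so that the
snippet is self-contained.

```python
import io
from email import message_from_binary_file
from email.generator import BytesGenerator
import email.policy

# Sample message bytes (illustrative data, not from the patch).
raw = b"From: a@example.com\nTo: b@example.com\nSubject: hi\n\nbody\n"

# io.BytesIO stands in for open('mymsg.txt', 'rb').
with io.BytesIO(raw) as f:
    msg = message_from_binary_file(f, policy=email.policy.default)

# A second BytesIO stands in for p.stdin of the sendmail process.
out = io.BytesIO()
g = BytesGenerator(out, policy=email.policy.SMTP)
g.flatten(msg)

# The SMTP policy serializes with CRLF line endings.
wire = out.getvalue()
```

With a real ``Popen`` object, ``out`` would be ``p.stdin``, followed by
``p.stdin.close()`` and ``p.wait()`` as in the patch.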

> +Some email package methods accept a *policy* keyword argument, allowing the
> +policy to be overridden for that method.  For example, the following code use

"uses"

> +the :meth:`email.message.Message.as_string` method to the *msg* object from the
                                                      ^^^^^^
Something is missing around here.

> +previous example and re-write it to a file using the native line separators for
> +the platform on which it is running::
> +
> +   >>> import os
> +   >>> mypolicy = email.policy.Policy(linesep=os.linesep)
> +   >>> with open('converted.txt', 'wb') as f:
> +   ...     f.write(msg.as_string(policy=mypolicy))
> +
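
A sketch of the per-call override, assuming ``as_string()`` accepts the
*policy* keyword as described and that a policy can be derived with
``clone()``.  Note that ``as_string()`` returns a ``str``, so the result would
go to a file opened in text mode.  The sketch only inspects the string, and
uses ``"\r\n"`` as a stand-in for ``os.linesep`` so the outcome does not depend
on the platform.

```python
import email
import email.policy

# Sample message text (illustrative data, not from the patch).
msg = email.message_from_string("Subject: hi\n\nbody\n",
                                policy=email.policy.default)

# Derive a policy that differs from the default only in its line separator.
crlf_policy = email.policy.default.clone(linesep="\r\n")

# The override applies to this call only; msg keeps its own policy.
text = msg.as_string(policy=crlf_policy)
```
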
> +Policy instances are immutable, but they can be cloned, accepting the same
> +keyword arguments as the class constructor and returning a new :class:`Policy`
> +instance that is a copy of the original but with the specified attributes
> +values changed.  For example, the following creates an SMTP policy that will
> +raise any defects detected as errors::
> +
> +   >>> strict_SMTP = email.policy.SMTP.clone(raise_on_defect=True)
> +
> +Policy objects can also be combined using the addition operator, producing a
> +policy object whose settings are a combination of the non-default values of the
> +summed objects::
> +
> +   >>> strict_SMTP = email.policy.SMTP + email.policy.strict

Interesting API :)
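
For readers following along, a small sketch of cloning and addition using only
names from the patch (``SMTP``, ``strict``, ``clone()``, ``+``).  It assumes
that "immutable" means attribute assignment raises ``AttributeError``, which is
how the standard library's ``email.policy`` behaves.

```python
import email.policy

# clone() returns a new policy; the original SMTP instance is untouched.
strict_smtp = email.policy.SMTP.clone(raise_on_defect=True)

# Addition overlays the explicitly set attributes of the right operand
# on a copy of the left operand.
combined = email.policy.SMTP + email.policy.strict

# Policies reject attribute assignment.
error = None
try:
    email.policy.SMTP.linesep = "\n"
except AttributeError as exc:
    error = exc
```

Both ``strict_smtp`` and ``combined`` end up with the SMTP line separator and
``raise_on_defect`` enabled.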

> +This operation is not commutative; that is, the order in which the objects are
> +added matters.  To illustrate::
> +
> +   >>> Policy = email.policy.Policy
> +   >>> apolicy = Policy(max_line_length=100) + Policy(max_line_length=80)
> +   >>> apolicy.max_line_length
> +   80
> +   >>> apolicy = Policy(max_line_length=80) + Policy(max_line_length=100)
> +   >>> apolicy.max_line_length
> +   100
> +
> +
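
The ordering rule can be checked with a short sketch.  In the standard
library's ``email.policy`` the base ``Policy`` class is abstract, so the sketch
substitutes the concrete ``EmailPolicy`` class; that substitution is an
assumption and not part of the patch.

```python
from email.policy import EmailPolicy

# The right-hand operand wins for any attribute both operands set.
first = EmailPolicy(max_line_length=100) + EmailPolicy(max_line_length=80)
second = EmailPolicy(max_line_length=80) + EmailPolicy(max_line_length=100)

print(first.max_line_length)   # 80
print(second.max_line_length)  # 100
```
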
> +.. class:: Policy(**kw)
> +
> +   The valid constructor keyword arguments are any of the attributes listed
> +   below.
> +
> +   .. attribute:: max_line_length
> +
> +      The maximum length of any line in the serialized output, not counting the
> +      end of line character(s).  Default is 78, per :rfc:`5322`.  A value of
> +      ``0`` or :const:`None` indicates that no line wrapping should be
> +      done at all.
> +
> +   .. attribute:: linesep
> +
> +      The string to be used to terminate lines in serialized output.  The
> +      default is '\\n' because that's the internal end-of-line discipline used
> +      by Python, though '\\r\\n' is required by the RFCs.  See `Policy
> +      Instances`_ for policies that use an RFC conformant linesep.  Setting it
> +      to :attr:`os.linesep` may also be useful.

These string constants are probably better off in code markup, i.e. ``'\n'``.

> +   .. attribute:: must_be_7bit
> +
> +      If :const:`True`, data output by a bytes generator is limited to ASCII
> +      characters.  If :const:`False` (the default), then bytes with the high
> +      bit set are preserved and/or allowed in certain contexts (for example,
> +      where possible a content transfer encoding of ``8bit`` will be used).
> +      String generators act as if ``must_be_7bit`` is `True` regardless of the
> +      policy in effect, since a string cannot represent non-ASCII bytes.

Please use either :const:`True` or ``True``.

> +   .. attribute:: raise_on_defect
> +
> +      If :const:`True`, any defects encountered will be raised as errors.  If
> +      :const:`False` (the default), defects will be passed to the
> +      :meth:`register_defect` method.

A short sentence saying that the following are methods would be nice.

> +   .. method:: handle_defect(obj, defect)
> +
> +      *obj* is the object on which to register the defect.

What kind of object is *obj*?

>  *defect* should be
> +      an instance of a  subclass of :class:`~email.errors.Defect`.
> +      If :attr:`raise_on_defect`
> +      is ``True`` the defect is raised as an exception.  Otherwise *obj* and
> +      *defect* are passed to :meth:`register_defect`.  This method is intended
> +      to be called by parsers when they encounter defects, and will not be
> +      called by code that uses the email library unless that code is
> +      implementing an alternate parser.
> +
> +   .. method:: register_defect(obj, defect)
> +
> +      *obj* is the object on which to register the defect.  *defect* should be
> +      a subclass of :class:`~email.errors.Defect`.  This method is part of the
> +      public API so that custom ``Policy`` subclasses can implement alternate
> +      handling of defects.  The default implementation calls the ``append``
> +      method of the ``defects`` attribute of *obj*.
> +
> +   .. method:: clone(obj, *kw):
> +
> +      Return a new :class:`Policy` instance whose attributes have the same
> +      values as the current instance, except where those attributes are
> +      given new values by the keyword arguments.
> +
> +
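
To illustrate how the two defect methods fit together, here is a hedged sketch
of a custom policy.  ``LoggingPolicy`` and the ``logged`` list are hypothetical
names, *obj* is assumed to be a message object with a ``defects`` list, and the
concrete ``EmailPolicy`` class from the standard library is used as the base.

```python
import email.policy
from email.errors import NoBoundaryInMultipartDefect
from email.message import Message

logged = []  # names of the defects seen by the hypothetical policy below


class LoggingPolicy(email.policy.EmailPolicy):
    """Hypothetical subclass: record each defect, then store it as usual."""

    def register_defect(self, obj, defect):
        logged.append(type(defect).__name__)
        super().register_defect(obj, defect)  # appends to obj.defects


msg = Message()
defect = NoBoundaryInMultipartDefect()

# raise_on_defect is False: handle_defect() hands over to register_defect().
LoggingPolicy().handle_defect(msg, defect)

# raise_on_defect is True: handle_defect() raises the defect instead.
raised = None
try:
    LoggingPolicy(raise_on_defect=True).handle_defect(msg, defect)
except NoBoundaryInMultipartDefect as exc:
    raised = exc
```

Only the first call reaches ``register_defect``, so ``logged`` and
``msg.defects`` each hold a single entry.
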
> +Policy Instances
> +................

We usually use "^^^^" for underlining this level of headings, but it's not
really important.

> +The following instances of :class:`Policy` provide defaults suitable for
> +specific common application domains.

Indentation switches to 4 spaces below here...

> +.. data:: default
> +
> +    An instance of :class:`Policy` with all defaults unchanged.
> +
> +.. data:: SMTP
> +
> +    Output serialized from a message will conform to the email and SMTP
> +    RFCs.  The only changed attribute is :attr:`linesep`, which is set to
> +    ``\r\n``.
> +
> +.. data:: HTTP
> +
> +    Suitable for use when serializing headers for use in HTTP traffic.
> +    :attr:`linesep` is set to ``\r\n``, and :attr:`max_line_length` is set to
> +    :const:`None` (unlimited).
> +
> +.. data:: strict
> +
> +    :attr:`raise_on_defect` is set to :const:`True`.
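
A quick sketch for checking the predefined instances against the descriptions
above.  It reads only attributes named in the patch, and assumes the instances
are importable from ``email.policy`` under the documented names.

```python
import email.policy

# Collect the attributes the documentation says each instance changes.
settings = {
    "default": (email.policy.default.linesep,
                email.policy.default.max_line_length),
    "SMTP": (email.policy.SMTP.linesep,
             email.policy.SMTP.max_line_length),
    "HTTP": (email.policy.HTTP.linesep,
             email.policy.HTTP.max_line_length),
}
strict_raises = email.policy.strict.raise_on_defect

for name, (linesep, max_len) in settings.items():
    print(f"{name}: linesep={linesep!r}, max_line_length={max_len}")
```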

Sorry for the long review.

Georg


