[Python-Dev] cpython: #11731: simplify/enhance parser/generator API by introducing policy objects.
Georg Brandl
g.brandl at gmx.net
Mon Apr 18 20:26:36 CEST 2011
On 18.04.2011 20:00, r.david.murray wrote:
> diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst
> --- a/Doc/library/email.parser.rst
> +++ b/Doc/library/email.parser.rst
> @@ -112,8 +118,13 @@
> :class:`~email.message.Message` (see :mod:`email.message`). The factory will
> be called without arguments.
>
> - .. versionchanged:: 3.2
> - Removed the *strict* argument that was deprecated in 2.4.
> + The *policy* keyword specifies a :mod:`~email.policy` object that controls a
> + number of aspects of the parser's operation. The default policy maintains
> + backward compatibility.
> +
> + .. versionchanged:: 3.3
> + Removed the *strict* argument that was deprecated in 2.4. Added the
> + *policy* keyword.
Hmm, so *strict* wasn't actually removed in 3.2?
> @@ -187,12 +204,15 @@
>
> .. currentmodule:: email
>
> -.. function:: message_from_string(s, _class=email.message.Message, strict=None)
> +.. function:: message_from_string(s, _class=email.message.Message, *, \
> + policy=policy.default)
>
> Return a message object structure from a string. This is exactly equivalent to
> - ``Parser().parsestr(s)``. Optional *_class* and *strict* are interpreted as
> + ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
> with the :class:`Parser` class constructor.
>
> + .. versionchanged:: removed *strict*, added *policy*
> +
The 3.3 version is missing here. Also, please always end version directive text
with a period.
> .. function:: message_from_bytes(s, _class=email.message.Message, strict=None)
>
> Return a message object structure from a byte string. This is exactly
> @@ -200,21 +220,27 @@
> *strict* are interpreted as with the :class:`Parser` class constructor.
>
> .. versionadded:: 3.2
> + .. versionchanged:: 3.3 removed *strict*, added *policy*
See above.
> -.. function:: message_from_file(fp, _class=email.message.Message, strict=None)
> +.. function:: message_from_file(fp, _class=email.message.Message, *, \
> + policy=policy.default)
>
> Return a message object structure tree from an open :term:`file object`.
> - This is exactly equivalent to ``Parser().parse(fp)``. Optional *_class*
> - and *strict* are interpreted as with the :class:`Parser` class constructor.
> + This is exactly equivalent to ``Parser().parse(fp)``. *_class*
> + and *policy* are interpreted as with the :class:`Parser` class constructor.
>
> -.. function:: message_from_binary_file(fp, _class=email.message.Message, strict=None)
> + .. versionchanged:: 3.3 removed *strict*, added *policy*
See above.
> +.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
> + policy=policy.default)
>
> Return a message object structure tree from an open binary :term:`file
> object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
> - Optional *_class* and *strict* are interpreted as with the :class:`Parser`
> + *_class* and *policy* are interpreted as with the :class:`Parser`
> class constructor.
>
> .. versionadded:: 3.2
> + .. versionchanged:: 3.3 removed *strict*, added *policy*
See above.
> --- /dev/null
> +++ b/Doc/library/email.policy.rst
> @@ -0,0 +1,179 @@
> +:mod:`email`: Policy Objects
> +----------------------------
> +
> +.. module:: email.policy
> + :synopsis: Controlling the parsing and generating of messages
This file should have a ".. versionadded:: 3.3" (without further content) here.
> +The :mod:`email` package's prime focus is the handling of email messages as
> +described by the various email and MIME RFCs. However, the general format of
> +email messages (a block of header fields each consisting of a name followed by
> +a colon followed by a value, the whole block followed by a blank line and an
> +arbitrary 'body'), is a format that has found utility outside of the realm of
> +email. Some of these uses conform fairly closely to the main RFCs, some do
> +not. And even when working with email, there are times when it is desirable to
> +break strict compliance with the RFCs.
> +
> +Policy objects are the mechanism used to provide the email package with the
> +flexibility to handle all these disparate use cases,
Looks like something is missing from this sentence :)
[...]
> +As an example, the following code could be used to read an email message from a
> +file on disk and pass it to the system ``sendmail`` program on a ``unix``
> +system::
Should be Unix, not ``unix``.
> + >>> from email import msg_from_binary_file
> + >>> from email.generator import BytesGenerator
> + >>> import email.policy
> + >>> from subprocess import Popen, PIPE
> + >>> with open('mymsg.txt', 'b') as f:
> + >>> msg = msg_from_binary_file(f, policy=email.policy.mbox)
> + >>> p = Popen(['sendmail', msg['To'][0].address], stdin=PIPE)
> + >>> g = BytesGenerator(p.stdin, email.policy.policy=SMTP)
That keyword argument doesn't look right; presumably it should be
``policy=email.policy.SMTP``.
> + >>> g.flatten(msg)
> + >>> p.stdin.close()
> + >>> rc = p.wait()
Also, if you use interactive prompts, please use them correctly: the body of
the with block needs the "..." prompt, and the block should end with one blank
line.
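For reference, a minimal sketch of what a working version of this example might look like. It assumes a Python version in which the email package provides ``message_from_binary_file``, ``email.policy.default``, ``email.policy.SMTP`` and structured address headers (``msg['To'].addresses``); these names may not match the patch under review. To stay runnable anywhere, it serializes into an in-memory buffer instead of piping to ``sendmail``, and the sample message is made up.

```python
import io
import email
import email.policy
from email.generator import BytesGenerator

# Made-up sample message standing in for the contents of 'mymsg.txt'.
raw = b"From: a@example.com\nTo: b@example.com\nSubject: hi\n\nbody\n"

# Parse from a binary file-like object (a real file would be opened with 'rb').
msg = email.message_from_binary_file(io.BytesIO(raw),
                                     policy=email.policy.default)

# With a structured-header policy, the To header exposes parsed addresses.
recipient = msg['To'].addresses[0].addr_spec

# Serialize with the SMTP policy; the policy is passed as a plain keyword.
out = io.BytesIO()
g = BytesGenerator(out, policy=email.policy.SMTP)
g.flatten(msg)
wire = out.getvalue()
```

In the original example, ``out`` would be ``p.stdin`` of the ``Popen`` object and ``recipient`` would be passed on the ``sendmail`` command line.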
> +Some email package methods accept a *policy* keyword argument, allowing the
> +policy to be overridden for that method. For example, the following code use
"uses"
> +the :meth:`email.message.Message.as_string` method to the *msg* object from the
Something is missing around "method to the *msg* object" (a verb, it seems).
> +previous example and re-write it to a file using the native line separators for
> +the platform on which it is running::
> +
> + >>> import os
> + >>> mypolicy = email.policy.Policy(linesep=os.linesep)
> + >>> with open('converted.txt', 'wb') as f:
> + ... f.write(msg.as_string(policy=mypolicy))
> +
> +Policy instances are immutable, but they can be cloned, accepting the same
> +keyword arguments as the class constructor and returning a new :class:`Policy`
> +instance that is a copy of the original but with the specified attributes
> +values changed. For example, the following creates an SMTP policy that will
> +raise any defects detected as errors::
> +
> + >>> strict_SMTP = email.policy.SMTP.clone(raise_on_defect=True)
> +
> +Policy objects can also be combined using the addition operator, producing a
> +policy object whose settings are a combination of the non-default values of the
> +summed objects::
> +
> + >>> strict_SMTP = email.policy.SMTP + email.policy.strict
Interesting API :)
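To make the cloning and addition semantics concrete, here is a small sketch. It assumes a Python version in which ``email.policy`` provides the ``default``, ``SMTP`` and ``strict`` instances and supports ``clone()`` and ``+`` as described in the patch; the exact behaviour of the patch under review may differ.

```python
import email.policy

# clone() returns a new policy; the original instance is left untouched.
strict_smtp = email.policy.SMTP.clone(raise_on_defect=True)

# Addition merges the non-default settings of both operands.
combined = email.policy.SMTP + email.policy.strict

# Policies are immutable: assigning to an attribute is rejected.
try:
    email.policy.SMTP.linesep = "\n"
except AttributeError:
    immutable = True
else:
    immutable = False

# When both operands set the same attribute, the right-hand one wins.
wide = email.policy.default.clone(max_line_length=100)
narrow = email.policy.default.clone(max_line_length=80)
summed = wide + narrow
```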
> +This operation is not commutative; that is, the order in which the objects are
> +added matters. To illustrate::
> +
> + >>> Policy = email.policy.Policy
> + >>> apolicy = Policy(max_line_length=100) + Policy(max_line_length=80)
> + >>> apolicy.max_line_length
> + 80
> + >>> apolicy = Policy(max_line_length=80) + Policy(max_line_length=100)
> + >>> apolicy.max_line_length
> + 100
> +
> +
> +.. class:: Policy(**kw)
> +
> + The valid constructor keyword arguments are any of the attributes listed
> + below.
> +
> + .. attribute:: max_line_length
> +
> + The maximum length of any line in the serialized output, not counting the
> + end of line character(s). Default is 78, per :rfc:`5322`. A value of
> + ``0`` or :const:`None` indicates that no line wrapping should be
> + done at all.
> +
> + .. attribute:: linesep
> +
> + The string to be used to terminate lines in serialized output. The
> + default is '\\n' because that's the internal end-of-line discipline used
> + by Python, though '\\r\\n' is required by the RFCs. See `Policy
> + Instances`_ for policies that use an RFC conformant linesep. Setting it
> + to :attr:`os.linesep` may also be useful.
These string constants are probably better off in code markup, i.e. ``'\n'``.
> + .. attribute:: must_be_7bit
> +
> + If :const:`True`, data output by a bytes generator is limited to ASCII
> + characters. If :const:`False` (the default), then bytes with the high
> + bit set are preserved and/or allowed in certain contexts (for example,
> + where possible a content transfer encoding of ``8bit`` will be used).
> + String generators act as if ``must_be_7bit`` is `True` regardless of the
> + policy in effect, since a string cannot represent non-ASCII bytes.
Please use either :const:`True` or ``True``, not single backticks.
> + .. attribute:: raise_on_defect
> +
> + If :const:`True`, any defects encountered will be raised as errors. If
> + :const:`False` (the default), defects will be passed to the
> + :meth:`register_defect` method.
A short sentence that the following are methods would be nice.
> + .. method:: handle_defect(obj, defect)
> +
> + *obj* is the object on which to register the defect.
What kind of object is *obj*?
> *defect* should be
> + an instance of a subclass of :class:`~email.errors.Defect`.
> + If :attr:`raise_on_defect`
> + is ``True`` the defect is raised as an exception. Otherwise *obj* and
> + *defect* are passed to :meth:`register_defect`. This method is intended
> + to be called by parsers when they encounter defects, and will not be
> + called by code that uses the email library unless that code is
> + implementing an alternate parser.
> +
> + .. method:: register_defect(obj, defect)
> +
> + *obj* is the object on which to register the defect. *defect* should be
> + a subclass of :class:`~email.errors.Defect`. This method is part of the
> + public API so that custom ``Policy`` subclasses can implement alternate
> + handling of defects. The default implementation calls the ``append``
> + method of the ``defects`` attribute of *obj*.
> +
> + .. method:: clone(obj, *kw):
> +
> + Return a new :class:`Policy` instance whose attributes have the same
> + values as the current instance, except where those attributes are
> + given new values by the keyword arguments.
> +
> +
> +Policy Instances
> +................
We usually use "^^^^" to underline headings at this level, but it's not
really important.
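Coming back to ``handle_defect`` and ``register_defect`` above: a short sketch of how a custom policy might hook into defect handling could help readers. It assumes a Python version in which ``email.policy.EmailPolicy`` is a concrete, subclassable policy and ``email.message.Message`` objects carry a ``defects`` list; ``LoggingPolicy`` and ``seen`` are hypothetical names used only for illustration.

```python
import email.policy
from email.errors import MessageDefect
from email.message import Message

seen = []  # hypothetical log of defect class names


class LoggingPolicy(email.policy.EmailPolicy):
    """Hypothetical subclass that records defects before default handling."""

    def register_defect(self, obj, defect):
        seen.append(type(defect).__name__)
        # The default implementation appends the defect to obj.defects.
        super().register_defect(obj, defect)


msg = Message()

# raise_on_defect is False by default: the defect is registered on msg.
lenient = LoggingPolicy()
lenient.handle_defect(msg, MessageDefect("demo defect"))

# With raise_on_defect=True the defect is raised instead of registered.
strict = lenient.clone(raise_on_defect=True)
try:
    strict.handle_defect(msg, MessageDefect("another demo defect"))
except MessageDefect:
    raised = True
else:
    raised = False
```

Here *obj* is a message object, which may go some way towards answering the question about what kind of object the methods expect.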
> +The following instances of :class:`Policy` provide defaults suitable for
> +specific common application domains.
Indentation switches to 4 spaces below here...
> +.. data:: default
> +
> + An instance of :class:`Policy` with all defaults unchanged.
> +
> +.. data:: SMTP
> +
> + Output serialized from a message will conform to the email and SMTP
> + RFCs. The only changed attribute is :attr:`linesep`, which is set to
> + ``\r\n``.
> +
> +.. data:: HTTP
> +
> + Suitable for use when serializing headers for use in HTTP traffic.
> + :attr:`linesep` is set to ``\r\n``, and :attr:`max_line_length` is set to
> + :const:`None` (unlimited).
> +
> +.. data:: strict
> +
> + :attr:`raise_on_defect` is set to :const:`True`.
Sorry for the long review.
Georg