[Python-checkins] r77056 - peps/trunk/pep-0345.txt

Tue Dec 29 23:08:28 CET 2009

On Tue, Dec 29, 2009 at 3:30 PM, R. David Murray <rdmurray at bitdance.com> wrote:
> On Fri, 25 Dec 2009 08:29:22 -0500, tarek.ziade <python-checkins at python.org> wrote:
>
>> +To support empty lines and lines with indentation with respect to
>> +the RFC 822 format, any new line has to be suffixed by 7 spaces
>> +followed by a pipe (`|`) char. As a result, the Description field is
>> +encoded into a folded field that can be interpreted by RFC822
>> +parser [2]_.
>> +
>>  Example::
>>
> [...]
>> +    Description: This project provides powerful math functions
>> +            |For example, you can use `sum()` to sum numbers:
>> +            |
>> +            |Example::
>> +            |
>> +            |    >>> sum(1, 2)
>> +            |    3
>> +            |
>> +
>> +This encoding implies that any occurrences of ``\n |`` have to be replaced
>> +by ``\n`` when the field is unfolded using an RFC822 reader.
>
> Actually, a properly operating RFC[2]822 parser will unfold the header
> by removing the CRLF.  So the correct method of restoring the line
> breaks would be to replace the seven-spaces-plus-'|' with a \n.
> (See RFC2822, section 2.2.3).
>
> Unfortunately, Python's email package's parser is not (currently) an
> example of a standards conformant parser in that it does not have a mode
> that returns the 'unfolded' header.  So if you are using Python's email
> package to parse one of these headers, you would need to access the raw
> header data (that has the line ends in it) and replace the seven spaces
> plus '|' with the null string to recover your original data.
>
> It is also not clear that you can use the current email package to produce
> headers in the format you specify, though that may not be an issue.
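
The unfolding rule described above can be sketched without the email package at all. This is only an illustration under stated assumptions: `unfold` and `restore_line_breaks` are hypothetical helper names, and the sample value is modelled on the Description example from the PEP draft.

```python
import re

MARKER = " " * 7 + "|"  # seven spaces plus a pipe, as in the PEP draft


def unfold(raw_value):
    """Unfold as in RFC 2822 section 2.2.3: drop each line break that is
    immediately followed by whitespace, keeping the whitespace itself."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


def restore_line_breaks(unfolded_value):
    """Hypothetical helper: turn each marker back into a newline."""
    return unfolded_value.replace(MARKER, "\n")


# Raw header value as it would appear on the wire (assumed sample data).
raw = (
    "here is\n"
    "       |\n"
    "       |some example::\n"
    "       |\n"
    "       |    >>> 1 + 1\n"
    "       |    2"
)

unfolded = unfold(raw)
restored = restore_line_breaks(unfolded)
print(restored)
```

Note that after unfolding, any literal run of seven spaces plus `|` in the original text would also be turned into a line break, so this direction is only safe when the description itself never contains the marker.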

Yes, at first this issue was unclear to me, because I used the rfc822
module from the stdlib, which translates any number of spaces used as
the WSP into *one* single space, meaning that I couldn't use a
bijective transformation to decode the header. Then I realized that
the rfc822 module was to be removed, and that the email package
behaves correctly.

Here's how Distutils will do it:

>>> import email
>>> WSP = ' '*7+'|'
>>> def encode(text):
...     return text.replace('\n', '\n'+WSP)
...
>>> def decode(text):
...     return text.replace('\n'+WSP, '\n')
...
>>> description = """Description: here is
...
... some example::
...
...     >>> 1 + 1
...     2
... """
>>> msg = email.message_from_string(encode(description))
>>> print msg['description']
here is
       |
       |some example::
       |
       |    >>> 1 + 1
       |    2
       |
>>> print decode(msg['description'])
here is

some example::

    >>> 1 + 1
    2
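
For readers on Python 3, the same round trip can be sketched as follows. This is an assumption-laden port rather than what Distutils ships: it relies on `email.message_from_string` with its default (compat32) policy, which returns the header value with its line breaks intact, and the names `body` and `value` are introduced here for illustration.

```python
import email

WSP = " " * 7 + "|"  # seven spaces plus a pipe


def encode(text):
    # Prefix every continuation line so RFC 822 treats it as folded.
    return text.replace("\n", "\n" + WSP)


def decode(text):
    # Strip the prefix again; the line breaks themselves are kept.
    return text.replace("\n" + WSP, "\n")


body = "here is\n\nsome example::\n\n    >>> 1 + 1\n    2\n"

msg = email.message_from_string(encode("Description: " + body))
value = msg["description"]  # still folded, line breaks preserved

print(decode(value))
```

Because the parser hands back the folded value unchanged, `decode` is the exact inverse of `encode` here, which is the bijective behaviour the rfc822 module could not provide.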

I'll fix the PEP accordingly,

Thanks for the feedback

Regards,
Tarek

-- 
Tarek Ziadé | http://ziade.org