RFC822, PKG-INFO and the Description field

Hi,

I am currently fixing a bug in Distutils: http://bugs.python.org/issue1923

This bug turns a Description field like:

""" Text:: a literal python block:: >>> import this """

into:

""" Text:: a literal python block::
import this """

Which is fine for RFC 822 compliance but sucks for reST if someone wants to parse it back.

There's another problem: empty lines. For instance:

""" Description: Text:: a literal python block:: >>> import this """

will not be parseable with rfc822, because the first empty line ends the header. IOW, we need to encode that multi-line field differently.

Since we are already changing PEP 345, I want to take the chance to introduce a smarter marker in 1.2 that adds a character (:) after the 8 spaces to avoid losing empty lines:

""" Description: Text:: : :a literal python block:: : : >>> import this """

So we are able to unparse it.

Thoughts?

Regards
Tarek
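To make the empty-line problem and the proposed marker concrete, here is a minimal sketch, not the Distutils implementation. It uses the standard `email` parser in place of the old `rfc822` module, and it assumes each continuation line is written as 8 spaces, then the marker, then the original line. The helper names `encode_description` and `decode_description` are hypothetical.

```python
from email import message_from_string

INDENT = " " * 8  # continuation indent for multi-line fields
MARKER = ":"      # marker character from the proposal; exact placement is an assumption


def encode_description(text):
    """Prefix every continuation line with INDENT + MARKER so none is empty."""
    first, *rest = text.split("\n")
    return "\n".join([first] + [INDENT + MARKER + line for line in rest])


def decode_description(value):
    """Strip INDENT + MARKER from every continuation line."""
    prefix = INDENT + MARKER
    first, *rest = value.split("\n")
    return "\n".join(
        [first]
        + [line[len(prefix):] if line.startswith(prefix) else line for line in rest]
    )


description = "Text::\n\n    a literal python block::\n\n    >>> import this"

# Without a marker, the first empty line ends the header block,
# so only the first line of the description is read back as a header.
naive = message_from_string("Description: " + description + "\n")
print(repr(naive["Description"]))

# With the marker, no continuation line is empty, so the whole value
# stays in the header and can be unparsed afterwards.
marked = message_from_string(
    "Description: " + encode_description(description) + "\n"
)
print(decode_description(marked["Description"]) == description)
```

Because the marker sits after the indent, the reST indentation that follows it is kept character for character, which is what makes the round trip possible.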

On Sat, 2009-12-05 at 03:49 +0100, Tarek Ziadé wrote:
> Hi,
> I am currently fixing a bug in Distutils: http://bugs.python.org/issue1923
> This bug turns a Description field like:
> """ Text:: a literal python block:: >>> import this """
> into:
> """ Text:: a literal python block::
> import this """
> Which is fine for RFC 822 compliance but sucks for reST if someone wants to parse it back.
> There's another problem: empty lines.
> For instance:
> """ Description: Text::
> a literal python block::
> >>> import this """
> will not be parseable with rfc822, because the first empty line ends the header.
> IOW, we need to encode that multi-line field differently.
> Since we are already changing PEP 345, I want to take the chance to introduce a smarter marker in 1.2 that adds a character (:) after the 8 spaces to avoid losing empty lines:
> """ Description: Text:: : :a literal python block:: : : >>> import this """
> So we are able to unparse it.
> Thoughts?

How about turning that file into a real MIME message, instead of just a set of pseudo-MIME headers with some pseudo-encodings for multi-line stuff?

That way the long description could be the body of that message, with no more messy recoding needed. As far as I can tell, all the other optional fields are designed to fit on single lines anyway. IMHO it's a perfect match.

If the idea is liked, I'd take 3-4 hours to write a new DistributionMetadata class that handles the new format as well as backward compatibility (I suppose a new version number is needed anyway for some of the other additions).

Regards
Ronny
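A rough sketch of the "description as message body" idea, using the standard `email` package. This is an illustration under assumptions, not the proposed DistributionMetadata class: the function names `write_pkg_info` and `read_pkg_info` are hypothetical, the field values are made up, and the message is a plain header-plus-body message with no MIME headers added.

```python
from email import message_from_string
from email.message import Message


def write_pkg_info(fields, long_description):
    """Serialize single-line fields as headers and the description as the body."""
    msg = Message()
    for name, value in fields.items():
        msg[name] = value
    msg.set_payload(long_description)
    return msg.as_string()


def read_pkg_info(text):
    """Return (fields, long_description) from text produced by write_pkg_info."""
    msg = message_from_string(text)
    return dict(msg.items()), msg.get_payload()


# Hypothetical metadata, for illustration only.
fields = {"Metadata-Version": "1.2", "Name": "example", "Version": "0.1"}
description = "Text::\n\n    a literal python block::\n\n    >>> import this"

text = write_pkg_info(fields, description)
parsed_fields, parsed_description = read_pkg_info(text)
print(parsed_fields == fields and parsed_description == description)
```

The body needs no escaping at all, so empty lines and reST indentation come back unchanged. The trade-off is that a message has only one body, so only one field can be stored this way.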

On Sun, Dec 6, 2009 at 5:37 PM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote: [..]
> How about turning that file into a real mime message instead of just a set of pseudo mime headers with some pseudo encodings for multiline stuff.
> That way the long description could be the body of that message, no more messy recoding needed.
> As far as i can tell, all the other optional fields are designed to fit single lines anyway.

The problem is that we may have more multi-line fields in the future, so I think we should not use a message-like body. It's not a big problem if PKG-INFO looks messy, as long as we provide APIs to write *and* read it.

Note that there are no official consumers of this file yet, as far as I know. PyPI reads the metadata from a form that is pushed by the register command and stores it in its own format.

OTOH, we could drop RFC 822 completely and go for a simpler format like JSON for 1.2. I am not sure how many third-party applications this will impact, but I don't think there are that many. In any case, they will have to adapt for 1.2, so maybe keeping an RFC 822-like format is less annoying for them.

Regards
Tarek
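For comparison, a minimal sketch of what the JSON alternative could look like. No JSON layout was specified in this thread, so the key names, the field values, and the helpers `dump_metadata` and `load_metadata` are all assumptions made for illustration.

```python
import json


def dump_metadata(metadata):
    """Serialize a metadata mapping to JSON text."""
    return json.dumps(metadata, indent=2, sort_keys=True)


def load_metadata(text):
    """Parse JSON text back into a metadata mapping."""
    return json.loads(text)


# Hypothetical metadata: any number of fields may span several lines.
metadata = {
    "metadata_version": "1.2",
    "name": "example",
    "description": "Text::\n\n    a literal python block::\n\n    >>> import this",
}

restored = load_metadata(dump_metadata(metadata))
print(restored == metadata)
```

Newlines are escaped inside JSON strings, so any field can hold multiple lines and empty lines without a custom marker, at the cost of breaking tools that expect RFC 822-style headers.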

On Sun, Dec 06, 2009 at 09:22:32PM +0100, Tarek Ziadé wrote:
> On Sun, Dec 6, 2009 at 5:37 PM, Ronny Pfannschmidt <Ronny.Pfannschmidt@gmx.de> wrote: [..]
>> How about turning that file into a real mime message instead of just a set of pseudo mime headers with some pseudo encodings for multiline stuff.
>> That way the long description could be the body of that message, no more messy recoding needed.
>> As far as i can tell, all the other optional fields are designed to fit single lines anyway.
> The problem is, we may have in the future more multi-line fields so I think we should not use a message-like body.

You could easily hack that in using multipart/alternative messages, giving each alternative a meaning. :-)

> OTOH we could drop RFC 822 completely and go for a simpler format like JSON for 1.2.

That would be a less strange option. If the complete compatibility break does no harm, then it might be a good option. But I'm not sure which tools would be consuming this, so it might be risky. Note that stuffing it into multipart/alternative messages suffers from the same problem; only your encoding/decoding doesn't.

PS: As another (possibly bad) alternative, you could encode it in base64. It has the same compatibility problems, however; only your proposal doesn't have those, AFAIK.

Regards
Floris

--
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org
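A small sketch of the base64 alternative mentioned in the PS, assuming the whole description is UTF-8 encoded and stored as a single-line header value. The helper names `encode_field` and `decode_field` are hypothetical.

```python
import base64
from email import message_from_string


def encode_field(text):
    """Encode a multi-line value as one line of base64 text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_field(value):
    """Decode a value produced by encode_field."""
    return base64.b64decode(value).decode("utf-8")


description = "Text::\n\n    a literal python block::\n\n    >>> import this"

# The encoded value contains no newlines, so it fits a single header line.
raw = "Description: " + encode_field(description) + "\n"
msg = message_from_string(raw)
print(decode_field(msg["Description"]) == description)
```

The value survives header parsing intact, but a tool that reads the field without decoding it sees only opaque text, which is the compatibility cost noted above.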
Participants (3): Floris Bruynooghe, Ronny Pfannschmidt, Tarek Ziadé