Re: [Distutils] pkginfo python 3 port
On 2010-05-27, at 10:06 AM, Tres Seaver wrote:
Sridhar Ratnakumar wrote:
For a start, how about this patch? http://gist.github.com/415137
`email.parser` is available as far back as Python 2.5; I'm not sure about <= 2.4, though.
Your patch does break with Python 2.4, which I would like to continue supporting. Maybe we can add some more conditional imports and glue functions? Something like the attached patch.
Sounds good.
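The attached patch is not reproduced in this archive. As a rough sketch only, conditional imports with glue functions of the kind proposed above might look like the following. The helper names `parse_headers` and `get_header` are hypothetical and are not taken from pkginfo:

```python
# Hypothetical glue code; NOT the patch attached to the original message.
try:
    # Python 2.5+ and Python 3: the lowercase module name exists.
    from email.parser import Parser

    def parse_headers(fp):
        """Parse only the header block of the text file object fp."""
        return Parser().parse(fp, headersonly=True)

    def get_header(msg, name):
        return msg.get(name)

except ImportError:
    # Python 2.4 and earlier: fall back to the old rfc822 module.
    import rfc822

    def parse_headers(fp):
        return rfc822.Message(fp)

    def get_header(msg, name):
        return msg.getheader(name)


# Usage on Python 3 (io.StringIO is not available on Python 2.4).
from io import StringIO

msg = parse_headers(StringIO("Name: example\nVersion: 1.0\n"))
name = get_header(msg, "Name")
print(name)
```

Callers then only use the two glue functions and never touch either parser module directly, which keeps the version check in one place.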
BTW, your patch also breaks a unit test, due to differences in behavior between the rfc822 parser and the email one:
======================================================================
FAIL: test_parse_Description (pkginfo.tests.test_distribution.DistributionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tseaver/projects/parcel/src/pkginfo/pkginfo/tests/test_distribution.py", line 104, in test_parse_Description
    'This package enables integration with\n'
AssertionError: 'This package enables integration with\n foo servers.' != 'This package enables integration with\n foo servers.'
Hmm. RFC 822 (rfc822) and RFC 2822 have different "unfolding" rules for multi-line headers. Curiously, http://docs.python.org/library/rfc822.html is advertised as "Parse RFC 2822 mail headers," and yet it seems to unfold multiple lines in accordance with RFC 822.
I don't know what the solution to this problem is. I see that lines in the `Description` field (in PKG-INFO) are preceded by 8 spaces, so PKG-INFO is generated in accordance with RFC 822. Is there a way to parse an RFC 822 message in Python 3?
-srid
Barry Warsaw wrote:
On May 27, 2010, at 10:25 AM, Sridhar Ratnakumar wrote:
Is there a way to parse a RFC 822 message in Python 3?
If it's ASCII, you should have no problems using email.parser.Parser.
The issue is that its behavior is subtly different from the now-removed rfc822 parser::
  $ /opt/Python-2.6.5/bin/python
  Python 2.6.5 (r265:79063, Apr 6 2010, 14:45:18)
  [GCC 4.3.3] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> from StringIO import StringIO
  >>> with_multiline = StringIO("""\
  ... Description: this is a multiline RFC 822
  ... header.""")
  >>> from rfc822 import Message
  >>> rfc_msg = Message(with_multiline)
  >>> with_multiline.seek(0)
  >>> from email.parser import Parser
  >>> email_msg = Parser().parse(with_multiline)
  >>> rfc_msg.getheader('Description')
  'this is a multiline RFC 822\n header.'
  >>> email_msg.get('Description')
  'this is a multiline RFC 822\n header.'
Tres.
On May 27, 2010, at 08:59 PM, Tres Seaver wrote:
The issue is that its behavior is subtly different from the now-removed rfc822 parser.
If I'm reading this correctly, the "problem" is that rfc822 collapses continuation whitespace and email.parser preserves it? Isn't the email package (more) correct, and what specific problem does that cause? Or is it just that it's different so tools have to catch up to that?
-Barry
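For readers on Python 3, where the `rfc822` module no longer exists, here is a minimal sketch of the `email.parser` half of this comparison. The sample header and its 8-space continuation indent are assumptions, chosen to match what Sridhar reports seeing in PKG-INFO files:

```python
from email.parser import Parser

# Assumed sample: a folded header whose continuation line is indented
# with 8 spaces, as reported for distutils-written PKG-INFO files.
text = "Description: this is a multiline RFC 822\n        header.\n"

msg = Parser().parsestr(text)
value = msg.get("Description")

# email.parser keeps the newline and every leading space of the
# continuation line; only the trailing line ending is dropped.
print(repr(value))
```

The value still contains the newline and all 8 spaces, whereas the rfc822 result in the session above has a newline followed by a single space.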
Barry Warsaw wrote:
If I'm reading this correctly, the "problem" is that rfc822 collapses continuation whitespace and email.parser preserves it? Isn't the email package (more) correct,
"More correct" is debatable: the email.parser module does not remove the newline, for instance, which is what RFC 2822 suggests for "unfolding" header lines:
http://www.faqs.org/rfcs/rfc2822.html
Collapsing extra leading whitespace in header continuation lines seems like a reasonable strategy: lines created by "folding" per RFC (2)822 won't normally have it, and where it does appear (e.g., in lines created by distutils, or perhaps by hand) it isn't meaningful.
and what specific problem does that cause? Or is it just that it's different so tools have to catch up to that?
In particular, pkginfo wants to run across a wide range of Python versions, with Python 2.4 still actively supported. I therefore need to fall back to the rfc822 module when the newer module is not present. I have chosen for the moment to enforce the collapsing where email.parser is used.
Tres.
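One way to enforce that collapsing on values returned by `email.parser` is sketched below. The function name `collapse_leading_ws` is hypothetical and this is not necessarily how pkginfo does it; it simply reproduces the shape of the rfc822 result shown earlier in the thread (a newline followed by one space):

```python
def collapse_leading_ws(value):
    """Strip each line, then rejoin the lines with a newline plus one space."""
    lines = [line.strip() for line in value.splitlines()]
    return "\n ".join(lines)


# Assumed input: a value as email.parser would return it for a header
# whose continuation line was indented with 8 spaces.
preserved = "this is a multiline RFC 822\n        header."
collapsed = collapse_leading_ws(preserved)
print(repr(collapsed))
```

Single-line values pass through unchanged apart from surrounding whitespace, so the helper can be applied to every header regardless of which parser produced it.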
On May 28, 2010, at 01:54 PM, Tres Seaver wrote:
"More correct" is debatable: the email.parser module does not remove the newline, for instance, which is what RFC 2822 suggests for "unfolding" header lines:
http://www.faqs.org/rfcs/rfc2822.html
Collapsing extra leading whitespace in header continuation lines seems like a reasonable strategy: lines created by "folding" per RFC (2)822 won't normally have it, and where it does appear (e.g., in lines created by distutils, or perhaps by hand) it isn't meaningful.
Right. Over in email-sig land we're talking about how to access both the raw header (i.e. what you parsed) and the intended semantic header which would be the unfolded value. No need to re-hash that here in distutils-sig.
and what specific problem does that cause? Or is it just that it's different so tools have to catch up to that?
In particular, pkginfo wants to run across a wide range of Python versions, with Python 2.4 still actively supported. I therefore need to fall back to the rfc822 module when the newer module is not present. I have chosen for the moment to enforce the collapsing where email.parser is used.
Probably best to manually unfold the header value regardless of where you got it from. The email package should (and hopefully someday will) make this easier.
-Barry
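A minimal sketch of manual unfolding in the RFC 2822 sense, which removes any line break that is immediately followed by a space or tab and leaves the whitespace itself in place. The name `unfold` is hypothetical, and the sample values (including the 8-space indent) are assumptions based on the examples in this thread:

```python
import re

# A line break (CRLF or bare LF) immediately followed by a space or tab.
# The lookahead leaves the whitespace itself untouched.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(value):
    """Remove folding line breaks from a header value."""
    return _FOLD.sub("", value)


# Works the same way on values from either parser.
from_rfc822 = unfold("this is a multiline RFC 822\n header.")
from_email = unfold("this is a multiline RFC 822\n        header.")
print(repr(from_rfc822))
print(repr(from_email))
```

Unfolding alone does not make the two parsers agree: the 8 spaces preserved by `email.parser` survive it, so whether to also collapse them remains a separate decision.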
participants (3)
- Barry Warsaw
- Sridhar Ratnakumar
- Tres Seaver