[Twisted-Python] New email FeedParser

At Pycon, I was talking with Glyph and others about the email parser in Python 2.3. Anthony Baxter, Thomas Wouters and I were having a little email-sig sprint, and we all agreed about the major problems with the current email parser.

- It can throw exceptions parsing some messages. These exceptions can be difficult to handle.
- You must slurp the entire message into memory before you can start parsing it.

Over in the email-sig we've been talking and working on a new parser, called the FeedParser, which eliminates both of these problems. This parser also has the advantage of being much more RFC compliant, IMO <2046 wink>.

In fact, we now have a new FeedParser.py in Python 2.4cvs (slated to be email 3.0) which I think does a very good job of parsing all manner of valid and invalid emails. The old email.Parser.Parser interface continues to exist for backward compatibility. The docs have not been updated yet, but the unit tests have.

Note that the FeedParser, if it encounters broken MIME, will add 'defects' to a message object and continue on as best it can. You can check the message's .defects attribute; if it exists it will be a list of instances providing more information about what type(s) of defects were encountered.

To use it, you instantiate an email.FeedParser.FeedParser and repeatedly call its .feed() method, which takes a single argument: string data of arbitrary length. The data need not be a complete line, although the FeedParser will split it into lines (using any of the three common line endings), gulping input a line at a time. Internally, the parsing routines are generators that yield when they need more data (feed() itself just returns). When you've fed it all the data there's ever going to be, you call .close() on the parser; the rest of the data is consumed and you get back the root email object.
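The feed/close cycle described above can be sketched roughly as follows. This is only an illustration, not the code from the sprint: it assumes a current Python where the module is spelled email.feedparser (the 2.4cvs version discussed here is email.FeedParser), and the sample messages, the parse_in_chunks helper and the chunk size are all made up for the example.

```python
from email.feedparser import FeedParser  # email.FeedParser in the 2.4-era package

# A made-up, well-formed message used only for illustration.
GOOD = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Body line one.\n"
    "Body line two.\n"
)

# A made-up broken message: it claims to be multipart but names no boundary.
BROKEN = (
    "From: alice@example.com\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "This is not really a multipart body.\n"
)


def parse_in_chunks(text, chunk_size=7):
    """Hypothetical helper: feed text to a FeedParser in small pieces."""
    parser = FeedParser()
    for start in range(0, len(text), chunk_size):
        # Chunks need not end on a line boundary; the parser buffers
        # partial lines until the rest arrives.
        parser.feed(text[start:start + chunk_size])
    # close() consumes whatever is still buffered and returns the root message.
    return parser.close()


msg = parse_in_chunks(GOOD)
print(msg["Subject"])
print(msg.get_payload())

# Broken MIME does not raise; problems are recorded on the message instead.
broken = parse_in_chunks(BROKEN)
for defect in broken.defects:
    print(type(defect).__name__)
```

In a network server the chunks would come from whatever the transport delivers (for example, each call to a protocol's data-received callback) rather than from slicing a string, which is the point of not having to hold the whole message before parsing starts.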
Because I think we're largely done with the FeedParser[1], and because some of the Twisted guys were interested in this stuff, I'm sending this message so you can grab the new parser and see if it's going to fit the bill. For now, you'll have to get it out of Python's cvs, but at some point when we've addressed the other issues in the email package, we'll make a distutils release. Note that email 3.0 will be compatible with Python 2.3 but nothing earlier.

Please follow up with any discussions to email-sig@python.org.

Enjoy,
-Barry

[1] Although see these messages for open issues:
http://mail.python.org/pipermail/email-sig/2004-May/000114.html
http://mail.python.org/pipermail/email-sig/2004-May/000118.html
Barry Warsaw