event-based rfc822.py

Neale Pickett neale at woozle.org
Fri Aug 16 05:59:16 CEST 2002


At work, we are using Python to write Internet proxies.  We have our own
sockets interface, which is event-driven, unlike the traditional Berkeley
sockets interface.  So where in BSD you would write:

    data = fd.read(8192)
    # Do something with data

In ours, you have to set up a function to handle read events:

    def handle_read(self, data):
        # Do something with data
        pass

I'd like to be able to use all of the super-keen Pythonic modules for
picking apart, say, RFC 822 messages.  But those are all written for a
Berkeley sockets model (pull), not our event model (push).

Generators to the rescue!

This modification to rfc822.py from the Python 2.2.1 distribution (diff
below) makes a few small changes that let it work with an event-based
system.  You can still use it with a traditional file object, too; in
fact, the published API works exactly as before.

What's new is that you can now pass None for fp (or leave it out
entirely) and use a new eatline(line) method to feed the message to the
class one line at a time.  Once the headers are complete, eatline()
raises StopIteration.  So the example at the bottom of rfc822.py could
be written like this:


#! /usr/bin/python2.2

import rfc822

if __name__ == '__main__':
    import sys, os
    file = os.path.join(os.environ['HOME'], 'Mail/inbox/1')
    if sys.argv[1:]: file = sys.argv[1]
    f = open(file, 'r')
    m = rfc822.Message()
    while 1:
        line = f.readline()
        try:
            m.eatline(line)
        except StopIteration:
            break
    print 'From:', m.getaddr('from')
    print 'To:', m.getaddrlist('to')
    print 'Subject:', m.getheader('subject')
    print 'Date:', m.getheader('date')
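
With the event model, though, handle_read() is called with whatever data happened to arrive, which need not be exactly one line, while eatline() wants one line per call.  Below is a minimal sketch (modern Python 3) of a buffering adapter between the two.  LineFeeder and handle_close() are hypothetical names, not part of any real sockets interface, and HeaderCollector is only a stand-in for a patched rfc822.Message so the example runs on its own.

```python
class LineFeeder:
    """Turn arbitrary chunks from handle_read() into whole lines.

    Hypothetical adapter: `eatline` is any callable that accepts one line
    and raises StopIteration once it wants no more input, such as the
    eatline() method added by the patch.
    """

    def __init__(self, eatline):
        self.eatline = eatline
        self.buffer = ''     # data received but not yet passed on
        self.done = False    # True once eatline() has raised StopIteration

    def _feed(self, line):
        try:
            self.eatline(line)
        except StopIteration:
            self.done = True

    def handle_read(self, data):
        """Event callback: called with whatever data has just arrived."""
        self.buffer += data
        while not self.done:
            end = self.buffer.find('\n')
            if end < 0:
                break                        # partial line; wait for more data
            line, self.buffer = self.buffer[:end + 1], self.buffer[end + 1:]
            self._feed(line)
        # Once done, self.buffer simply accumulates the message body.

    def handle_close(self):
        """Event callback: the peer closed the connection."""
        if not self.done and self.buffer:
            line, self.buffer = self.buffer, ''
            self._feed(line)                 # last line had no newline
        if not self.done:
            self._feed('')                   # '' means EOF, as with readline()


class HeaderCollector:
    """Stand-in consumer for testing: records lines until a blank one."""

    def __init__(self):
        self.lines = []

    def eatline(self, line):
        self.lines.append(line)
        if not line.strip():                 # '\n', '\r\n', or '' (EOF)
            raise StopIteration
```

With the patched module, one would build LineFeeder(m.eatline) for m = rfc822.Message() and hook its handle_read() and handle_close() up to whatever callbacks the event loop provides.  Sending '' on close mirrors what readline() returns at end of file, which the patched generator treats as 'EOF in headers'.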


So my question to the Python community is twofold:

1. Is this just a horrific abuse of generators?  Is there a better way
   to do it without significantly altering the code?

2. What thoughts do people have about making the same sort of
   modification to the classes in the email package?  That looks a bit
   more challenging, but not overly so.  Is it worth doing, or is there a
   better solution?

3. Assuming this isn't too horrifying and is worth doing, is there any
   chance of this sneaking into CVS for the next release of Python (in
   which yield will be a keyword by default)?

Oops, that was three.
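
For comparison, later Python releases (2.5 and up, via PEP 342) let a caller pass a value straight into a paused generator with send(), which removes the need to stash each line in self._line.  A minimal sketch in modern Python 3; header_consumer() and make_eatline() are hypothetical names, and the parsing is deliberately simplified.

```python
def header_consumer(headers):
    """Generator that receives header lines via send().

    Hypothetical sketch: fills `headers` (a dict) and finishes at the first
    blank line or at '' (EOF), like the patched readheaderlines().
    """
    while True:
        line = yield                    # paused here until send(line)
        if not line or not line.strip():
            return                      # finishing makes send() raise StopIteration
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()


def make_eatline(headers):
    """Return an eatline(line) callable backed by a primed generator."""
    gen = header_consumer(headers)
    next(gen)                           # advance to the first `yield`
    return gen.send
```

Priming with next() is needed because sending a non-None value to a generator that has not started yet raises TypeError.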

Here's the diff:

---8<---
--- /usr/lib/python2.2/rfc822.py	Sat Apr 20 23:41:43 2002
+++ rfc822.py	Thu Aug 15 20:56:12 2002
@@ -70,7 +70,9 @@
 There are also some utility functions here.
 """
 # Cleanup and extensions by Eric S. Raymond <esr at thyrsus.com>
+# Generatorification by Neale Pickett <neale at woozle.org>
 
+from __future__ import generators
 import time
 
 __all__ = ["Message","AddressList","parsedate","parsedate_tz","mktime_tz"]
@@ -81,7 +83,7 @@
 class Message:
     """Represents a single RFC 2822-compliant message."""
 
-    def __init__(self, fp, seekable = 1):
+    def __init__(self, fp = None, seekable = 1):
         """Initialize the class instance and read the headers."""
         if seekable == 1:
             # Exercise tell() to make sure it works
@@ -103,7 +105,9 @@
             except IOError:
                 self.seekable = 0
         #
-        self.readheaders()
+        self.linechewer = self.readheaderlines()
+        if fp:
+            self.readheaders()
         #
         if self.seekable:
             try:
@@ -132,6 +136,16 @@
         printing them will reproduce the header exactly as it appears in the
         file).
         """
+        while 1:
+            line = self.fp.readline()
+            if not line:
+                break
+            try:
+                self.eatline(line)
+            except StopIteration:
+                break
+
+    def readheaderlines(self):
         self.dict = {}
         self.unixfrom = ''
         self.headers = list = []
@@ -144,19 +158,21 @@
         elif self.seekable:
             tell = self.fp.tell
         while 1:
+            if not firstline: yield None
             if tell:
                 try:
                     startofline = tell()
                 except IOError:
                     startofline = tell = None
                     self.seekable = 0
-            line = self.fp.readline()
+            line = self._line
             if not line:
                 self.status = 'EOF in headers'
                 break
             # Skip unix From name time lines
             if firstline and line.startswith('From '):
                 self.unixfrom = self.unixfrom + line
+                yield None
                 continue
             firstline = 0
             if headerseen and line[0] in ' \t':
@@ -191,6 +207,10 @@
                 else:
                     self.status = self.status + '; bad seek'
                 break
+
+    def eatline(self, line):
+        self._line = line
+        self.linechewer.next()
 
     def isheader(self, line):
         """Determine whether a given line is a legal header.
---8<---
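
Stripped of rfc822's details, the trick in the diff is this: a generator does the parsing and pauses with yield between lines, eatline() stores the next line in an attribute and resumes the generator with next(), and the old pull-style entry point just loops over readline() and calls eatline().  Below is a minimal sketch of that pattern in modern Python 3 under simplifying assumptions.  HeaderParser, read_from() and _parse() are hypothetical names, not part of rfc822.

```python
import io


class HeaderParser:
    """Parse 'Name: value' header lines, fed either pushed or pulled.

    Hypothetical, much-simplified stand-in for rfc822.Message: it ignores
    continuation lines and Unix 'From ' lines, and keeps only the last
    value for a repeated header.
    """

    def __init__(self, fp=None):
        self.headers = {}
        self._line = None               # the line eatline() is handing over
        self._chewer = self._parse()    # generator; not started yet
        if fp is not None:
            self.read_from(fp)          # pull mode

    def read_from(self, fp):
        """Pull mode: read lines from fp and push them through eatline()."""
        while True:
            line = fp.readline()        # '' at EOF is passed on, not dropped
            try:
                self.eatline(line)
            except StopIteration:
                break

    def eatline(self, line):
        """Push mode: feed one line; raises StopIteration when headers end."""
        self._line = line
        next(self._chewer)

    def _parse(self):
        while True:
            line = self._line
            if not line or not line.strip():
                return                  # EOF or blank line ends the headers
            name, _, value = line.partition(':')
            self.headers[name.strip().lower()] = value.strip()
            yield                       # pause until eatline() supplies more


if __name__ == '__main__':
    msg = io.StringIO('Subject: hello\nTo: b@example.com\n\nbody\n')
    print(HeaderParser(msg).headers)    # pull mode
```

The one deliberate difference from the diff is that read_from() passes the final '' through to eatline() instead of breaking first.  In the patched readheaders(), an empty string from readline() ends the loop before the generator sees it, so 'EOF in headers' is never recorded in pull mode, and for an empty file the generator never starts at all, leaving self.dict unset.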

Thanks

Neale




More information about the Python-list mailing list