event-based rfc822.py
Neale Pickett
neale at woozle.org
Thu Aug 15 23:59:16 EDT 2002
At work, we are using Python to write Internet proxies. We have our own
sockets interface which is event-driven, unlike the traditional Berkeley
sockets interface. So where in BSD you would go:
data = fd.read(8192)
# Do something with data
in ours, you have to set up a function to handle read events:
def handle_read(self, data):
    # Do something with data
    pass
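To make the pull/push difference concrete, here is a minimal sketch in modern Python 3. The `Reactor` class and its `feed()` method are hypothetical stand-ins for our in-house event-driven sockets layer, whose real API isn't shown here; only the `handle_read(self, data)` shape comes from the description above.

```python
import io


def pull_style(fd):
    """Berkeley-style (pull): the caller asks for data when it wants it."""
    chunks = []
    while True:
        data = fd.read(8192)
        if not data:            # an empty read means EOF
            break
        chunks.append(data)     # "Do something with data"
    return "".join(chunks)


class Handler:
    """Event-style (push): the framework calls us when data arrives."""

    def __init__(self):
        self.chunks = []

    def handle_read(self, data):
        self.chunks.append(data)  # "Do something with data"


class Reactor:
    """Hypothetical stand-in for an event-driven sockets layer."""

    def __init__(self, handler):
        self.handler = handler

    def feed(self, data):
        # A real reactor would call this when the socket becomes readable.
        self.handler.handle_read(data)


if __name__ == "__main__":
    print(pull_style(io.StringIO("hello, world")))
    h = Handler()
    r = Reactor(h)
    for piece in ("hello, ", "world"):
        r.feed(piece)
    print("".join(h.chunks))
```

The point is who owns the loop: in the pull version our code does, in the push version the framework does, and our code only sees whatever arrives.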
I'd like to be able to use all of the super-keen pythonic modules for
picking apart, say, rfc822 messages. But those are all written for a
Berkeley sockets model (pull), not our event model (push).
Generators to the rescue!
This modification to rfc822.py from the Python 2.2.1 distribution (diff
below) makes a few slight changes to let it work with an event-based
system. But you can still use it with a traditional file descriptor,
too. In fact, the published API still works exactly as written.
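The underlying trick can be shown in isolation. In this sketch (a hypothetical `LineCounter` class, not part of rfc822.py, written for Python 3 so `next(gen)` replaces 2.2's `gen.next()`), the parsing logic is written once as a generator that pauses at `yield` until the next line is pushed in. A pull-style `read_from()` drives that same generator from a file object, so both calling conventions share one parser.

```python
import io


class LineCounter:
    """Counts lines up to the first blank line, in push or pull mode."""

    def __init__(self, fp=None):
        self.count = 0
        self._line = None
        self._chewer = self._chew()  # generator; its body hasn't run yet
        if fp is not None:
            self.read_from(fp)

    def _chew(self):
        # Written like a pull loop, but each "read" is a pause at yield.
        while True:
            line = self._line
            if not line.strip():       # EOF ('') or a blank line ends input
                return                 # next() in eatline() -> StopIteration
            self.count += 1
            yield                      # wait for the next pushed line

    def eatline(self, line):
        """Push one line; raises StopIteration once input is complete."""
        self._line = line
        next(self._chewer)

    def read_from(self, fp):
        """Pull mode: drive the same generator from a file-like object."""
        for line in fp:
            try:
                self.eatline(line)
            except StopIteration:
                break
```

Passing a file object mirrors the patch's `if fp: self.readheaders()`. Passing nothing leaves the caller to push lines with `eatline()`.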
What's new is that you can now pass in None for the fd, and use an
eatline(line) method to push data into the class one line at a time.
When the end of the headers is reached, eatline() raises StopIteration.
So the example at the bottom
of rfc822.py could be written like this:
#! /usr/bin/python2.2
import rfc822

if __name__ == '__main__':
    import sys, os
    file = os.path.join(os.environ['HOME'], 'Mail/inbox/1')
    if sys.argv[1:]: file = sys.argv[1]
    f = open(file, 'r')
    m = rfc822.Message()
    while 1:
        line = f.readline()
        try:
            m.eatline(line)
        except StopIteration:
            break
    print 'From:', m.getaddr('from')
    print 'To:', m.getaddrlist('to')
    print 'Subject:', m.getheader('subject')
    print 'Date:', m.getheader('date')
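In a real event handler the data usually arrives in arbitrary chunks rather than whole lines, so something has to split it up before calling eatline(). Here is a minimal Python 3 sketch, assuming a handler shaped like `handle_read(self, data)` above. `HeaderReader` and its `done` flag are hypothetical names, and `HeaderCollector` is a stub with only an `eatline()` method, standing in for the patched `rfc822.Message` so the block runs on its own.

```python
class HeaderCollector:
    """Stub for the patched rfc822.Message: anything with eatline()."""

    def __init__(self):
        self.lines = []

    def eatline(self, line):
        if line in ("\n", "\r\n", ""):   # blank line (or EOF) ends headers
            raise StopIteration
        self.lines.append(line)


class HeaderReader:
    """Hypothetical event handler that turns chunks into eatline() calls."""

    def __init__(self, message):
        self.message = message
        self.buffer = ""
        self.done = False

    def handle_read(self, data):
        self.buffer += data
        # Feed each complete line; a trailing partial line waits for more data.
        while not self.done and "\n" in self.buffer:
            line, self.buffer = self.buffer.split("\n", 1)
            try:
                self.message.eatline(line + "\n")
            except StopIteration:
                self.done = True   # whatever is left in self.buffer is body
```

Since the patched eatline() also raises StopIteration at the blank line, dropping a real `rfc822.Message()` in place of the stub should work the same way. That is untested here.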
So my question to the Python community is twofold:
1. Is this just a horrific abuse of generators? Is there a better way
to do it without significantly altering the code?
2. What thoughts do people have about this sort of modification to the
email classes? Converting them looks a bit more challenging, but
not overly so. Is it worth doing, or is there a better solution?
3. Assuming this isn't too horrifying and is worth doing, is there any
chance of this sneaking into CVS for the next release of Python (in
which yield will be a keyword by default)?
Oops, that was three.
Here's the diff:
---8<---
--- /usr/lib/python2.2/rfc822.py	Sat Apr 20 23:41:43 2002
+++ rfc822.py	Thu Aug 15 20:56:12 2002
@@ -70,7 +70,9 @@
 There are also some utility functions here.
 """
 # Cleanup and extensions by Eric S. Raymond <esr at thyrsus.com>
+# Generatorification by Neale Pickett <neale at woozle.org>
+from __future__ import generators
 
 import time
 
 __all__ = ["Message","AddressList","parsedate","parsedate_tz","mktime_tz"]
@@ -81,7 +83,7 @@
 class Message:
     """Represents a single RFC 2822-compliant message."""
 
-    def __init__(self, fp, seekable = 1):
+    def __init__(self, fp = None, seekable = 1):
         """Initialize the class instance and read the headers."""
         if seekable == 1:
             # Exercise tell() to make sure it works
@@ -103,7 +105,9 @@
             except IOError:
                 self.seekable = 0
         #
-        self.readheaders()
+        self.linechewer = self.readheaderlines()
+        if fp:
+            self.readheaders()
         #
         if self.seekable:
             try:
@@ -132,6 +136,16 @@
         printing them will reproduce the header exactly as it appears in the
         file).
         """
+        while 1:
+            line = self.fp.readline()
+            if not line:
+                break
+            try:
+                self.eatline(line)
+            except StopIteration:
+                break
+
+    def readheaderlines(self):
         self.dict = {}
         self.unixfrom = ''
         self.headers = list = []
@@ -144,19 +158,21 @@
         elif self.seekable:
             tell = self.fp.tell
         while 1:
+            if not firstline: yield None
             if tell:
                 try:
                     startofline = tell()
                 except IOError:
                     startofline = tell = None
                     self.seekable = 0
-            line = self.fp.readline()
+            line = self._line
             if not line:
                 self.status = 'EOF in headers'
                 break
             # Skip unix From name time lines
             if firstline and line.startswith('From '):
                 self.unixfrom = self.unixfrom + line
+                yield None
                 continue
             firstline = 0
             if headerseen and line[0] in ' \t':
@@ -191,6 +207,10 @@
                 else:
                     self.status = self.status + '; bad seek'
                 break
+
+    def eatline(self, line):
+        self._line = line
+        self.linechewer.next()
 
     def isheader(self, line):
         """Determine whether a given line is a legal header.
---8<---
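For anyone without a 2.2 tree handy, here is a cut-down, self-contained sketch of the same control flow as the patched readheaderlines()/eatline() pair, in Python 3 (`next()` instead of `.next()`, and no `from __future__` needed). `MiniMessage` is a hypothetical name. It keeps only the unix "From " line, continuation lines, and the blank-line terminator; the comment handling, `headers` list and seek-back logic of the real rfc822.py are left out.

```python
class MiniMessage:
    """Simplified push-mode header parser modelled on the patch above."""

    def __init__(self):
        self.dict = {}
        self.unixfrom = ""
        self.status = ""
        self._line = None
        # Creating the generator runs none of its body yet.
        self.linechewer = self.readheaderlines()

    def readheaderlines(self):
        firstline = True
        headerseen = ""
        while True:
            # Pause until eatline() pushes the next line. Skipped on the
            # first pass (eatline() stored the line before calling next())
            # and right after a From line (which already paused below).
            if not firstline:
                yield None
            line = self._line
            if not line:
                self.status = "EOF in headers"
                break
            # Skip unix "From " envelope lines at the very top.
            if firstline and line.startswith("From "):
                self.unixfrom += line
                yield None
                continue
            firstline = False
            if headerseen and line[0] in " \t":
                # Continuation line: fold it into the previous header.
                folded = self.dict[headerseen] + " " + line.strip()
                self.dict[headerseen] = folded.strip()
            elif line in ("\n", "\r\n"):
                break  # blank line ends the headers
            else:
                name, sep, value = line.partition(":")
                if not sep:
                    self.status = "Non-header line where header expected"
                    break
                headerseen = name.strip().lower()
                self.dict[headerseen] = value.strip()

    def eatline(self, line):
        """Push one line; raises StopIteration once the headers are done."""
        self._line = line
        next(self.linechewer)

    def getheader(self, name, default=None):
        return self.dict.get(name.lower(), default)
```

The `break` statements end the generator, so the pending `next()` in eatline() raises StopIteration. That is the same signal the patched class gives its callers.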
Thanks
Neale