[Email-SIG] API for email threading library?

Bill Janssen janssen at parc.com
Thu Jan 5 18:55:41 CET 2012


Folks, I'm working on an implementation of RFC 5256 email threading,
designed so that it could fit as a submodule in the "email" package, if
such a thing were ever seen to be useful.

I'd like to ask "the wisdom of the crowd" what they think an appropriate
interface to such a thing would be?  The basic operation is that you
create a collection (type C) of email threads (type T) by passing a set
of messages (type M) to the constructor.

* Should M be required to be "email.message.Message", or perhaps some
  less restrictive type, say "ThreadableMessageAPI"?  All that's
  strictly required is the ability to retrieve the Message-ID, Subject,
  Date, References, and In-Reply-To fields.

* What operations should be possible on C?  Some that come to mind:

  * retrieve_thread (M or message-id) => T
  * add_message (M) => T
  * add_messages (set of M) => None
  * remove_message (M or message-id) => T (or None) ?

* What's the interface for T?  It's a tree with possible dummy nodes, so
  a tuple of messages plus nested tuples would do it.  What should the
  nodes in the tree be?  Normalized (see RFC 5256) Message-IDs?
  email.message.Message instances?

* For large sets of threads (millions of messages) a persistence
  mechanism would be useful.  Should there be a standard interface to
  such a mechanism, perhaps as class methods on C?  If so, what should
  it look like?  Should the implementation contain a default persistent
  subclass of C, based on sqlite3?  What side-effects would persistence
  requirements have on the other design considerations?  For instance,
  would you have to save the entire text of a message for each node?
  Just the headers?  Just some of the headers?  Just the Message-ID?
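
To make the questions above concrete, here is one rough sketch of how M,
T, and C could fit together. Everything in it is an assumption offered
for discussion, not an existing implementation: the names
ThreadableMessage, SimpleMessage, ThreadNode, and ThreadCollection are
hypothetical, the nodes hold Message-IDs with an optional message
attached (None marks a dummy node), and linking uses only References and
In-Reply-To. The Message-ID normalization, subject grouping, and date
ordering described in RFC 5256 are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Protocol, Union


class ThreadableMessage(Protocol):
    """Hypothetical minimal interface for M: only what linking needs."""
    message_id: str
    references: List[str]        # Message-IDs, oldest ancestor first
    in_reply_to: Optional[str]


@dataclass
class SimpleMessage:
    """Stand-in message type that satisfies ThreadableMessage."""
    message_id: str
    references: List[str] = field(default_factory=list)
    in_reply_to: Optional[str] = None


class ThreadNode:
    """One node of a thread tree (T); message is None for a dummy node."""

    def __init__(self, message_id: str) -> None:
        self.message_id = message_id
        self.message: Optional[ThreadableMessage] = None
        self.parent: Optional["ThreadNode"] = None
        self.children: List["ThreadNode"] = []

    def root(self) -> "ThreadNode":
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def as_tuple(self) -> tuple:
        """Nested-tuple view: (message_id, (child tuples...))."""
        return (self.message_id, tuple(c.as_tuple() for c in self.children))


class ThreadCollection:
    """Sketch of C: an in-memory index from Message-ID to tree node."""

    def __init__(self, messages: Iterable[ThreadableMessage] = ()) -> None:
        self._nodes: Dict[str, ThreadNode] = {}
        self.add_messages(messages)

    def _node(self, message_id: str) -> ThreadNode:
        if message_id not in self._nodes:
            self._nodes[message_id] = ThreadNode(message_id)
        return self._nodes[message_id]

    def add_message(self, msg: ThreadableMessage) -> ThreadNode:
        node = self._node(msg.message_id)
        node.message = msg
        # Ancestor chain: References if present, else In-Reply-To.
        chain = list(msg.references)
        if not chain and msg.in_reply_to:
            chain = [msg.in_reply_to]
        chain.append(msg.message_id)
        for parent_id, child_id in zip(chain, chain[1:]):
            parent, child = self._node(parent_id), self._node(child_id)
            # Keep existing links and refuse any link that would form a loop.
            if child.parent is None and parent.root() is not child:
                child.parent = parent
                parent.children.append(child)
        return node.root()

    def add_messages(self, messages: Iterable[ThreadableMessage]) -> None:
        for msg in messages:
            self.add_message(msg)

    def retrieve_thread(self, key: Union[str, ThreadableMessage]) -> ThreadNode:
        """Accept either a message or a Message-ID; raise KeyError if unknown."""
        message_id = key if isinstance(key, str) else key.message_id
        return self._nodes[message_id].root()


# Usage: <b@x> is referenced but never supplied, so it becomes a dummy node.
a = SimpleMessage("<a@x>")
c = SimpleMessage("<c@x>", references=["<a@x>", "<b@x>"])
threads = ThreadCollection([a, c])
thread = threads.retrieve_thread("<c@x>")
```

In this sketch a persistent variant would only need to store, per node,
the Message-ID, the parent's Message-ID, and whether a real message has
been seen, which is one possible answer to the "just the Message-ID?"
question; whether that is enough in practice is still open.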

Have at it!  Advise away!

Bill

