[Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

Fri Sep 17 00:11:30 CEST 2010

On Sep 16, 2010, at 4:51 PM, R. David Murray wrote:

> Given a message, there are many times you want to serialize it as text
> (for example, for presentation in a UI).  You could provide alternate
> serialization methods to get text out on demand....but then what if
> someone wants to push that text representation back in to email to
> rebuild a model of the message?

You tell them "too bad, make some bytes out of that text."  Leave it up to the application.  Period, the end, it's not the library's job.  If you pushed the text out to a 'view message source' UI representation, then the vicissitudes of the system clipboard and other encoding and decoding things may corrupt it in inscrutable ways.  You can't fix it.  Don't try.

> So now we have both a bytes parser and a string parser.

Why do so many messages on this subject take this for granted?  It's wrong for the email module just like it's wrong for every other package.

There are plenty of other (better) ways to deal with this problem.  Let the application decide how to fudge the encoding of the characters back into bytes that can be parsed.  "In the face of ambiguity, refuse the temptation to guess" and all that.  The application has more of an idea of what's going on than the library here, so let it make encoding decisions.

Put another way, there's nothing wrong with having a text parser, as long as it just encodes the text according to some known encoding and then parses the bytes :).
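
As a rough sketch of that idea: the helper name parse_text and its utf-8 default are assumptions made up for illustration, not part of the email package. The only library call used is email.message_from_bytes, which exists in current Python 3. The application picks the encoding, and the parser only ever sees bytes.

```python
import email
from email.message import Message


def parse_text(text: str, encoding: str = "utf-8") -> Message:
    """Hypothetical helper: encode with a known encoding, then parse bytes."""
    # The caller (the application) decides the encoding; nothing is guessed.
    raw = text.encode(encoding)
    # From here on there is exactly one parser, and it takes bytes.
    return email.message_from_bytes(raw)


msg = parse_text("Subject: hello\n\nbody text\n")
print(msg["Subject"])
```

If the text cannot be represented in the chosen encoding, text.encode() raises UnicodeEncodeError in the application's own code, which is where the decision about what to do next belongs.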

> So, after much discussion, what we arrived at (so far!) is a model
> that mimics the Python3 split between bytes and strings.  If you
> start with bytes input, you end up with a BytesMessage object.
> If you start with string input to the parser, you end up with a
> StringMessage.

That may be a handy way to deal with some grotty internal implementation details, but having a 'decode()' method is broken.  The thing I care about, as a consumer of this API, is that there is a clearly defined "Message" interface, which gives me a uniform-looking place where I can ask for either characters (if I'm displaying them to the user) or bytes (if I'm putting them on the wire).  I don't particularly care where those bytes came from.  I don't care what decoding tricks were necessary to produce the characters.
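
One possible shape for such an interface, as a minimal sketch only: the class name SketchMessage and the methods as_bytes() and as_text() are hypothetical names chosen for this illustration, and a real implementation would work per header and per MIME part rather than on the whole message at once.

```python
class SketchMessage:
    """Hypothetical single-class interface: one object, two views."""

    def __init__(self, raw: bytes, charset: str = "utf-8") -> None:
        # Keep the wire bytes as the single source of truth.
        self._raw = raw
        self._charset = charset

    def as_bytes(self) -> bytes:
        """Bytes for putting the message back on the wire, unchanged."""
        return self._raw

    def as_text(self) -> str:
        """Characters for display; undecodable bytes become U+FFFD."""
        return self._raw.decode(self._charset, errors="replace")


m = SketchMessage("caf\u00e9".encode("utf-8"))
print(m.as_text())
print(m.as_bytes())
```

The consumer asks the same object for whichever representation it needs, and never has to know which kind of input the object was built from.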

Now, it may be worthwhile to have specific normalization / debrokenifying methods which deal with specific types of corrupt data from the wire; encoding-guessing, replacement-character insertion or whatever else are fine things to try.  It may also be helpful to keep around a list of errors in the message, for inspection.  But as we know, there are lots of ways that MIME data can go bad other than encoding, so that's just one variety of error that we might want to keep around.
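
To illustrate the "keep a list of errors around" idea, here is a small sketch. The function name decode_lenient and the wording of the error string are assumptions for this example; it covers only the encoding variety of error and uses replacement-character insertion as its one recovery strategy.

```python
from typing import List, Tuple


def decode_lenient(raw: bytes, charset: str = "utf-8") -> Tuple[str, List[str]]:
    """Hypothetical helper: decode for display and report what went wrong."""
    errors: List[str] = []
    try:
        # Clean data decodes strictly and yields an empty error list.
        return raw.decode(charset), errors
    except UnicodeDecodeError as exc:
        # Record the problem for later inspection instead of raising.
        errors.append(f"undecodable {charset} data at byte {exc.start}")
        # Fall back to inserting U+FFFD for each bad byte sequence.
        return raw.decode(charset, errors="replace"), errors


text, problems = decode_lenient(b"ok \xff")
print(repr(text), problems)
```

The standard library's message objects already carry a defects list for parse problems, so encoding errors could plausibly be recorded alongside the other kinds of breakage rather than in a separate channel.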

(Looking at later messages as I'm about to post this, I think this all sounds pretty similar to Antoine's suggestions, with respect to keeping the implementation within a single class, and not having BytesMessage/StringMessage at the same abstraction level.)