[Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]

Thu Sep 16 17:30:12 CEST 2010

On Thu, 16 Sep 2010 09:52:48 -0400, Barry Warsaw <barry at python.org> wrote:
> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
> >There are some APIs that should be able to handle bytes *or* strings,
> >but the current use of string literals in their implementation means
> >that bytes don't work. This turns out to be a PITA for some networking
> >related code which really wants to be working with raw bytes (e.g.
> >URLs coming off the wire).
> 
> Note that email has exactly the same problem.  A general solution -- even if
> embodied in *well documented* best-practices and convention -- would really
> help make the stdlib work consistently, and I bet third party libraries too.

Allowing bytes-in -> bytes-out where possible would definitely be a help
(and Guido has endorsed this, IIUC), but some care has to be taken to
understand the API contract of the method in question before blindly
applying it.  Are you "merely" allowing bytes to be processed as ASCII
strings, or does processing the bytes *correctly* imply that you are
converting from an ASCII encoding of text in order to process it?
In Python 2, the latter might not generate unicode yet still produce
a correct result most of the time, but a big point of Python 3 is to
eliminate that "most of the time", so we need to be careful not to
reintroduce it.  This was all covered in the thread Nick refers to;
I just want to emphasize that one needs to look at the API contract
carefully before making it polymorphic (in Guido's sense of the term).
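
A small illustration of that distinction (not code from the thread; the
sample text and variable names are made up for the example).  Upper-casing
is one operation where treating bytes as ASCII is only right "most of the
time":

```python
# The same text as str and as UTF-8 encoded bytes.
text = "café"
raw = text.encode("utf-8")           # b'caf\xc3\xa9'

# str.upper() is Unicode-aware.
upper_text = text.upper()            # 'CAFÉ'

# bytes.upper() only maps the ASCII letters a-z; every other byte is
# passed through untouched, so the encoded 'é' is left in lower case.
upper_raw = raw.upper()              # b'CAF\xc3\xa9'

# The bytes-in -> bytes-out result does not match the text result.
same_result = upper_raw.decode("utf-8") == upper_text   # False
```

An API whose contract is "upper-case this text" would therefore need to
decode first, while one whose contract is "split on an ASCII delimiter"
can plausibly work on the bytes directly.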

If the way to do this is well-documented best practices, we first
have to figure out what those best practices are.  To do that we have
to write some real-world code.  I'm trying one approach in email6:
Bytes and String subclasses, where the subclasses have an attribute
named 'lit' taken from a utility module that does this:

    # ASCII-only literals, written once as str.
    literals = dict(
        empty = '',
        colon = ':',
        newline = '\n',
        space = ' ',
        tab = '\t',
        fws = ' \t',
        headersep = ': ',
        )

    # Namespaces holding each literal as str and as bytes, respectively.
    class _string_literals:
        pass
    class _bytes_literals:
        pass

    # Give both namespaces the same attribute names.
    for name, value in literals.items():
        setattr(_string_literals, name, value)
        setattr(_bytes_literals, name, bytes(value, 'ASCII'))
    del literals, name, value

And the subclasses do:

    class BytesHeader(BaseHeader):
        lit = email.utils._bytes_literals

    class StringHeader(BaseHeader):
        lit = email.utils._string_literals

And then BaseHeader uses self.lit.colon, etc., when manipulating strings.
It also has to use slice notation rather than indexing when looking at
individual characters (indexing a bytes object in Python 3 yields an int,
not a length-one bytes object), which is a PITA but not terrible.
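
Roughly, the pattern could look like the following self-contained sketch.
Only the `lit` attribute and the literal names come from the code above;
the `source` attribute, the `split()` method, and the local stand-ins for
the utility-module classes are hypothetical, added so the example runs on
its own:

```python
# Local stand-ins for the literal namespaces described above.
_literals = dict(colon=':', fws=' \t')

class _string_literals:
    pass

class _bytes_literals:
    pass

for _name, _value in _literals.items():
    setattr(_string_literals, _name, _value)
    setattr(_bytes_literals, _name, bytes(_value, 'ASCII'))


class BaseHeader:
    """Hypothetical base class; subclasses supply `lit`."""
    lit = None

    def __init__(self, source):
        self.source = source

    def split(self):
        """Return (name, value), dropping leading whitespace from value."""
        name, _, value = self.source.partition(self.lit.colon)
        # Slice instead of indexing: b'ab'[0] is the int 97, while
        # b'ab'[:1] is b'a', so only slicing behaves the same for
        # str and bytes.  The emptiness check stops the loop once the
        # value is exhausted.
        while value[:1] and value[:1] in self.lit.fws:
            value = value[1:]
        return name, value


class BytesHeader(BaseHeader):
    lit = _bytes_literals


class StringHeader(BaseHeader):
    lit = _string_literals


str_result = StringHeader('Subject: hello').split()
bytes_result = BytesHeader(b'Subject: hello').split()
```

Here `str_result` is `('Subject', 'hello')` and `bytes_result` is
`(b'Subject', b'hello')`: the same method body serves both types because
every literal it touches comes from `self.lit`.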

I'm not saying this is the best approach, since this is all experimental
code at the moment, but it is *an* approach....

--
R. David Murray                                      www.bitdance.com