[Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

Wed Sep 22 12:48:21 CEST 2010

On Wed, Sep 22, 2010 at 9:37 AM, Andrew McNamara
<andrewm at object-craft.com.au> wrote:
>>Yeah, that's the original reasoning that had me leaning towards the
>>parallel API approach. If I seem to be changing my mind a lot in this
>>thread it's because I'm genuinely torn between the desire to make it
>>easier to port existing 2.x code to 3.x by making the current API
>>polymorphic and the fear that doing so will reintroduce some of the
>>exact same bytes/text confusion that the bytes/str split is trying to
>>get rid of.
>
> I don't think polymorphic APIs do anyone any favours in the long
> run. My experience of the Py2 email API was that it would give the
> developer false comfort, only to blow up when the app was in the hands
> of users, and it didn't seem to matter how careful I was. Py3 has gone
> the pure/strict route in the core, and I think libs should be consistent
> with that choice.  Developers will have to work a little harder, but there
> will be fewer surprises.

There's an important distinction here though. Either change I could
make to urllib.parse will still result in two distinct APIs. The only
question is whether the new bytes->bytes APIs need to have a different
spelling or not.
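
For illustration, here is roughly what the polymorphic spelling looks like from the caller's side. It uses the behaviour urllib.parse later adopted in Python 3.2, where urlsplit() and friends accept ASCII-compatible bytes and return bytes-based results. Treat it as a sketch of that later stdlib behaviour rather than part of this proposal. The parallel-API alternative would need a separately named bytes function instead, which isn't shown here.

```python
from urllib.parse import urljoin, urlsplit

# Same function name, two input types: the result type follows the input type.
text_parts = urlsplit("http://example.com/path?q=1")
bytes_parts = urlsplit(b"http://example.com/path?q=1")

print(text_parts.netloc)   # example.com
print(bytes_parts.netloc)  # b'example.com'

# Mixing str and bytes arguments in a single call is still rejected outright,
# so the two APIs stay distinct even though they share a spelling.
mixed_error = None
try:
    urljoin("http://example.com/", b"other")
except TypeError as exc:
    mixed_error = exc
    print("TypeError:", exc)
```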

Python 2.x is close to impossible to reliably test in this area
because there's no programmatic way to tell the difference between
encoded bytes and decoded text. In Python 3, while you can still get
yourself in trouble by mixing encodings at the bytes level, you're
almost never going to mistake bytes for text unless you go out of your
way to support working that way.
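
A minimal Python 3 sketch of both halves of that claim follows. Mixing bytes and str fails immediately. Concatenating bytes produced by two different encodings goes through silently, and the mistake only surfaces when something later tries to decode the result. The sample string is arbitrary.

```python
text = "café"
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Bytes and text never combine implicitly in Python 3.
try:
    utf8_bytes + text
except TypeError:
    type_mix_rejected = True
else:
    type_mix_rejected = False

# Two byte strings in different encodings concatenate without complaint...
mixed = utf8_bytes + latin1_bytes

# ...and the error only appears when the combined bytes are decoded.
try:
    mixed.decode("utf-8")
except UnicodeDecodeError:
    decode_failed = True
else:
    decode_failed = False

print(type_mix_rejected, decode_failed)  # True True
```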

The structure of quote/unquote (which already contain implicit
decode/encode steps to allow them to consume both bytes and strings
with relative abandon and have done since 3.0) may cause us problems
in the long run, but polymorphic APIs where the type of the input is
the same as the type of the output shouldn't be any more dangerous
than if those same APIs used a different spelling to operate on bytes.
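
To make the implicit conversions in quote/unquote concrete, here is a short sketch of how they behave in current Python 3 releases. quote() accepts either str or bytes but always returns str, encoding text as UTF-8 first. unquote() always returns str via a UTF-8 decode that uses errors='replace' unless told otherwise. unquote_to_bytes() is the bytes-returning counterpart.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# quote() takes str or bytes, but the result is always str.
from_text = quote("café")           # str is encoded to UTF-8 first
from_bytes = quote(b"caf\xc3\xa9")  # bytes are percent-encoded as-is
print(from_text, from_bytes)        # caf%C3%A9 caf%C3%A9

# unquote() always hands back str, via an implicit UTF-8 decode.
decoded = unquote("caf%C3%A9")
print(decoded)                      # café

# Percent-escapes that aren't valid UTF-8 are silently replaced, not rejected.
replaced = unquote("%E9")
print(replaced == "\ufffd")         # True
print(unquote("%E9", encoding="latin-1"))  # é

# The bytes-returning spelling skips the decode step entirely.
raw = unquote_to_bytes("caf%C3%A9")
print(raw)                          # b'caf\xc3\xa9'
```

Unlike a type-preserving API, quote() turns bytes into str on the way through. That hidden step is where encoding assumptions can slip in unnoticed.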

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia