[Python-Dev] Proposal: explicitly disallow function/class mismatches in accelerator modules

Sun Jul 10 20:34:36 EDT 2016

Hello all, and thanks Nick for starting the discussion!

Long wall of text ahead, whoops! TL;DR - everyone seems to agree, let's do
it.

I think the main issue that we're hitting is that we (whatever you want "we"
to mean) prefer to make Python code in the standard library as easily
understandable and readable as possible (a point Raymond raised in the
issue, which could be a discussion on its own too, but I won't get into
that). People new to Python will sometimes look into the code to try and
understand how it works, while only contributors and core devs (read: people
who know C) will look at the C code, so keeping the C code simple isn't as
baked into the design as it is for the Python versions.

As such, making closures might be TOOWTDI in Python, but it can quickly
become annoying to reimplement in C - either you hide away some of the
implementation details behind C level variables and pretend like you're a
closure, or you change the Python version to something else. It's much less
consequential to change a closure into a class than it is to change a class
into a function. In this particular case, we lose the descriptorness(?) of
the Python version, but I'd rather think of this as fixing a bug rather than
removing a feature, especially when partialmethod literally lies right below
in the source. (Side-note: I noticed the source says "Purely functional, no
descriptor behaviour" but functions exhibit descriptor behaviour)
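
To make that side-note concrete, here is a minimal sketch of the difference.
It is not the functools source: closure_partial, ClassPartial, collect and
Demo are made-up names for illustration. A closure stored on a class binds
like a method, while a callable instance that defines no __get__ does not:

```python
def closure_partial(func, *args, **keywords):
    """Hypothetical closure-based partial (illustration only)."""
    def newfunc(*fargs, **fkeywords):
        return func(*args, *fargs, **{**keywords, **fkeywords})
    return newfunc


class ClassPartial:
    """Hypothetical class-based partial that defines no __get__."""
    def __init__(self, func, *args, **keywords):
        self.func, self.args, self.keywords = func, args, keywords

    def __call__(self, *fargs, **fkeywords):
        return self.func(*self.args, *fargs, **{**self.keywords, **fkeywords})


def collect(*args):
    """Return the positional arguments so we can see what was passed."""
    return args


class Demo:
    via_closure = closure_partial(collect, "fixed")
    via_class = ClassPartial(collect, "fixed")


demo = Demo()
closure_result = demo.via_closure()  # bound method: demo is passed as an extra argument
class_result = demo.via_class()      # plain attribute lookup: nothing extra is passed
```

So swapping one implementation for the other changes what the callee
receives whenever the object is used as a class attribute.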

I think the change is worth it (well, there'd be a problem if I didn't ;),
but I'm much more concerned about ensuring that:

- Someone who at some point finds a bunch of bugs^Wdiscrepancies between the
Python and C versions of a feature has some concise rules on the changes
they can and cannot make;
- People writing Python implementations of existing C features, or C
implementations of existing Python features, know exactly the liberties they
can and cannot take;
- People implementing new features in both C and Python know offhand their
limits, so that someone further down the line doesn't have to fix it when
they realize e.g. PyPy behaves differently.

> On Saturday, July 09, 2016 8:16 PM, Nick Coghlan wrote:
> 
> That's the proposed policy change and the reason I figured it needed a
> python-dev discussion, as currently it's up to folks adding the Python
> equivalents (or the C accelerators) to decide on a case by case basis
> whether or not to care about compatibility for:
> 
> - string representations
> - pickling (and hence multiprocessing support)
> - subclassing
> - isinstance checks
> - descriptor behaviour

That's quite an exhaustive list for "let the person making the patch decide
what to do with that"; all the more reason to make the rules concise (also
see my reply to Brett below).
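
For the record, a rough sketch of how several items on that list diverge
between a closure and a stateful callable. The names (make_adder, Adder,
pickles, supports_isinstance) are invented for the example and nothing here
is taken from the standard library:

```python
import pickle


def make_adder(n):
    """Closure version: the state lives in a cell, invisible from outside."""
    def adder(x):
        return x + n
    return adder


class Adder:
    """Class version: the state is an ordinary attribute."""
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n

    def __repr__(self):
        return f"Adder({self.n!r})"


def pickles(obj):
    """Report whether pickle.dumps() accepts obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def supports_isinstance(obj, factory):
    """Report whether isinstance(obj, factory) is a legal check that passes."""
    try:
        return isinstance(obj, factory)
    except TypeError:  # factory is a function, not a type
        return False


closure_add, class_add = make_adder(2), Adder(2)

same_result = closure_add(3) == class_add(3)   # calling them is indistinguishable
closure_repr = repr(closure_add)               # generic "<function ...>" text
class_repr = repr(class_add)                   # "Adder(2)"
closure_pickles = pickles(closure_add)         # local functions cannot be pickled
closure_isinstance = supports_isinstance(closure_add, make_adder)
class_isinstance = supports_isinstance(class_add, Adder)
```

Calling the two gives the same answer, which is exactly why nobody notices
the rest until something like multiprocessing or a doctest trips over it.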

> The main way for such discrepancies to arise is for the Python
> implementation to be a function (or closure), while the C
> implementation is a custom stateful callable.

Maybe closures are "too complicated" to be a proper design if something is
to be written in both Python and C ;)

> The problem with the current "those are just technical implementation
> details" approach is that a lot of Pythonistas learn standard library
> API behaviour and capabilities through a mix of experimentation and
> introspection rather than reading the documentation, so if CPython
> uses the accelerated version by default, then most folks aren't going
> to notice the discrepancies until they (or their users) are trying to
> debug a problem like "my library works fine in CPython, but breaks
> when used with multiprocessing on PyPy" or "my doctests fail when
> running under MicroPython".

I noticed that Python's standard library takes a very "duck-typed" approach
to the accelerator modules: "I have this thing which is a function, and I
[expose it as a global/use it privately], but before I do so, let's see if
there's something with the same name in that other module, then use it." In
practice, this doesn't create much of an issue, except this thread exists.
(I'm not proposing to change how accelerator modules are handled, merely
pointing out that making designs identical was never a hard requirement and
depended on the developer(s))
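
Roughly, the pattern looks like the sketch below. The pure-Python
bisect_right here is a simplified stand-in I wrote for the example (it only
takes two arguments), and the variable names at the bottom are just for
illustration:

```python
def bisect_right(a, x):
    """Pure-Python fallback: insertion point for x in sorted list a,
    to the right of any items equal to x."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


try:
    # If the C accelerator exists, its object silently replaces the
    # Python function of the same name; nothing checks that they match.
    from _bisect import bisect_right
except ImportError:
    pass

index = bisect_right([1, 2, 2, 3], 2)         # same answer either way
implementation = type(bisect_right).__name__  # differs between interpreters
```

The only contract the import enforces is "same name"; everything else about
the two objects is left to whoever wrote them.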

> One example of a practical consequence of the change in policy would
> be to say that if you don't want to support subclassing, then don't
> give the *type* a public name - hide it behind a factory function [...]

It's a mixed bag though. How do you disallow subclassing but still allow
isinstance() checks? Now let's try it in Python and without metaclasses, and
the documented vs undocumented (and unguaranteed) API differences become
much more important. But you have to be a consenting adult if you're working
your way around the rules, so there's that I guess.
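
One possible answer, assuming Python 3.6 or later where __init_subclass__
is available, is sketched below. Sealed and try_subclass are hypothetical
names, and this is only one way to do it, not something proposed in this
thread:

```python
class Sealed:
    """Hypothetical public type that rejects subclassing."""
    def __init_subclass__(cls, **kwargs):
        # Runs whenever a class statement lists Sealed as a base.
        raise TypeError(f"{cls.__name__!r}: Sealed cannot be subclassed")


def try_subclass():
    """Attempt to subclass Sealed and report whether it was allowed."""
    try:
        class Child(Sealed):
            pass
    except TypeError:
        return False
    return True


subclass_allowed = try_subclass()                # the hook refuses the subclass
isinstance_works = isinstance(Sealed(), Sealed)  # the type is still public
```

Nothing stops someone from patching the hook away, which is the consenting
adults part again.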

> On Saturday, July 09, 2016 9:16 PM, Brett Cannon wrote:
> I think flat-out prohibiting won't work in the Python -> C case as you can
> do things such as closures and such that I don't know if we provide the APIs
> to mimic through the C API. I'm fine saying we "strongly encourage mirroring
> the design between the pure Python and accelerated version for various
> reasons".

I think these reasons should probably be explained, if we're to settle for a
"don't enforce it, just strongly encourage it" wording. I'd rather go for a
more aggressive "exactly mirror the design if possible, and if not, suggest
changing the design" approach, though. To me, "we strongly encourage it"
seems like it could be too lax: years later another similar patch surfaces
(which properly fixes it), and it's this discussion all over again, because
"strongly encourage" is up to the reviewers' discretion, and that's pretty
much the status quo in my eyes.

-Emanuel