[Python-Dev] PEP 399: Pure Python/C Accelerator Module Compatibility Requirements

Wed Apr 6 19:39:09 CEST 2011

On Tue, Apr 5, 2011 at 12:57, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:

> [Brett]
> > This PEP requires that in these instances that both
> > the Python and C code must be semantically identical
>
> Are you talking about the guaranteed semantics
> promised by the docs or are you talking about
> every possible implementation detail?
>
> ISTM that even with pure python code, we get problems
> with people relying on implementation specific details.
>
> * Two functions accept a sequence, but one accesses
>  it using __len__ and __getitem__ while the other
>  uses __iter__.   (This is like the Spam example
>  in the PEP).
>

That's a consistency problem in all of our C code and not unique to Python/C
modules.
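
A small illustration of the sequence-access difference. The two functions and the `Countdown` class are hypothetical, made up for this sketch: one function walks its argument with `len()` and indexing, the other with plain iteration, so an object that only defines `__iter__` works with one and not the other.

```python
def total_by_index(seq):
    """Hypothetical helper: sums `seq` via __len__ and __getitem__."""
    total = 0
    for i in range(len(seq)):
        total += seq[i]
    return total


def total_by_iter(seq):
    """Hypothetical helper: sums `seq` via __iter__ only."""
    total = 0
    for item in seq:
        total += item
    return total


class Countdown:
    """Iterable that defines __iter__ but neither __len__ nor __getitem__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print(total_by_iter(Countdown(3)))  # 6
try:
    total_by_index(Countdown(3))
except TypeError as exc:
    # len() is not defined for Countdown, so the index-based version fails.
    print("TypeError:", exc)
```

Either access style is fine on its own; the trouble starts when the C and pure Python versions of the same function pick different ones.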

>
> * Given pure python library code like:
>       if x < y: ...
>  I've seen people only implement __lt__
>  but not __gt__, making it impossible to
>  make even minor adjustments to the code such as:
>       if y > x:  ...
>

How is that an issue here? Because someone was lazy in the C code but not in
the Python code? That is an issue, since it is a difference in which methods
are provided.
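
A minimal sketch of the missing-method problem, using a hypothetical `Version` class that defines only `__lt__`. The example uses `<=` rather than `>`: the reflected form of `<=` is `__ge__`, not `__lt__`, so the gap shows up as a `TypeError` as soon as client code switches operators.

```python
class Version:
    """Hypothetical value type that implements only the < operator."""

    def __init__(self, number):
        self.number = number

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number < other.number


old, new = Version(1), Version(2)
print(old < new)  # True

try:
    old <= new  # neither __le__ nor the reflected __ge__ is defined
except TypeError as exc:
    print("TypeError:", exc)
```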

>
> * We also suffer from inconsistency in choice of
>  exceptions (e.g. overly large sequence indices
>  raising either an IndexError, OverflowError, or
>  ValueError).
>

Once again, a general issue in our C code and not special to this PEP.
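
For a concrete picture of the exception inconsistency, here are three "value is too large" cases that, in CPython, each raise a different exception type. The `exception_name` helper exists only for this sketch; the exception types in the comments are what current CPython raises and may differ on other VMs.

```python
def exception_name(func):
    """Call func() and return the name of the exception it raises (or None)."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


huge = 10 ** 100

# A list index too large to fit in a C Py_ssize_t.
print(exception_name(lambda: [1, 2, 3][huge]))  # IndexError
# The same integer used as a sequence repeat count.
print(exception_name(lambda: [0] * huge))       # OverflowError
# A code point outside the Unicode range.
print(exception_name(lambda: chr(0x110000)))    # ValueError
```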

>
> With C code, I wonder if certain implementation
> differences go with the territory:
>
> * Concurrency issues are a common semantic difference.
>  For example, deque.pop() is atomic because the C
>  code holds the GIL but a pure python equivalent
>  would have to use locks to achieve the same effect
>  (and even then might introduce liveness or deadlock
>  issues).
>

That's just a CPython-specific issue that will always be tough to work
around. Obviously we can do the best we can, but since the other VMs don't
necessarily have the same concurrency guarantees per Python expression, it is
nearly impossible to define.
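
A sketch of what the pure Python side of the deque.pop() example might look like. `LockedStack` is a hypothetical class, not a stdlib type: because its pop() does two steps (check for emptiness, then remove), it holds a `threading.Lock` across both so no other thread can interleave between them. The C deque gets the equivalent guarantee from running under the GIL.

```python
import threading


class LockedStack:
    """Hypothetical pure-Python stack with an atomic pop()."""

    def __init__(self, items=()):
        self._items = list(items)
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:
            self._items.append(item)

    def pop(self):
        # The emptiness check and the removal must happen under a single
        # lock acquisition; otherwise two threads could both pass the check
        # and race to remove the last item.
        with self._lock:
            if not self._items:
                raise IndexError("pop from an empty LockedStack")
            return self._items.pop()


def drain_concurrently(stack, n_threads=4):
    """Pop every item from `stack` using several threads; return all items popped."""
    per_thread = [[] for _ in range(n_threads)]

    def worker(out):
        while True:
            try:
                out.append(stack.pop())
            except IndexError:
                return

    threads = [threading.Thread(target=worker, args=(out,)) for out in per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [item for out in per_thread for item in out]
```

Each item is popped exactly once, so `sorted(drain_concurrently(LockedStack(range(1000))))` equals `list(range(1000))`.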

>
> * Heapq is one of the rare examples of purely
>  algorithmic code.  Much of the code in CPython
>  accesses libraries (e.g. the math module),
>  interfaces with the OS, accesses binary data
>  structures, links to third-party tools (sqlite3
>  and Tkinter) or does something else that doesn't
>  have pure python equivalents (at least without
>  using C types).
>

Those C modules are outside the scope of the PEP.

>
> * The C API for parsing argument tuples and keywords
>  does not readily parallel the way argument handling is
>  written in Python.  And with iterators, the argument
>  checking in the C versions tends to happen when the
>  iterator is instantiated, but code written with
>  pure python generators doesn't have its setup and
>  checking section run until next() is called the
>  first time.
>
> * We've had a very difficult time bridging the gulf
>  between python's infinite precision numbers
>  and C's fixed width numbers (for example, it took
>  years to get range() to handle values greater than
>  a word size).
>

I don't expect that to be an issue as this is a limitation in CPython that
the other VMs never run into. If anything, it puts the other VMs at an
advantage, since we are the ones relying on C code.
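
Both of the quoted differences can be seen from Python. The sketch below uses the real `itertools.islice`, which validates its arguments as soon as it is called, next to a hypothetical generator-based `islice_py` that only checks on the first `next()`, plus a hypothetical `islice_py_eager` showing the usual fix (validate in a plain function, then return a generator). The last lines show the word-size boundary still visible in CPython: indexing a huge `range` works with arbitrary precision, but `len()` of it raises `OverflowError` because a length must fit in a C `Py_ssize_t`.

```python
import itertools


def islice_py(iterable, stop):
    """Hypothetical generator version of islice(iterable, stop).

    Being a generator, its argument check does not run until next() is called.
    """
    if stop < 0:
        raise ValueError("stop must be non-negative")
    for index, item in enumerate(iterable):
        if index >= stop:
            return
        yield item


def islice_py_eager(iterable, stop):
    """Hypothetical fix: validate eagerly, then hand off to the generator."""
    if stop < 0:
        raise ValueError("stop must be non-negative")
    return islice_py(iterable, stop)


def fails_on_call(factory):
    """True if calling factory() itself raises ValueError."""
    try:
        factory()
    except ValueError:
        return True
    return False


print(fails_on_call(lambda: itertools.islice("abc", -1)))  # True
print(fails_on_call(lambda: islice_py("abc", -1)))          # False
print(fails_on_call(lambda: islice_py_eager("abc", -1)))    # True

# Word-size boundary: arbitrary-precision indexing works, but len() does not.
big = range(2 ** 64)
print(big[-1] == 2 ** 64 - 1)  # True
try:
    len(big)
except OverflowError as exc:
    print("OverflowError:", exc)
```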

>
> * C code tends to be written in a way that takes
>  advantage of that language's features instead of
>  in a form that is a direct translation of pure
>  python.  For example, I think the work being done
>  on a C implementation of decimal has vastly different
>  internal structures and it would be a huge challenge
>  to make it semantically identical to the pure python
>  version with respect to its implementation details.
>  Likewise, a worthwhile C implementation of OrderedDict
>  can only achieve massive space savings by having
>  majorly different implementation details.
>
> Instead of expressing the wishful thought that C
> versions and pure Python versions are semantically
> identical with respect to implementation details,
> I would like to see more thought put into specific
> limitations on C coding techniques and general
> agreement on which implementation specific details
> should be guaranteed:
>
> * I would like to see a restriction on the use of
>  the concrete C API such that it is *only* used
>  when an exact type match has been found or created
>  (e.g. if someone writes PyList_New(), then it
>  is okay to use PyList_SetItem()).  See
>  http://bugs.python.org/issue10977 for a discussion
>  of what can go wrong.  The original json C code
>  was an example of code that used the concrete
>  C API in a way that precluded pure python
>  subclasses of list and dict.
>

That's a general coding policy that is not special to this PEP.
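
A Python-level analogy for the concrete C API problem, assuming a hypothetical `LowerDict` subclass that lower-cases keys on lookup. Calling `dict.__getitem__` directly stands in for the concrete API (like `PyDict_GetItem()`, it goes straight to the dict's own storage), while `mapping[key]` stands in for the abstract `PyObject_GetItem()`. The `type(...) is dict` guard mirrors the exact-type check (`PyDict_CheckExact()`) proposed above as the precondition for taking the fast path.

```python
class LowerDict(dict):
    """Hypothetical dict subclass: lookups are case-insensitive."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())


def lookup_abstract(mapping, key):
    # Like PyObject_GetItem(): dispatches through type(mapping),
    # so a subclass's __getitem__ override is honoured.
    return mapping[key]


def lookup_concrete(mapping, key):
    # Like the concrete dict API: reads the dict storage directly and
    # silently bypasses any subclass override.
    return dict.__getitem__(mapping, key)


def lookup_guarded(mapping, key):
    # Take the fast path only on an exact type match; anything else
    # (subclasses included) goes through the generic protocol.
    if type(mapping) is dict:
        return dict.__getitem__(mapping, key)
    return mapping[key]


d = LowerDict(spam=1)
print(lookup_abstract(d, "SPAM"))  # 1
print(lookup_guarded(d, "SPAM"))   # 1
try:
    lookup_concrete(d, "SPAM")
except KeyError:
    print("concrete lookup bypassed LowerDict.__getitem__")
```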

>
> * I would like to see better consistency on when to
>  use OverflowError vs ValueError vs IndexError.
>
>
Once again, not specific to this PEP.

> * There should also be a discussion of whether the
>  possible exceptions should be a guaranteed part
>  of the API as it is in Java.  Because there were
>  no guarantees (e.g. ord(x) can raise this, that,
>  and the other), people tend to run an experiment
>  and then rely on whatever CPython happens to do.
>
>
Still not part of this PEP and I am going to stop saying this. =)

> * There should be a discussion on when it is okay
>  for a C implementation to handle only a value
>  range that fits in a word.
>
> * When there is C code, when is it okay for a user
>  to assume atomic access?  Even with pure python
>  code, we're not always consistent about it
>  (e.g. the OrderedDict implementation is not threadsafe
>  but the LRU_Cache is).
>
> * There should be some agreement that people
>  implementing rich comparisons will implement
>  all six operations so that client code doesn't
>  become dependent on (x<y versus y>x).  For
>  example, we had to add special-case logic to
>  heapq years ago because Twisted implemented
>  a task object that defined __le__ instead of
>  __lt__, so it was usable only with an older
>  version of heapq but not with min, sort, etc.
>
> A good PEP should address these issues head-on.
> Just saying that C and python code have to
> be semantically identical in all implementation
> details doesn't really address the issue.
>
>
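
The heapq/Twisted incident can be reproduced with a hypothetical task class that defines only `__le__` (plus `__eq__`). `sorted()`, `min()` and `heapq` all compare with `<`, so they raise `TypeError`. Decorating the class with `functools.total_ordering`, which derives the remaining ordering methods from one of them plus `__eq__`, is one standard way to get all six operations without writing them by hand.

```python
import functools
import heapq


class LeOnlyTask:
    """Hypothetical task that, like the case above, defines only __le__."""

    def __init__(self, priority):
        self.priority = priority

    def __eq__(self, other):
        if not isinstance(other, LeOnlyTask):
            return NotImplemented
        return self.priority == other.priority

    def __le__(self, other):
        if not isinstance(other, LeOnlyTask):
            return NotImplemented
        return self.priority <= other.priority


@functools.total_ordering
class FullTask(LeOnlyTask):
    """Same class; total_ordering adds __lt__, __gt__ and __ge__ from __le__ and __eq__."""


def supports_sorting(cls):
    """True if a list of `cls` instances can be sorted (which uses <)."""
    try:
        sorted([cls(2), cls(1)])
    except TypeError:
        return False
    return True


print(supports_sorting(LeOnlyTask))  # False
print(supports_sorting(FullTask))    # True

heap = []
for priority in (3, 1, 2):
    heapq.heappush(heap, FullTask(priority))
print([heapq.heappop(heap).priority for _ in range(3)])  # [1, 2, 3]
```
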
> [Brett]
> > (sorry, Raymond, for picking on heapq, but it
> > was what bit the PyPy people most recently =).
>
> No worries, it wasn't even my code.  Someone
> donated it.  There was a discussion on python-dev
> and collective agreement to allow it to have
> semantic differences that would let it run faster.
> IIRC, the final call was made by Uncle Timmy.
>
> That being said, I would like to see a broader set
> of examples rather than extrapolating from
> a single piece of 7+ year-old code.  It is purely
> algorithmic, so it really just represents the
> simplest case.  It would be much more interesting
> to discuss what should be done with
> future C implementations for threading, decimal,
> OrderedDict, or some existing non-trivial C
> accelerators like those for JSON or XML.
>

This is a known issue and is a priori something that needs to be worked out.
If one of the other VM teams wants to dig up more examples they can, but
I'm not going to put them through that for something we so obviously want
written down in a PEP.

>
> Brett, thanks for bringing the issue up.
> I've been bugged for a good while about
> issues like overbroad use of the concrete C API.
>

Since people are reading my "semantically identical" point more strongly
than I meant it (there is a reason I said "except in cases
where implementation details of a VM prevents [semantic equivalency]
entirely"), how about we change the requirement to this: C acceleration code
must pass the same test suite as the pure Python code (minus C-specific
issues such as refcount tests or word size) and adhere to the same documented
semantics? That should get us the same result without ruffling so many
feathers. If the other VMs find an inconsistency, they can add a proper test
and then we fix the code (as would be the case regardless). And in instances
where matching behavior is simply not possible because of C limitations, the
test won't get written, since it would never pass.
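
A sketch of the "same test suite for both versions" idea, using heapq since it came up above. It assumes CPython, where `heapq` pulls its accelerated functions from the `_heapq` extension module. The `fresh_import` helper is a hand-rolled, hypothetical stand-in for the `import_fresh_module` helper in CPython's own `test.support` package: it re-imports a module while a `None` entry in `sys.modules` makes the accelerator unimportable, so the pure Python fallback is used. One test mixin then runs unchanged against both modules.

```python
import heapq as c_heapq  # accelerated: functions come from _heapq in CPython
import importlib
import sys
import unittest


def fresh_import(name, blocked=()):
    """Import `name` from scratch with each module in `blocked` unimportable.

    Hypothetical helper. sys.modules is restored afterwards, so the returned
    module object is private to the caller.
    """
    names = (name, *blocked)
    saved = {n: sys.modules.get(n) for n in names}
    try:
        for n in names:
            sys.modules.pop(n, None)
        for n in blocked:
            sys.modules[n] = None  # a None entry makes `import n` fail
        return importlib.import_module(name)
    finally:
        for n, module in saved.items():
            if module is None:
                sys.modules.pop(n, None)
            else:
                sys.modules[n] = module


py_heapq = fresh_import("heapq", blocked=("_heapq",))


class HeapTests:
    """Tests written once and run against every implementation."""

    module = None  # set by the concrete TestCase subclasses below

    def test_pops_in_sorted_order(self):
        data = [5, 1, 4, 2, 3]
        heap = []
        for item in data:
            self.module.heappush(heap, item)
        popped = [self.module.heappop(heap) for _ in data]
        self.assertEqual(popped, sorted(data))

    def test_pop_from_empty_heap_raises_indexerror(self):
        with self.assertRaises(IndexError):
            self.module.heappop([])


class PyHeapTests(HeapTests, unittest.TestCase):
    module = py_heapq


class CHeapTests(HeapTests, unittest.TestCase):
    module = c_heapq
```

Saved as a test module, the two TestCase classes run under `python -m unittest` like any others; a failure in only one of them points at a divergence between the C and pure Python versions.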