On 2018-06-21 11:22, Victor Stinner wrote:
> CCALL_VARARGS: cc_func(PyObject *self, PyObject *args)
> If we add a new calling convention
This is not a *new* calling convention, it's the *existing* calling
convention for METH_VARARGS. Obviously, we need to continue to support that.
On 2018-06-20 17:42, INADA Naoki wrote:
> I don't have any idea about changing METH_FASTCALL more.
> If Victor and Serhiy think so, and PyPy maintainers like it too, I want
> to make it public
> as soon as possible.
There are two different things here:
The first is documenting METH_FASTCALL such that everybody can create
built-in functions using the METH_FASTCALL signature. I think that the
API for METH_FASTCALL (with or without METH_KEYWORDS) is fine, so I
support making it public. This is really just a documentation issue, so
I see no reason why it couldn't be added to 3.7.0 if we're fast.
The API for calling functions using the FASTCALL convention is more of a
mess though. There are functions taking keyword arguments as dict and
functions taking them as tuple. As I mentioned in PEP 580, I'd like to
merge these and simply allow either a dict or a tuple. Since this would
require an API change, this won't be for 3.7.0.
First of all, thank you Jeroen for writing nice PEPs.
When I read PEP 579, I think "6. METH_FASTCALL is private and undocumented"
should be solved first.
I don't have any idea about changing METH_FASTCALL more.
If Victor and Serhiy think so, and PyPy maintainers like it too, I want to
make it public
as soon as possible.
_PyObject_FastCall* APIs are private in Python 3.7.
But METH_FASTCALL is not completely private (it starts without an
underscore, but is not documented).
Can we call it public and stable by adding documentation, if Ned allows?
It's already used widely in Python internals. I suppose that making it
public wouldn't make Python 3.7 much less stable.
If we can't do it in Python 3.7, I think we should do it in 3.8.
INADA Naoki <songofacandy(a)gmail.com>
On 2018-06-20 16:42, Antoine Pitrou wrote:
> I'm wondering what amount of code and debugging is needed for, say,
> Cython or Numba to implement that protocol as a caller, without going
> through the C API's indirections (for performance).
The goal is to have a really fast C API without a lot of indirections.
If Cython or Numba can implement the protocol faster than CPython, we
should just change the CPython implementation to be equally fast.
On 2018-06-20 16:09, Antoine Pitrou wrote:
> But there seems to be some complication on top of that:
> - PyCCall_FastCall() accepts several types for the keywords, even a
That is actually a *simplification* instead of a *complication*.
Currently, there is a huge amount of code duplication between
_PyMethodDef_RawFastCallKeywords and _PyMethodDef_RawFastCallDict.
Folding both of these in one function actually makes things simpler.
> does it get forwarded as-is to the `cc_func` or is it first
Transformed (obviously, otherwise it would be a huge backwards
compatibility break).
> - there's CCALL_OBJCLASS and CCALL_SLICE_SELF which have, well,
> non-obvious behaviour (especially the latter), especially as it is
> conditioned on the value of other fields or flags
It's actually quite obvious when you think of it: both are needed to
support existing use cases. Perhaps it's just not explained well enough
in the PEP.
> I wonder if there's a way to push some of the specificities out of the
> protocol and into the C API that mediates between the protocol and
> actual callers?
Sorry, I have no idea what you mean here. Actually, those flags are
handled by the C API. The actual C functions don't need to care about
them.
On 2018-06-18 16:55, INADA Naoki wrote:
> Speeding up most python functions and some builtin functions was very
> significant. But I doubt making some 3rd-party calls 20% faster can make
> real applications significantly faster.
These two sentences are almost contradictory. I find it strange to claim
that a given optimization was "very significant" in specific cases while
saying that the same optimization won't matter in other cases.
People *have* benchmarked real code, and this is causing slow-downs of
around 20% in real applications. That is the main reason
why I am trying to push this PEP (or PEP 575 which solves the same
problem in a different way).
On 2018-06-20 08:00, Stefan Behnel wrote:
> Just to add another bit of background on top of the current discussion,
> there is an idea around, especially in the scipy/big-data community, (and
> I'm not giving any guarantees here that it will lead to a PEP +
> implementation, as it depends on people's workload) to design a dedicated C
> level calling interface for Python. Think of it as similar to the buffer
> interface, but for calling arbitrary C functions by bypassing the Python
> call interface entirely. Objects that wrap some kind of C function (and
> there are tons of them in the CPython world) would gain C signature meta
> data, maybe even for overloaded signatures, and C code that wants to call
> them could validate that meta data and call them as native C calls.
I specifically designed PEP 580 to be extendable such that it would be
possible to add features later.
On 2018-06-18 15:09, Victor Stinner wrote:
> There are multiple issues with tp_fastcall:
Personally, I think that you are exaggerating these issues.
Below, I'm writing the word FASTCALL to refer to tp_fastcall in your
patch as well as my C call protocol in the PEP-in-progress.
> * ABI issue: it's possible to load a C extension using the old ABI,
> without tp_fastcall: it's not possible to write type->tp_fastcall on
> such type. This limitation causes different issues.
It's not hard to check for FASTCALL support and have a case distinction
between using tp_call and FASTCALL.
> * If tp_call is modified, tp_fastcall may be outdated.
I plan to support FASTCALL only for extension types. Those cannot be
changed from Python.
If it turns out that FASTCALL might give significant benefits also for
heap types, we can deal with those modifications: we already need to
deal with such modifications anyway for existing slots like __call__.
> * Many public functions of the C API still requires the tuple and dict
> to pass positional and keyword arguments, so a compatibility layer is
> required to types who only want to implement FASTCALL. Related issue:
> what if something calls tp_call with (args: tuple, kwargs: dict)?
> Crash or call a compatibility layer converting arguments to FASTCALL
> calling convention?
You make it sound as if such a "compatibility layer" is a big issue. You
just need one C API function to put in the tp_call slot, which calls the
object using FASTCALL instead.
On 2018-06-18 03:34, INADA Naoki wrote:
> Victor had tried to add a `tp_fastcall` slot, but he suspended his effort
> because its benefit was not enough for its complexity.
I had a quick look at that patch and it's really orthogonal to what I'm
proposing. I'm proposing to use the slot *instead* of existing fastcall
optimizations. Victor's patch was about adding fastcall support to
classes that didn't support it before.
Yury Selivanov pushed his implementation of PEP 567 -- Context
Variables on January 23, 2018. Yesterday, 4 months after the commit
and only 3 weeks before the 3.7.0 final release, a crash was found in
it (it's now fixed, don't worry Ned!).
The bug is a "common" mistake in an object constructor implemented in
C: the object was tracked by the garbage collector before it was fully
initialized, and a GC collection caused a crash somewhere in "object
traversing". By "common", I mean that I saw this exact bug between 5
and 10 times over the last 5 years.
In the bpo issue, I asked why we only spotted the bug yesterday. It
seems that changing the threshold of GC generation 0 from 700 to 5
triggers the bug immediately in test_context (the tests of PEP 567). I
wrote a proof-of-concept patch to change the threshold when using -X
dev.
Question: do you think that bugs spotted by a GC collection are common
enough to justify changing the GC thresholds in development mode (the
new -X dev flag of Python 3.7)?
GC collections detect various kinds of bugs. Another "common" bug is
when an object somehow remains alive in the GC whereas its memory has
already been freed: using PYTHONMALLOC=debug (a debug feature already
enabled by -X dev), a GC collection will always crash in such a case.
I'm not sure about the exact thresholds that would be used in
development mode. The general question is more if it would be useful.
Then the side question is if reducing the threshold would kill
performances or not.
About performance: -X dev enables debug features which have an
"acceptable" cost in terms of performance and memory, but the enabled
features are chosen on a case-by-case basis. For example, I chose *not*
to enable tracemalloc with -X dev because the cost in terms of CPU
*and* memory is too high (usually 2x slower and 2x the memory).