Raymond Hettinger used a regression that I introduced in the builtin sorted() function (in Python 3.6.0) to give me his feedback on my FASTCALL work, but also on Argument Clinic.
Since the reported issues is wider than just FASTCALL, including how I contribute to CPython, I decided to discuss the topic with a wider audience. I continue the discussion on python-committers to get the opinion of the other core developers.
Sorry for my very long answer! I tried to answer to each issues reported by Raymond.
Inaccurate summary: I'm a strong supporter of "it's better to ask forgiveness than permission", whereas Raymond considers that I introduced too many regressions with my workflow.
Raymond Hettinger added the comment:
A few random thoughts that may or may not be helpful:
We now have two seasoned developers and one new core developer that collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change. The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available. IMO, taken as whole, these changes are destabilizing the language. Python is so successful and widely adopted that we can't afford a "shit happens" attitude. Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.
Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet. There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches. Everyone else is just watching them all fly by and hoping that something good is happening.
Since one or maybe even two years, I noticed that many of my issues were blocked by the lack of reviews. As you wrote, only few developer have the knowledge and background to be able to provide a good review (not only "tests pass, so LGTM") on my changes modifying the Python core.
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide to wait for a review or not.
For each patch that I write, I estimate the risk of regression. You may know that any regression is something unexpected, so such estimation is tricky. Here is my heuristic:
(*) if the patch is trivial (short, non controversal), I push it immediatly.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps to review the code myself (find bugs, or more generally enhance the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch serie, so doesn't block me to move further, I try to keep the issue open around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issues. Recently, I started to use local branches in a Git repository. It helps a lot of work on parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches everytime. It's painful, especially when I have to use multiple computers, download again publshed patches, etc. Maybe it will run smoother once CPython will move to Git and GitHub.
By the way, it's painful to squash a long patch serie into a giant patch, much harder to review, where changes don't make sense at all at the first look. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
Not supporting patch series in our reviewing tool also explains why I prefer to push than having to wait for a review. Rebasing manually long patch series stored as giant .patch files is complicated.
(*) If the change changes an API or changes a core component, I wait for at least one review from a core reviewer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch nor the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that nobody understands my change or nobody cares :-)
I totally understand that most core developers have a little amount of time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me to adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments in the enthusiasm and fervor surrounding the patches. I feel like there is a cost of questioning whether the patches should be done or how they are done, like I am burning little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it had a very long history, and each file has a very long list of "authors". I guess that you mean more a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me that at the Facebook sprint last september. I didn't expect that the random module had a maintainer, I thought that any core developer would be allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives an opportunity to maintainers to review changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentionned 3 different changes, let me reply.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code has the consequence that the current time and process identifier is used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only few developers instanciate random.Random themself, random.Random must not be used for security, etc. I let others decide if this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: The PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, and nobody was listening to each others, two security experts "rage-quitted" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for a bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review during 19 days (2016-08-18-2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, anyway I already trust all core developers, but I mean that I was trusting him even more than usual :-))
Since the change has a big impact on security, I had prefer to get a review of more developers, especially our security experts... but as I wrote, two security experts "rage- quitted". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having see you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were done last 2 years. I made many changes to support new OS functions like getentropy() an getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=) does crash. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good to spot bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looks ok to me.
I didn't imagine that anyone would call sorted(iterable=), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong since sorted() accepted iterable as a keyword, whereas sort.list() doesn't.
So well, I let you guess if a review would have spot this bug in the large change.
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentionned an external test suite which was broken by recent _decimal changes (not sure if my specific change was in cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and me handled this issue privately (he reverted my change), I'm not sure that it's worth it to say more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in the issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such speedup is significant even on a microbenchmark. Can you elaborate what are your criteria to decide if an optimization is worth it?
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spent much time on explaing the FASTCALL design nor documenting my changes. It's partially deliberate to make everything related to FASTCALL private. Since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stop moving too fast to take time to explain my work and document it.
If you have specific questions, please go ahead.
- FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C
array, nargs: int, kwnames: tuple of keyword keys). It's a new calling convention which allows to avoid a temporary tuple to pass positional arguments and avoids temporary dictionary to pass keyworkd arguments.
- To use FASTCALL, C functions should be converted to the new
METH_FASTCALL calling convention
- PyObject_Call() can replaced with _PyObject_FastCallKeywords() or
_PyObject_FastCallDict() (when we still get kwargs as a dict) in such conversion
- Many existing C functions were optimized internally to use FASCALL,
so even if you don't modify your code, you will benefit of it (speedup). Typical example: PyFunction_CallFunctionObjArgs().
The most massive change were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit of FASTCALL, code should be modified. I'm trying to restrict such changes to Python internals, especially the most used functions.
I expected that the required changes were straightforward enough, it looks like I was wrong, but I don't recall anyone, before you recently, asking for an explanation.
- It was no long ago that you fought tooth-and-nail against a single line patch optimization I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
It seems like we need more _explicit_ rules to decide if an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide it.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my bencmark was correct, but it was a first step to take a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provide a script to reproduce my benchmark and the benchmark results. The speedup is usually betwen 10% and 20% faster.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also as AC is being applied, the variable names are being changed. I never really liked the "mp" that used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes to convert functions to AC must not change the API (type of arguments: positional only/keyword/..., default values, etc.) nor provide a worse docstring.
There is an active on-going work to enhance AC to fix issues that you reported, like the default value of positional-only parameters which should not be rendered in the function signature (I created the issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in AC changes, I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is an hard task since required changes are usually large and move a lot of code. We need more eyes to look at these changes!
For the specific case of sorted(), the name of first parameter is already documented in the docstring and documentation in Python 2.7: "iterable". So I guess that you mean that it is now possible to use it as a keyword argument. Well, see the issue #29327 for the long story. This issue is a regression, it was already fixed, and I didn't introduce the API change.
Oh by the way, when I read your comment, I understand that I'm responsible of all regressions. It's true that I introduced regressions, that's where I said "shit happens" (or more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributor in CPython, I'm not surprised of being the one who introduce many (most?) regressions :-) I'm trying to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires to change at lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permissions multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list" which has been clear since the day it was released. It is some of the new strings the are causing problems with users (my students frequently are tripped-up by the / notation for example; no one seems to be able to intuit what it means without it being explained first).
Good news, it seems like you have a good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us to enhance docstrings, function signatures and documentation ;-)
See the good part of the AC on-going work: it's a nice opportunity to also enhance documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings which is nice, but it is also very useful to a wide range of tools like (IDE, static checks, etc.).
Conversion to FASTCALL is more a nice effect. At least, it is a good motivation for me to convert mor and more code to AC :-)
AC moves docstring closer to the list of parameters. IHMO it makes the C code simpler to read and understand. It also removes the boring code responsible to "parse" arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree and altering core parts of the language for function calls, to new opcodes, the memory allocators, etc. Very few people on the planet are competent to review these changes, make reasonable assessments about whether the complexity and churn are worth it. An fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode by 16-bit instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit on the implementation and I pushed the final change, but all credits go to Demur and Serhiy Storshaka! By the way, Serhiy made further efficient enhancements in the bytecode of CALL_FUNCTION instructions.
About memory allocations, I guess that you are referring to my change on PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for approval of my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provide all data requested by Marc Andre Lemburg (test the change with common projects, Django, Pillow, numpy) and made further changes (PYTHONMALLOC=debug tool) to help to handle this backward incompatible change (GIL is now required to call PyMem_Malloc).
Hopefully, it seems like nobody noticed this subtle change (GIL now requied): I didn't see any bug report. By the way, I fixed a misused PyMem_Mem() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the addition complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care without code that hasn't been reviewed because that lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for you feedback Raymond. I hope that my email helps you to understand how I work and how I take my decisions.