My cavalier and aggressive manner, API change and bugs introduced for basically zero benefit
Hi,
Raymond Hettinger used a regression that I introduced in the builtin sorted() function (in Python 3.6.0) to give me his feedback on my FASTCALL work, but also on Argument Clinic.
Context: http://bugs.python.org/issue29327#msg285848
Since the reported issues are wider than just FASTCALL, including how I contribute to CPython, I decided to discuss the topic with a wider audience. I am continuing the discussion on python-committers to get the opinions of the other core developers.
Sorry for my very long answer! I tried to answer each issue reported by Raymond.
Inaccurate summary: I'm a strong supporter of "it's better to ask forgiveness than permission", whereas Raymond considers that I introduced too many regressions with my workflow.
Raymond Hettinger added the comment:
A few random thoughts that may or may not be helpful:
We now have two seasoned developers and one new core developer that collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change. The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available. IMO, taken as whole, these changes are destabilizing the language. Python is so successful and widely adopted that we can't afford a "shit happens" attitude. Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.
Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet. There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches. Everyone else is just watching them all fly by and hoping that something good is happening.
For one or maybe even two years now, I have noticed that many of my issues were blocked by the lack of reviews. As you wrote, only a few developers have the knowledge and background to be able to provide a good review (not just "tests pass, so LGTM") of my changes modifying the Python core.
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide to wait for a review or not.
For each patch that I write, I estimate the risk of regression. By definition, a regression is something unexpected, so such an estimation is tricky. Here is my heuristic:
(*) If the patch is trivial (short, non-controversial), I push it immediately.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps me review the code myself (find bugs or, more generally, enhance the patch). Maybe it's because it forces me to review the change one more time?
If the change is not part of a larger patch series, and so doesn't block me from moving further, I try to keep the issue open for around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issue. Recently, I started to use local branches in a Git repository. It helps a lot to work in parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches every time. That is painful, especially when I have to use multiple computers, download published patches again, etc. Maybe it will run more smoothly once CPython moves to Git and GitHub.
By the way, it's painful to squash a long patch series into a giant patch, which is much harder to review, where the changes don't make sense at all at first glance. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
The lack of support for patch series in our reviewing tool also explains why I prefer to push rather than wait for a review. Manually rebasing long patch series stored as giant .patch files is complicated.
(*) If the change modifies an API or touches a core component, I wait for at least one review from a core reviewer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch or the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that nobody understands my change or nobody cares :-)
I totally understand that most core developers have very little time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments is the enthusiasm and fervor surrounding the patches. I feel like there is a cost to questioning whether the patches should be done or how they are done, like I am burning a little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was a creator or maintainer of the code who was still active, that person would always be consulted and have the final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to the application of a patch without consulting the author of _randommodule.c and the maintainer of random.py; or this change to sorted(); or the changes to decimal, etc.).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it has a very long history, and each file has a very long list of "authors". I guess that you mean something more like a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me so at the Facebook sprint last September. I didn't expect the random module to have a maintainer; I thought that any core developer was allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives maintainers an opportunity to review them. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentioned 3 different changes; let me reply to each.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code meant that the current time and process identifier were used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only a few developers instantiate random.Random themselves, random.Random must not be used for security anyway, etc. I let others decide whether this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, nobody was listening to each other, and two security experts "rage-quit" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different, incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for one bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review for 19 days (2016-08-18 to 2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, I already trust all core developers anyway, but I mean that I trusted him even more than usual :-))
Since the change has a big impact on security, I would have preferred to get reviews from more developers, especially our security experts... but as I wrote, two security experts "rage-quit". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having seen you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were made in the last 2 years. I made many changes to support new OS functions like getentropy() and getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) crashes. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good at spotting bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looked OK to me.
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong anyway, since sorted() accepted iterable as a keyword, whereas list.sort() doesn't.
So well, I let you guess whether a review would have spotted this bug in such a large change.
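For the record, the corner case is trivial to exercise from pure Python. A quick check, run against a current CPython where the fix has long been merged and *iterable* is positional-only again, looks like this (purely illustrative):

```python
# Exercise the corner case from http://bugs.python.org/issue29327:
# passing sorted()'s first argument by keyword. In current CPython the
# parameter is positional-only again, so this raises TypeError instead
# of crashing the interpreter.
try:
    sorted(iterable=[])
except TypeError as exc:
    print("rejected:", exc)

# The positional form is unaffected.
print(sorted([3, 1, 2]))  # [1, 2, 3]
```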
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I had modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentioned an external test suite which was broken by recent _decimal changes (I'm not sure whether my specific change was the cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and I handled this issue privately (he reverted my change); I'm not sure that it's worth saying more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate on how such a change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in the issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such a speedup is significant, even on a microbenchmark. Can you elaborate on your criteria for deciding whether an optimization is worth it?
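For anyone without the third-party perf module installed, a rough version of the same measurement can be reproduced with the stdlib timeit module. The absolute numbers depend entirely on the machine; only comparisons between two builds of the interpreter are meaningful:

```python
import timeit

# Rough stand-in for the perf microbenchmark above: time sorted() on a
# ten-element list and report nanoseconds per call. Taking min() of
# several repeats reduces scheduling noise.
runs = timeit.repeat('sorted(seq)', setup='seq = list(range(10))',
                     number=100_000, repeat=5)
per_call_ns = min(runs) / 100_000 * 1e9
print(f"sorted() on 10 ints: {per_call_ns:.0f} ns per call")
```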
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather than sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I haven't spent much time explaining the FASTCALL design or documenting my changes. Keeping everything related to FASTCALL private is partially deliberate: since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stopped moving so fast before taking the time to explain and document my work.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword names). It's a new calling convention which avoids a temporary tuple to pass positional arguments and a temporary dictionary to pass keyword arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention.
PyObject_Call() can be replaced with _PyObject_FastCallKeywords() or _PyObject_FastCallDict() (when we still get kwargs as a dict) in such a conversion.
Many existing C functions were optimized internally to use FASTCALL, so even if you don't modify your code, you will benefit from it (speedup). Typical example: PyFunction_CallFunctionObjArgs().
The most massive changes were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit from FASTCALL, code must be modified. I'm trying to restrict such changes to Python internals, especially the most frequently used functions.
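To make the data layout concrete, here is a small pure-Python model of how a FASTCALL-style invocation is decoded. This is an illustration only, not the C API: the array holds the nargs positional values first, followed by one value per name in kwnames.

```python
def fastcall(func, stack, nargs, kwnames):
    """Decode a FASTCALL-style call: 'stack' holds nargs positional
    values followed by len(kwnames) keyword values."""
    args = stack[:nargs]
    kwargs = dict(zip(kwnames, stack[nargs:]))
    return func(*args, **kwargs)

# One positional value and one keyword value. In C, no temporary tuple
# or dict is built on the caller's side; this Python model still builds
# them, which is exactly the overhead FASTCALL removes.
print(fastcall(sorted, [[3, 1, 2], True], 1, ('reverse',)))  # [3, 2, 1]
```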
I expected the required changes to be straightforward enough; it looks like I was wrong. But I don't recall anyone asking for an explanation before you did recently.
- It was not long ago that you fought tooth-and-nail against a single-line optimization patch I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide if an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide it.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my benchmark was correct, but it was a first step toward taking a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provided a script to reproduce my benchmark along with the benchmark results. The speedup is usually between 10% and 20%.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also, as AC is being applied, the variable names are being changed. I never really liked the "mp" that is used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes to convert functions to AC must not change the API (type of arguments: positional only/keyword/..., default values, etc.) nor provide a worse docstring.
There is active ongoing work to enhance AC to fix issues that you reported, like the default value of positional-only parameters, which should not be rendered in the function signature (I created issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support for *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in the AC changes; I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is a hard task since the required changes are usually large and move a lot of code. We need more eyes on these changes!
For the specific case of sorted(), the name of the first parameter was already documented in the docstring and documentation in Python 2.7: "iterable". So I guess that you mean that it became possible to use it as a keyword argument. Well, see issue #29327 for the long story. That issue was a regression, it has already been fixed, and I didn't introduce the API change.
Oh, by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's where I said "shit happens" (or, more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributors to CPython, I'm not surprised to be the one who introduces many (most?) regressions :-) I'm trying to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list", which has been clear since the day it was released. It is some of the new strings that are causing problems for users (my students are frequently tripped up by the / notation, for example; no one seems to be able to intuit what it means without it being explained first).
Good news, it seems like you have a good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us to enhance docstrings, function signatures and documentation ;-)
See the good part of the AC on-going work: it's a nice opportunity to also enhance documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
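For example, once a builtin carries an AC-generated signature, any tool can introspect it through the standard inspect module (output shown from a current CPython; before AC, inspect.signature() raised ValueError for such builtins):

```python
import inspect

# AC attaches a __text_signature__ to the builtin, which inspect parses
# into a full Signature object usable by IDEs and static checkers.
sig = inspect.signature(sorted)
print(sig)                   # (iterable, /, *, key=None, reverse=False)
print(list(sig.parameters))  # ['iterable', 'key', 'reverse']
```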
Conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for "parsing" arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree, altering core parts of the language from function calls to new opcodes to the memory allocators, etc. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode in 16-bit units instead of 8-bit. Demur proposed a good design, and I recall that the design was approved.
I helped a little bit with the implementation and I pushed the final change, but all credit goes to Demur and Serhiy Storchaka! By the way, Serhiy made further efficiency enhancements to the bytecode of CALL_FUNCTION instructions.
About memory allocators, I guess that you are referring to my change to the PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for the approval of my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-Andre Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward-incompatible change (the GIL is now required to call PyMem_Malloc).
Fortunately, it seems like nobody noticed this subtle change (GIL now required): I didn't see any bug report. By the way, I fixed a misuse of PyMem_Malloc() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the added complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because it lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long-existing stable code without consulting the authors and with weak LGTM reviews?).
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you understand how I work and how I make my decisions.
Victor
Hi, Victor and all.
I'm sorry, I haven't read the whole mail yet. Reading long English text is a tough job for me.
I can see this mail is very important for me. I'll take the time to read this thread, but it may take a long time. I want to post some comments first.
I have learned a lot from reviewing Victor's patches. For example, this week I learned how to add a type slot without breaking ABI compatibility.
I learned about some surprising corner cases from regressions caused by my patches, or patches I said LGTM to.
So my thanks go to Victor, to everyone who reviewed my patches and ran their tests with the development branch of Python, and to Travis CI (it eases running tests with the "3.7-dev" branch).
About the pace of changes: I'll reduce my pace of writing patches, because I have joined a project in my company. But I'll keep time to watch the issue tracker and review patches.
Regards,
Hi Victor, hi Raymond,
On 20/01/2017 at 11:45, Victor Stinner wrote:
Inaccurate summary: I'm a strong supporter of "it's better to ask forgiveness than permission", whereas Raymond considers that I introduced too many regressions with my workflow.
It's a matter of balance. "Easier to ask forgiveness than permission" is fine for small-scale changes with little ramification or impact on other parts. It certainly wouldn't be appropriate for PEP-sized changes.
It's true that there has been a lot of churn lately, which is a bit difficult to follow. OTOH, I don't follow much anyway, because I'm unmotivated and don't care enough except for very few areas.
For one or maybe even two years now, I have noticed that many of my issues were blocked by the lack of reviews. As you wrote, only a few developers have the knowledge and background to be able to provide a good review (not just "tests pass, so LGTM") of my changes modifying the Python core.
Yes, this is still an important issue. There are topics on which there is almost never anybody available to do a review, so sometimes you may have to move forward and cross fingers that you're not breaking anything.
It's strange, but the process of opening an issue and attaching the patch usually helps me review the code myself (find bugs or, more generally, enhance the patch).
Ha, that's exactly the same thing here. Reading my patch in a Web browser makes me likely to spot problems that I wouldn't have found otherwise.
By the way, it's painful to squash a long patch series into a giant patch, which is much harder to review, where the changes don't make sense at all at first glance. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
I've never done patch series. They look very painful to maintain, because tooling for them is very primitive (whether with hg or git, AFAIK).
Oh, by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's where I said "shit happens" (or, more politically correct: "it's better to ask forgiveness than permission" ;-)).
If what was lacking was more reviewing of patches properly posted to the tracker, then you're not the only person responsible, we're all collectively responsible. Chastising one of the few persons who puts in a lot of work is unfair.
Regards
Antoine.
On Fri, Jan 20, 2017 at 11:45:29AM +0100, Victor Stinner wrote:
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
The problem here is partly that maintainers don't think that all of the changes are worth spending time on.
I'm not specifically talking about *your* changes here, but Raymond's comment also referred to other committers.
In the code that I maintain (and wrote), about 4/5ths of the kinds of changes that Raymond complained about were simply unnecessary. Some were stylistic preferences; others came with a "potential bug" reasoning that turned out to be wrong.
To be clear, I'm NOT talking about changes like this one:
https://hg.python.org/cpython/rev/7f4c9cfae20c
Those have a clear purpose and don't disturb.
FASTCALL has a purpose, but only if a function is actually a hotspot. For the decimal changes that I reverted this wasn't the case, so it is perhaps "just nice to have for 3.8".
I'm still very interested, of course, in having an official FASTCALL API for third-party modules that don't (and probably should not) use AC.
- Historically, if there was a creator or maintainer of the code who was still active, that person would always be consulted and have the final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to the application of a patch without consulting the author of _randommodule.c and the maintainer of random.py; or this change to sorted(); or the changes to decimal, etc.).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it has a very long history, and each file has a very long list of "authors". I guess that you mean something more like a "maintainer".
Hmm, no: for example _collectionsmodule.c is written and maintained by Raymond; Modules/_decimal/* has exactly one author.
My problem is that I'm not aware of any explicit list of maintainers.
https://docs.python.org/devguide/experts.html :)
Stefan Krah
Thanks for this email, Victor; it illustrates a lot of the pain points that some core devs have with the CPython development process.
I think that we are lucky to have you and Serhiy, who spend so much time pushing so many improvements to the CPython internals. I think that while we are working on a new major version of CPython (3.7 now), it's acceptable to push performance optimizations without a lengthy discussion on python-dev and a thorough review by 3+ core developers. An issue on the bug tracker explaining the change and showing some benchmarks should be enough.
Those who want to see the results of Serhiy's and Victor's work can look at https://speed.python.org/comparison/ and see for themselves that 3.7 is already faster than 3.6 and 2.7 in most cases.
To reflect on my own experience: I had a patch to speed up LOAD_GLOBAL, LOAD_ATTR and LOAD_METHOD early in the 3.6 development cycle. The patch promised to improve performance by 5-20% on some benchmarks. I sent a few emails to python-dev explaining the change, the memory usage implications, etc. What I saw is that only one or two people were interested in the change, and almost nobody wanted to actually review it. I became less motivated, and in the end I decided to focus on other areas and postpone my work on that optimization until later. And later I regretted that: I should have pushed the change, and we would have had a few months to improve and test it, and we would have had an even faster 3.6. (I'll continue my work on the patch soon.)
I think that we need to become less conservative about development of CPython internals. At this point it's impossible to make CPython any faster without invasive refactorings, and I think it's OK to trust our core developers to make them.
Any public API changes/enhancements should be discussed on python-dev and thoroughly reviewed though. API design is something that a single person usually cannot do well.
Thank you, Yury
On Jan 20, 2017, at 9:26 AM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
I think that we need to become less conservative about development of CPython internals. At this point it's impossible to make CPython any faster without invasive refactorings, and I think it's OK to trust our core developers to make them.
I agree with the sentiment, but I'm worried about the implications. Your suggested approach only makes sense if we have a good safety net in place to catch regressions early. Unfortunately, apart from regrtest, buildbots, and dumb luck, we don't. Betas have very limited adoption, too limited to be useful in my experience. I tried and failed to introduce 3.6 during the beta phase at work because there was too much risk it would be picked up by unsuspecting users. Other users seem to be in the same camp. Consequently, most x.y.0 releases are of relatively lower quality.
I wonder if there's any way for us to incentivize beta usage in some key areas. For example, if every Python 3 project on Travis CI also ran on "3.7-dev" or "nightly", that would already give us better real-world coverage. Actually *running* dev builds in production seems too risky to me, for the reasons that Raymond and Victor state in this thread: too many lightly reviewed changes.
Summing up, I don't think we can really start "moving fast and breaking things" until we have wider adoption of dev builds in the wild. To put money where my mouth is, I am currently working on introducing the dev version of 3.7 to the build system at work. But this is not going to be enough on its own.
- Ł
On 2017-01-20 5:02 PM, Lukasz Langa wrote:
Lukasz, I understand what you are saying here. But on the other hand, we should not stop improving CPython. Sometimes stuff will break because of our "loose" safety net, but this is inevitable. It happens in virtually any software project (including other languages and compilers).
My experience shows that even extensive code reviews don't spot all bugs. E.g. I missed a bug in one asyncio patch that caused "loop.sock_connect" to be broken in 3.5.2. Another one: due to a design mistake in PEP 492, we had to change how __aiter__ behaves in 3.5.3 and 3.6. PEP 492 was thoroughly discussed, the patches were reviewed, and we merged everything a few months before we released Python 3.5.0; that wasn't enough apparently.
I'm not suggesting that we should "move fast", merge things without reviews / tests / benchmarks. In Victor's case, he started working on FASTCALL in 3.6, and we already shipped some of his improvements (I reviewed a few FASTCALL patches myself). Now he (with help of Serhiy and Inada-san) simply continues to work on the project, gradually refactoring the code, improving Argument Clinic etc. We are still early in 3.7 development cycle, and I don't think we should stop the project just because we had a couple of minor regressions.
Thank you, Yury
On 21 January 2017 at 09:02, Lukasz Langa <lukasz@langa.pl> wrote:
Betas have very limited adoption, too limited to be useful in my experience. I tried and failed to introduce 3.6 during beta phase at work because there was too much risk it would be picked up by unsuspecting users.
The Red Hat python-maint folks (Petr Viktorin et al) similarly tried to time things so there was a window to provide feedback from Fedora Rawhide on a beta release prior to the final 3.6.0 release, but it unfortunately ended up not being practical, especially with the lack of clarity around when the 3.6 ABI would actually be frozen.
That uncertainty around the release ABI stability also impacted the ability of the manylinux folks to include any of the 3.6 pre-releases in their reference build environment.
To further complicate matters, the user visible syntax and semantics of various subsystems have also been changing at a rate not really seen within a single release series since the 2.3/4/5 era more than a decade ago, where we saw extended slicing support in the builtin types, decorators, with statements, and more.
I think both of these shifts (faster moving internals, increased rate of change in public interfaces) are symptoms of the fact that while we don't have anyone being paid specifically to focus on stability and general maintenance, we *do* have more active folks with paid or personal time to spend on particular problems related to dealing with large code bases and software deployments (reducing CPU usage, memory management, asynchronous flow control, improving static analysis support, security), improving integration with particular deployment platforms (mainly Windows and Linux, but also mobile devices), and iterating on low level areas of personal interest (e.g. the import system, metaclasses, timezones).
In that environment, we need to be careful to remember that we're collectively stewards of the overall language ecosystem in addition to the reference implementation.
That means it isn't just CPython we need to worry about - yes, it's beneficial to the ecosystem as a whole to have significant low level improvements being made to the reference interpreter (since it's still far and away the most popular implementation), but we're the *most* well-resourced of the Python interpreter implementations, and even we have a grand total of zero people paid to work full-time on monitoring and facilitating the core development process (our commercial redistributors appear to be OK with that state of affairs, which implies that their customers are either also fine with it, or else aren't looking too closely at what they're getting for their money).
Other implementations are often still attempting to catch up to where CPython was a year or more ago, before they can even start thinking about where we're going to be a year or more from now.
So if we're hitting review bottlenecks for low level changes, then "I'll commit the change anyway" probably isn't the best path for us to be taking - it's likely to be more beneficial in the long run to take a step back and ask *why* there isn't anyone else with the time and inclination to review the change, and whether or not there might be a way for us to help address the underlying sustainability problems (or advocate internally for our respective employers to start doing so in a more systematic way).
Cheers, Nick.
P.S. The simplest illustration I know of for those underlying sustainability problems is the "up and to the right" nature of the graph of open issues: http://bugs.python.org/issue?@template=stats
The move to GitHub and the introduction of automated pre-merge CI testing will hopefully help streamline the patch review process for simple fixes, but it's far from clear at this point whether or not that will be enough to get the issue metrics moving in a healthier direction.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 21/01/2017 at 12:27, Nick Coghlan wrote:
That uncertainty around the release ABI stability also impacted the ability of the manylinux folks to include any of the 3.6 pre-releases in their reference build environment.
Normally, our "only bug fixes after the first beta" policy should more or less guarantee ABI stability, but in practice it is very frequently broken by well-meaning core developers.
Worse, my experience is that trying to remind people of that policy is frequently met with hostility. For example, "performance improvements" are frequently sneaked in after the first beta under the pretext that they don't harm. There will be little progress on this front until there's a serious consensus amongst all active players that the bugfix period after the first beta should really only allow for *bug fixes* (i.e. some behaviour which doesn't conform to the documentation and/or the user's reasonable expectation).
Regards
Antoine.
On 21 January 2017 at 23:32, Antoine Pitrou <antoine@python.org> wrote:
Normally, our "only bug fixes after the first beta" policy should more or less guarantee ABI stability, but in practice it is very frequently broken by well-meaning core developers.
In the case of the 3.6 beta cycle, there were genuinely some ABI fixes needed after 3.6b1, so I don't think 3.x.0b1 is the right time to declare "we expect all binary extension modules built with this version to work with 3.x.0".
It does seem reasonable to lock it down around beta 3, though - at that point we've had time for people to do some test builds against a stable API, as well as an initial iteration of fixes for identified regressions.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
As more of an ex-Python committer, I have little to say about current processes. I will make one observation:
On Jan 20, 2017, at 11:45 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
I introduced a regression in random.Random.seed(): ...
IMHO the regression is not "catastrophic". Only a few developers instantiate random.Random themselves, random.Random must not be used for security, etc. I let others decide whether this bug was catastrophic or not.
I do not like this comment. I feel like it shows a narrow understanding of who uses Python, of who depends on this API, and of the history of how important it is to have a good RNG seed even for non-security uses.
Scientific programming uses random numbers. A quick grep of the source code on my machine shows the following example from Biopython's Bio/GA/Mutation/Simple.py (comments and docstrings removed for clarity):
    class SinglePositionMutation(object):
        def __init__(self, mutation_rate=0.001):
            self._mutation_rate = mutation_rate
            self._mutation_rand = random.Random()
            self._switch_rand = random.Random()
            self._pos_rand = random.Random()
If I understand the bug correctly, under Python 3.6 on Windows these will likely have the same seed. This may (or may not) affect downstream statistics. This may (or may not) require a publication to be retracted. This may (or may not) be a disaster to the researcher who now has to re-do all of the calculations and see if there is a problem.
It's certainly hard to track down all the people who might be affected and beg for their forgiveness, which is what it sounds like you will be doing if you really believe your development philosophy, and aren't using it as an excuse to skimp on testing and coordination.
Back in the 1990s there were a number of papers showing the importance of having a good RNG for physical simulations, even if it isn't cryptographically strong, and the importance of having no correlations between multiple RNGs. Basically, most programmers aren't good enough to identify bad default RNGs and seeds. The general solution was to put a good-enough default RNG in the underlying language implementations. Which all modern languages, including Python, have done. Well, except now.
I do not think it's relevant to comment that only a few developers call Random() directly. The guideline should be the number of people who might be affected. There are many more Biopython users than developers, making the actual severity many times larger than that developer comment implies.
Finally, there's a long history of small, seemingly orthogonal changes in RNG code making large reductions in entropy, such as https://en.wikinews.org/wiki/Predictable_random_number_generator_discovered_... caused by following suggestions made by Valgrind and Purify; the sort that might be called "trivial (short, non-controversial)". Anyone touching RNG code has to be aware that it's a sensitive area, and pay extra attention to testing.
In the bug report at http://bugs.python.org/issue29085 you write "Too bad that there is no simple way to write an unit test for that."
Here's a possible test based on the reproducer in the report:
    import random

    Random = random.Random
    for i in range(1000):
        rng1 = Random()
        rng2 = Random()
        assert rng1.getrandbits(128) != rng2.getrandbits(128), "RNG initialization failed"
(I do not have a Windows box to test this. It passes 500,000 tests on my Mac.)
If the two Random() calls are in the same clock quantum and so get the same seed then they will have the same bit pattern. The odds of two independent seeds giving the same 128-bit pattern is incredibly low.
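The two halves of that argument can be checked directly on any healthy build (a minimal sketch, not from the original mail; variable names are illustrative):

```python
import random

# Generators that do share a seed emit identical bit patterns; that is
# what the test above would detect if seeding fell back to the clock.
seeded_a = random.Random(2017)
seeded_b = random.Random(2017)
assert seeded_a.getrandbits(128) == seeded_b.getrandbits(128)

# Independently seeded generators virtually never collide: the chance of
# two independent 128-bit draws matching is 2**-128.
fresh_a = random.Random()
fresh_b = random.Random()
assert fresh_a.getrandbits(128) != fresh_b.getrandbits(128)
```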
For this one bug, I agree with the interpretation that it was handled with a cavalier attitude. I don't feel like it's being treated with the seriousness it should be. As far as I can tell there isn't even a regression test to prevent something like this from happening again, only handwaving that such a test is not simple (and why does a unit test for an important requirement have to be simple in order to be added?).
I've forwarded this Python 3.6 bug to the Biopython developers, at https://github.com/biopython/biopython/issues/1044 . I don't think it's a serious problem, but don't know enough to be certain. In any case, the likely easy fix for them is to reuse the same RNG instead of creating independent ones. However, I expect there are other projects which will be more affected by this bug.
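A minimal sketch of that workaround, shaped like the Biopython example quoted earlier (the rng parameter and class body here are illustrative, not Biopython's actual fix):

```python
import random

# Illustrative rewrite: instead of creating three Random() instances in
# quick succession (which the 3.6.0 bug could seed identically), share a
# single well-seeded generator across all uses.
class SinglePositionMutation:
    def __init__(self, mutation_rate=0.001, rng=None):
        self._mutation_rate = mutation_rate
        # One shared generator; callers may inject their own, e.g.
        # rng=random.Random(42), for reproducible runs.
        self._rng = rng if rng is not None else random.Random()
```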
Cheers,
Andrew
dalke@dalkescientific.com
2017-01-21 0:14 GMT+01:00 Andrew Dalke <dalke@dalkescientific.com>:
For this one bug, I agree with the interpretation that it was handled with a cavalier attitude. I don't feel like it's being treated with the seriousness it should be.
The regression was introduced by http://bugs.python.org/issue27776, which was reviewed by Nick Coghlan. The discussion here is about the lack of reviews: this patch was reviewed, but the review didn't spot the bug.
The bug was reported at 2016-12-27, it was fixed and closed 2 days later by Benjamin Peterson: http://bugs.python.org/issue29085
The issue was fixed so fast that I didn't even notice it. It was only a few days later that I was notified indirectly; I don't recall how.
Note: Benjamin didn't attach a patch to the issue, nor wait for a review.
What did we do wrong for this specific issue, and how can we prevent a similar failure next time?
I don't understand why you are saying that I (I or we?) didn't handle the issue seriously. And can you please also elaborate why you consider that my attitude was cavalier on this issue?
If you consider that the bug is serious enough, maybe we should schedule a quick 3.6.1 bugfix release?
About testing, I wrote a minimal test for os.urandom() checking that calling os.urandom(16) twice produces two different random sequences, in Lib/test/test_os.py:
    def test_urandom_value(self):
        data1 = os.urandom(16)
        self.assertIsInstance(data1, bytes)
        data2 = os.urandom(16)
        self.assertNotEqual(data1, data2)
Note: the test was added by Georg Brandl in issue #13703, but I wrote the initial patch adding the test. So it's not like I don't care about the quality of Python's RNG ;-)
Would such a minimal test be enough to detect the weak RNG seed bug?
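For reference: the os.urandom() test exercises urandom itself, while in the 3.6.0 regression urandom kept working and it was Random.seed() that silently fell back to time+pid. A test aimed at the seeding path has to compare freshly created generators, as in Andrew's reproducer (a sketch; the helper name and trial count are illustrative):

```python
import random

# Compare pairs of freshly created generators: on a correctly seeded
# build, two independent 128-bit draws essentially never match, so any
# collision signals a broken seeding path (e.g. a clock-based fallback).
def check_distinct_seeding(trials=100):
    for _ in range(trials):
        a, b = random.Random(), random.Random()
        if a.getrandbits(128) == b.getrandbits(128):
            return False
    return True

assert check_distinct_seeding()
```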
I know RNG issues well. I spent a lot of time writing my own RNG library and studying RNG bugs in various libraries and languages. Bugs are common, and it's hard to write tests that detect non-trivial RNG bugs. The question is how we can enhance Python on this point.
Victor
On 01/20/2017 04:03 PM, Victor Stinner wrote:
2017-01-21 0:14 GMT+01:00 Andrew Dalke <dalke@dalkescientific.com>:
For this one bug, I agree with the interpretation that it was handled with a cavalier attitude. I don't feel like it's being treated with the seriousness it should be.
I don't understand why you are saying that I (I or we?) didn't handle the issue seriously. And can you please also elaborate why you consider that my attitude was cavalier on this issue?
Victor, an excerpt from your original email:
I introduced a regression in random.Random.seed(): a typo in the C code has the consequence that the current time and process identifier are used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only a few developers instantiate random.Random themselves, random.Random must not be used for security, etc. I let others decide whether this bug was catastrophic or not.
Going by the last paragraph, it doesn't seem that the bug is a big deal to you: you believe very few people even use random.Random, and so very few people will get bitten by it. Andrew has given an example where it can be a very big deal (retracted scientific papers, loss of hundreds of hours of work, etc.), one that can definitely fall into the catastrophic category.
-- ~Ethan~
On 20 January 2017 at 21:45, Victor Stinner <victor.stinner@gmail.com> wrote:
What do you mean by "author"? As you wrote, Python is now 26 years old, so it has a very long history, and each file has a very long list of "authors". I guess that you mean a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me at the Facebook sprint last September. I didn't expect the random module to have a maintainer; I thought that any core developer was allowed to modify the code.
Specific areas of interest and expertise are listed in the Developer's Guide: https://docs.python.org/devguide/experts.html#experts
Typing any of the headings in that table into the Nosy list on bugs.python.org will auto-populate a dropdown with the relevant table entry.
If I'd thought of it at the time, I would have asked Raymond and Mark to double-check we hadn't broken anything in the random module initialisation with the os.urandom() changes for 3.6, but it didn't occur to me to do so since my main focus was on string hashing, the secrets module, and direct use of the os.urandom() and os.getrandom() APIs.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
So the common theme here regardless of whether you agree with Raymond or Victor's approach to development is that we are not getting enough code reviews to go around. To me that's what the systemic issue is that this email is bringing up.
Now I think most of us don't think the solution to the lack of reviews is to lower our standard of what it takes to become a core developer (this doesn't mean we shouldn't do a better job of identifying potential candidates, just that we shouldn't give people commit privileges after a single patch like some projects do). To me that means we need to address why, out of 79 core developers, only 39 have made even a single commit over the past year, only 30/79 have more than 12 commits over that same time scale, only 15/79 have more than 52 commits, and only 2/79 have over 365 commits ( https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c for the stats).
Some of you have said you're waiting for the GitHub migration before you start contributing again, which I can understand (I'm going to be sending an email with an update on that after this email to python-dev & core-workflow). But for those of you who have not told me that, I don't know what it will take to get you involved again. For instance, not to pick on Andrew, but he hasn't committed anything yet he obviously still cares about the project. So what would it take to get Andrew to help review patches again so that the next time something involving random comes through he feels like taking a quick look?
As I have said before, the reason I took on the GitHub migration is for us core developers. I want our workflow to be as easy as possible so that we can be as productive as possible. But the unspoken long-term goal I have is to get to the point where even dormant core devs want to contribute again, and where everyone reviews a patch/month and more people review a patch/week (although I'll take a patch/year to start). I want to get to the point where every person with commit privileges takes 30 minutes a month to help with reviews and the majority of folks take 30 minutes a week to review (and please don't take this as a hard rule where the privileges go away if you don't; view this as an aspirational goal). Even for people who don't have time to review the kind of patches Victor is producing which triggered this thread, reviewing documentation patches can be done without deep knowledge of things and without taking much time. That way people who have time to review the bigger, more difficult patches can actually spend their time on those reviews and not worry about patches fixing a spelling mistake or adding a new test to raise test coverage.
All of this is so that I hope one day we get to the point where all patches require a review no matter who proposed the code change. Now I think we're quite a ways off from being there, but that's my moonshot goal for our workflow: that we have enough quality reviews coming in that we feel that even patches from fellow core developers are worth requiring the extra code check and disbursement of knowledge, without it feeling like a terrible drag on productivity.
Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS problem and then automate Misc/ACKS. After that, though, I hope we can take the time to have a hard look at what in our workflow prevents people from making even occasional code reviews so that everyone wants to help out again (and if any of this interests you then please subscribe to core-workflow).
On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
Raymond Hettinger used a regression that I introduced in the builtin sorted() function (in Python 3.6.0) to give me his feedback on my FASTCALL work, but also on Argument Clinic.
Context: http://bugs.python.org/issue29327#msg285848
Since the reported issues are wider than just FASTCALL, including how I contribute to CPython, I decided to discuss the topic with a wider audience. I am continuing the discussion on python-committers to get the opinion of the other core developers.
Sorry for my very long answer! I tried to answer each issue reported by Raymond.
Inaccurate summary: I'm a strong supporter of "it's better to ask forgiveness than permission", whereas Raymond considers that I introduced too many regressions with my workflow.
Raymond Hettinger added the comment:
A few random thoughts that may or may not be helpful:
- We now have two seasoned developers and one new core developer that collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change. The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available. IMO, taken as whole, these changes are destabilizing the language. Python is so successful and widely adopted that we can't afford a "shit happens" attitude. Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.
- Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet. There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches. Everyone else is just watching them all fly by and hoping that something good is happening.
For the past year or maybe even two, I have noticed that many of my issues were blocked by the lack of reviews. As you wrote, only a few developers have the knowledge and background to be able to provide a good review (not just a "tests pass, so LGTM") on my changes modifying the Python core.
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide to wait for a review or not.
For each patch that I write, I estimate the risk of regression. By definition, a regression is something unexpected, so such an estimate is tricky. Here is my heuristic:
(*) If the patch is trivial (short, non-controversial), I push it immediately.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps to review the code myself (find bugs, or more generally enhance the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch series, and so doesn't block me from moving further, I try to keep the issue open for around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issue. Recently, I started to use local branches in a Git repository. It helps a lot to work in parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches every time. It's painful, especially when I have to use multiple computers, re-download published patches, etc. Maybe it will run smoother once CPython moves to Git and GitHub.
By the way, it's painful to squash a long patch series into a giant patch, which is much harder to review, where the changes don't make sense at all at first look. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
Not supporting patch series in our reviewing tool also explains why I prefer pushing over waiting for a review. Rebasing long patch series stored as giant .patch files by hand is complicated.
(*) If the change modifies an API or touches a core component, I wait for at least one review from a core developer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback, neither on the patch nor on the email, after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that either nobody understands my change or nobody cares :-)
I totally understand that most core developers have little time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and getting stuck in my work. This email might help me adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments is the enthusiasm and fervor surrounding the patches. I feel like there is a cost to questioning whether the patches should be done or how they are done, like I am burning a little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it has a very long history, and each file has a very long list of "authors". I guess that you mean a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me at the Facebook sprint last September. I didn't expect the random module to have a maintainer; I thought that any core developer was allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives maintainers an opportunity to review the changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentioned three different changes, let me reply to each.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code has the consequence that the current time and process identifier are used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only a few developers instantiate random.Random themselves, random.Random must not be used for security, etc. I let others decide whether this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, nobody was listening to each other, and two security experts "rage-quit" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for one bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review for 19 days (2016-08-18 to 2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, anyway I already trust all core developers, but I mean that I trusted him even more than usual :-))
Since the change has a big impact on security, I would have preferred to get reviews from more developers, especially our security experts... but as I wrote, two security experts "rage-quit". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having seen you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were made in the last two years. I made many changes to support new OS functions like getentropy() and getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) crashes. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good at spotting bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looked ok to me.
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() silently handled that case by creating an empty tuple.
The previous code was wrong since sorted() accepted iterable as a keyword, whereas list.sort() doesn't.
So well, I let you guess whether a review would have spotted this bug in the large change.
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentioned an external test suite which was broken by recent _decimal changes (not sure whether my specific change was the cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and I handled this issue privately (he reverted my change); I'm not sure that it's worth saying more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in the issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such a speedup is significant, even on a microbenchmark. Can you elaborate on your criteria for deciding whether an optimization is worth it?
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather than sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spend much time explaining the FASTCALL design or documenting my changes. It's partially deliberate: I wanted to keep everything related to FASTCALL private. Since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stopped moving so fast before taking the time to explain my work and document it.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword keys). It's a new calling convention which makes it possible to avoid a temporary tuple for passing positional arguments and a temporary dictionary for passing keyword arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention.
In such a conversion, PyObject_Call() can be replaced with _PyObject_FastCallKeywords() or _PyObject_FastCallDict() (when we still get kwargs as a dict).
Many existing C functions were optimized internally to use FASTCALL, so even if you don't modify your code, you will benefit from the speedup. Typical example: PyObject_CallFunctionObjArgs().
The most massive changes were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit from FASTCALL, code should be modified. I'm trying to restrict such changes to Python internals, especially the most used functions.
I expected the required changes to be straightforward enough; it looks like I was wrong. But I don't recall anyone, before you recently, asking for an explanation.
- It was not long ago that you fought tooth-and-nail against a single-line patch optimization I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide whether an optimization is worth it or not. For me, the de facto standard requirement for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide one.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my benchmark was correct, but it was a first step towards taking a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provided a script to reproduce my benchmark and the benchmark results. The speedup is usually between 10% and 20%.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also, as AC is being applied, the variable names are being changed. I never really liked the "mp" that is used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes to convert functions to AC must not change the API (type of arguments: positional-only/keyword/..., default values, etc.) nor produce a worse docstring.
There is active on-going work to enhance AC to fix issues that you reported, like the default value of positional-only parameters, which should not be rendered in the function signature (I created issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support for *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in the AC changes; I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is a hard task since the required changes are usually large and move a lot of code. We need more eyes to look at these changes!
For the specific case of sorted(), the name of the first parameter is already documented in the docstring and documentation in Python 2.7: "iterable". So I guess that you mean that it is now possible to pass it as a keyword argument. Well, see issue #29327 for the long story. This issue is a regression, it was already fixed, and I didn't introduce the API change.
Oh, by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's why I said "shit happens" (or, more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributors to CPython, I'm not surprised to be the one who introduces many (most?) regressions :-) I'm trying to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list", which has been clear since the day it was released. It is some of the new strings that are causing problems with users (my students are frequently tripped up by the / notation, for example; no one seems to be able to intuit what it means without it being explained first).
Good news, it seems like you have a good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us to enhance docstrings, function signatures and documentation ;-)
See the good part of the AC on-going work: it's a nice opportunity to also enhance documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
Conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for parsing arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree, altering core parts of the language from function calls to new opcodes to the memory allocators, etc. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode in 16-bit units instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit on the implementation and I pushed the final change, but all credit goes to Demur and Serhiy Storchaka! By the way, Serhiy made further efficiency enhancements in the bytecode of CALL_FUNCTION instructions.
About memory allocations, I guess that you are referring to my change on PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for approval of my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-Andre Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward-incompatible change (the GIL is now required to call PyMem_Malloc).
Hopefully, it seems like nobody noticed this subtle change (GIL now required): I didn't see any bug report. By the way, I fixed a misused PyMem_Malloc() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the additional complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because it lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long-existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you understand how I work and how I make my decisions.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
Dormant core dev here. Not contributing at all due to severe lack of time in the past year and a half, not likely to have more time in the near future. Also no longer working with Python at all except as a hobby :(
I could pull off a review once a month if it would actually help!
On Sat, Jan 21, 2017 at 9:51 PM, Brett Cannon <brett@python.org> wrote:
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
So the common theme here, regardless of whether you agree with Raymond's or Victor's approach to development, is that we are not getting enough code reviews to go around. To me, that's the systemic issue this email is bringing up.
Now I think most of us don't think the solution to the lack of reviews is to lower our standard of what it takes to become a core developer (this doesn't mean we shouldn't do a better job of identifying potential candidates, just that we shouldn't give people commit privileges after a single patch like some projects do). To me that means we need to address why out of 79 core developers only 39 have a single commit over the past year, 30/79 have more than 12 commits over that same time scale, 15/79 people have more than 52 commits, and 2/79 people have over 365 commits (https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c for the stats).
Some of you have said you're waiting for the GitHub migration before you start contributing again, which I can understand (I'm going to be sending an email with an update on that after this email to python-dev & core-workflow). But for those that have not told me that, I don't know what it will take to get you involved again. For instance, not to pick on Andrew, but he hasn't committed anything lately, yet he obviously still cares about the project. So what would it take to get Andrew to help review patches again, so that the next time something involving random comes through he feels like taking a quick look?
As I have said before, the reason I took on the GitHub migration is for us core developers. I want our workflow to be as easy as possible so that we can be as productive as possible. But the unspoken goal I have long-term is to get to the point where even dormant core devs want to contribute again, where everyone reviews a patch/month, and where more people review a patch/week (although I'll take a patch/year to start). I want to get to the point where every person with commit privileges takes 30 minutes a month to help with reviews and the majority of folks take 30 minutes a week to review (and please don't take this as a hard rule where the privileges go away if you don't; view it as an aspirational goal). Even people who don't have time to review the kind of patches from Victor that triggered this thread can review documentation patches, which can be done without deep knowledge and without taking much time. That way people who have time to review the bigger, more difficult patches can actually spend their time on those reviews and not worry about patches fixing a spelling mistake or adding a new test to raise test coverage.
All of this is so that I hope one day we get to the point where all patches require a review, no matter who proposed the code change. Now I think we're quite a ways off from being there, but that's my moonshot goal for our workflow: that we have enough quality reviews coming in that we feel even patches from fellow core developers are worth the extra code check and dispersal of knowledge, without feeling like a terrible drag on productivity.
Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS problem and then automate Misc/ACKS. After that, though, I hope we can take the time to have a hard look at what in our workflow prevents people from making even occasional code reviews so that everyone wants to help out again (and if any of this interests you then please subscribe to core-workflow).
On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
Raymond Hettinger used a regression that I introduced in the builtin sorted() function (in Python 3.6.0) to give me his feedback on my FASTCALL work, but also on Argument Clinic.
Context: http://bugs.python.org/issue29327#msg285848
Since the reported issues is wider than just FASTCALL, including how I contribute to CPython, I decided to discuss the topic with a wider audience. I continue the discussion on python-committers to get the opinion of the other core developers.
Sorry for my very long answer! I tried to answer to each issues reported by Raymond.
Inaccurate summary: I'm a strong supporter of "it's better to ask forgiveness than permission", whereas Raymond considers that I introduced too many regressions with my workflow.
Raymond Hettinger added the comment:
A few random thoughts that may or may not be helpful:
- We now have two seasoned developers and one new core developer that
collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change. The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available. IMO, taken as whole, these changes are destabilizing the language. Python is so successful and widely adopted that we can't afford a "shit happens" attitude. Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.
- Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet. There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches. Everyone else is just watching them all fly by and hoping that something good is happening.
Since one or maybe even two years, I noticed that many of my issues were blocked by the lack of reviews. As you wrote, only few developer have the knowledge and background to be able to provide a good review (not only "tests pass, so LGTM") on my changes modifying the Python core.
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide to wait for a review or not.
For each patch that I write, I estimate the risk of regression. You may know that any regression is something unexpected, so such estimation is tricky. Here is my heuristic:
(*) if the patch is trivial (short, non controversal), I push it immediatly.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps to review the code myself (find bugs, or more generally enhance the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch serie, so doesn't block me to move further, I try to keep the issue open around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issues. Recently, I started to use local branches in a Git repository. It helps a lot of work on parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches everytime. It's painful, especially when I have to use multiple computers, download again publshed patches, etc. Maybe it will run smoother once CPython will move to Git and GitHub.
By the way, it's painful to squash a long patch serie into a giant patch, much harder to review, where changes don't make sense at all at the first look. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
Not supporting patch series in our reviewing tool also explains why I prefer to push than having to wait for a review. Rebasing manually long patch series stored as giant .patch files is complicated.
(*) If the change changes an API or changes a core component, I wait for at least one review from a core reviewer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch nor the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that nobody understands my change or nobody cares :-)
I totally understand that most core developers have a little amount of time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me to adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments in the enthusiasm and fervor surrounding the patches. I feel like there is a cost of questioning whether the patches should be done or how they are done, like I am burning little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it had a very long history, and each file has a very long list of "authors". I guess that you mean more a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me that at the Facebook sprint last september. I didn't expect that the random module had a maintainer, I thought that any core developer would be allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives an opportunity to maintainers to review changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentionned 3 different changes, let me reply.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code has the consequence that the current time and process identifier is used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only few developers instanciate random.Random themself, random.Random must not be used for security, etc. I let others decide if this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: The PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, and nobody was listening to each others, two security experts "rage-quitted" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for a bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review during 19 days (2016-08-18-2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, anyway I already trust all core developers, but I mean that I was trusting him even more than usual :-))
Since the change has a big impact on security, I had prefer to get a review of more developers, especially our security experts... but as I wrote, two security experts "rage- quitted". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having see you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were done last 2 years. I made many changes to support new OS functions like getentropy() an getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) does crash. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good to spot bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looks ok to me.
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong since sorted() accepted iterable as a keyword, whereas sort.list() doesn't.
So well, I let you guess if a review would have spot this bug in the large change.
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentionned an external test suite which was broken by recent _decimal changes (not sure if my specific change was in cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and me handled this issue privately (he reverted my change), I'm not sure that it's worth it to say more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in the issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such speedup is significant even on a microbenchmark. Can you elaborate what are your criteria to decide if an optimization is worth it?
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spent much time on explaing the FASTCALL design nor documenting my changes. It's partially deliberate to make everything related to FASTCALL private. Since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stop moving too fast to take time to explain my work and document it.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword keys). It's a new calling convention which allows to avoid a temporary tuple to pass positional arguments and avoids temporary dictionary to pass keyworkd arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention.
In such a conversion, PyObject_Call() can be replaced with _PyObject_FastCallKeywords(), or with _PyObject_FastCallDict() when we still get kwargs as a dict.
Many existing C functions were optimized internally to use FASTCALL, so even if you don't modify your code, you benefit from the speedup. Typical example: PyObject_CallFunctionObjArgs().
The most massive changes were purely internal and don't affect the most popular C APIs at all. In some cases, to fully benefit from FASTCALL, code should be modified. I'm trying to restrict such changes to Python internals, especially the most frequently used functions.
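As a rough illustration of the convention, here is a pure-Python model I'm sketching (an assumption for illustration only, not actual CPython code):

```python
# Pure-Python model of the FASTCALL layout described above: positional
# values followed by keyword values in one flat sequence, with the
# keyword *names* traveling in a separate tuple -- so no temporary args
# tuple or kwargs dict is needed.
def as_fastcall(*args, **kwargs):
    stack = args + tuple(kwargs.values())  # flat C-array-like sequence
    nargs = len(args)                      # count of positional arguments
    kwnames = tuple(kwargs)                # keyword names only
    return stack, nargs, kwnames

stack, nargs, kwnames = as_fastcall([3, 1, 2], key=None, reverse=True)
# stack == ([3, 1, 2], None, True), nargs == 1, kwnames == ('key', 'reverse')
```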
I expected the required changes to be straightforward enough; it looks like I was wrong. But I don't recall anyone asking for an explanation before you did recently.
- It was not long ago that you fought tooth-and-nail against a single-line optimization patch I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide whether an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide one.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my benchmark was correct, but it was a first step toward making a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provided a script to reproduce my benchmark along with the benchmark results. The speedup is usually between 10% and 20%.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good enough? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also, as AC is being applied, the variable names are being changed. I never really liked the "mp" that is used in dicts and prefer the use of "self", but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes converting functions to AC must not change the API (type of arguments: positional-only/keyword/..., default values, etc.) nor produce a worse docstring.
There is active ongoing work to enhance AC and fix the issues you reported, like the default value of positional-only parameters, which should not be rendered in the function signature (I created issue #29299 with a patch). Serhiy is also working on implementing the last major missing AC feature: support for *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in the earlier AC changes; I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is a hard task since the required changes are usually large and move a lot of code. We need more eyes on these changes!
For the specific case of sorted(), the name of the first parameter has been documented in the docstring and documentation since Python 2.7: "iterable". So I guess you mean that it is now possible to pass it as a keyword argument. Well, see issue #29327 for the long story. This issue is a regression, it was already fixed, and I didn't introduce the API change.
Oh, by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's why I said "shit happens" (or, more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributors to CPython, I'm not surprised to be the one who introduces many (most?) regressions :-) I try to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list", which has been clear since the day it was released. It is some of the new strings that are causing problems with users (my students frequently are tripped up by the / notation, for example; no one seems to be able to intuit what it means without it being explained first).
Good news: it seems like you have good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us enhance docstrings, function signatures and documentation ;-)
See the good side of the ongoing AC work: it's a nice opportunity to also enhance the documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
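For example, an AC-converted builtin such as sorted() exposes its signature to introspection on current CPython (the exact rendering may vary by version):

```python
import inspect

# An AC-converted builtin carries a text signature that inspect (and
# IDEs, static checkers, etc.) can read.
sig = inspect.signature(sorted)
print(sig)
```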
Conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for parsing arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree and is altering core parts of the language, from function calls to new opcodes to the memory allocators. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode in 16-bit units instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit with the implementation and I pushed the final change, but all credit goes to Demur and Serhiy Storchaka! By the way, Serhiy made further efficiency enhancements to the bytecode of CALL_FUNCTION instructions.
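The wordcode change is visible from pure Python; a quick check with the dis module (since 3.6, each instruction occupies exactly two bytes):

```python
import dis

# Since the 3.6 "wordcode" change, every CPython instruction occupies
# exactly two bytes (opcode + argument), so compiled bytecode always has
# an even length.
code = compile("x + y * z", "<demo>", "eval")
assert len(code.co_code) % 2 == 0
for ins in dis.get_instructions(code):
    print(ins.offset, ins.opname)
```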
About memory allocators, I guess that you are referring to my change to the PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for approval from my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-André Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward-incompatible change (the GIL is now required to call PyMem_Malloc()).
Hopefully, it seems like nobody noticed this subtle change (the GIL now being required): I didn't see any bug report. By the way, I fixed a misuse of PyMem_Malloc() in numpy.
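For illustration, a minimal sketch of enabling those debug hooks on a regular release build via the environment variable, by launching a child interpreter:

```python
import os
import subprocess
import sys

# PYTHONMALLOC=debug (added in Python 3.6) turns on the malloc debug
# hooks on a regular release build; those hooks catch API misuse such as
# calling PyMem_Malloc() without holding the GIL.
env = dict(os.environ, PYTHONMALLOC="debug")
out = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())
```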
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the additional complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because it lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long-existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you understand how I work and how I make my decisions.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
I've been on the sidelines for a while myself for a number of reasons, but the shift to GitHub will pull me back in for sure, at least in terms of code review. I look forward to actually contributing code again soon, but easier tooling on reviews—rather, a shiny new one, as I'm aware of Rietveld—is enough of a carrot to bring me back in.
On Sun, Jan 22, 2017 at 12:08 PM, Tal Einat <taleinat@gmail.com> wrote:
Dormant core dev here. Not contributing at all due to severe lack of time in the past year and a half, not likely to have more time in the near future. Also no longer working with Python at all except as a hobby :(
I could pull off a review once a month if it would actually help!
On Sat, Jan 21, 2017 at 9:51 PM, Brett Cannon <brett@python.org> wrote:
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
So the common theme here regardless of whether you agree with Raymond or Victor's approach to development is that we are not getting enough code reviews to go around. To me that's what the systemic issue is that this email is bringing up.
Now I think most of us don't think the solution to the lack of reviews is to lower our standard of what it takes to become a core developer (this doesn't mean we shouldn't do a better job of identifying potential candidates, just that we shouldn't give people commit privileges after a single patch like some projects do). To me that means we need to address why out of 79 core developers only 39 have a single commit over the past year, 30/79 have more than 12 commits over that same time scale, 15/79 people have more than 52 commits, and 2/79 people have over 365 commits (https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c for the stats).
Some of you have said you're waiting for the GitHub migration before you start contributing again, which I can understand (I'm going to be sending an email with an update on that after this email to python-dev & core-workflow). But for those that have not told me that I don't know what it will take to get you involved again. For instance, not to pick on Andrew but he hasn't committed anything but he obviously still cares about the project. So what would it take to get Andrew to help review patches again so that the next time something involving random comes through he feels like taking a quick look?
As I have said before, the reason I took on the GitHub migration is for us core developers. I want our workflow to be as easy as possible so that we can be as productive as possible. But the unspoken goal I have long-term is to get to the point that even dormant core devs want to contribute again, and to the point that everyone reviews a patch/month and more people reviewing a patch/week (although I'll take a patch/year to start). I want to get to the point that every person with commit privileges takes 30 minutes a month to help with reviews and that the majority of folks take 30 minutes a week to review (and please don't think this as a hard rule and if you don't the privileges go away, view this as an aspirational goal). Even if people who don't have time to review the kind of patches Victor is producing which triggered this thread, reviewing documentation patches can be done without deep knowledge of things and without taking much time. That way people who have time to review the bigger, more difficult patches can actually spend their time on those reviews and not worrying about patches fixing a spelling mistake or adding a new test to raise test coverage.
All of this is so that I hope one day we get to the point where all patches require a review, no matter who proposed the code change. Now I think we're quite a ways off from being there, but that's my moonshot goal for our workflow: that we have enough quality reviews coming in that we feel even patches from fellow core developers are worth the extra code check and dispersal of knowledge, without it feeling like a terrible drag on productivity.
Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS problem and then automate Misc/ACKS. After that, though, I hope we can take the time to have a hard look at what in our workflow prevents people from making even occasional code reviews so that everyone wants to help out again (and if any of this interests you then please subscribe to core-workflow).
On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stinner@gmail.com> wrote:
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide whether or not to wait for a review.
For each patch that I write, I estimate the risk of regression. You may know that any regression is something unexpected, so such estimation is tricky. Here is my heuristic:
(*) If the patch is trivial (short, non-controversial), I push it immediately.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps me review the code myself (find bugs, or more generally improve the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch series, and so doesn't block me from moving forward, I try to keep the issue open for around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issue. Recently, I started to use local branches in a Git repository. It helps a lot to work in parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches every time. It's painful, especially when I have to use multiple computers, download published patches again, etc. Maybe it will run more smoothly once CPython moves to Git and GitHub.
By the way, it's painful to squash a long patch series into one giant patch, which is much harder to review, where the changes don't make sense at first glance. Again, a better review tool supporting patch series (GitHub) will help here too.
The lack of support for patch series in our review tool also explains why I prefer pushing over waiting for a review. Manually rebasing long patch series stored as giant .patch files is complicated.
(*) If the patch changes an API or a core component, I wait for at least one review from a core developer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch or the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that as: nobody understands my change, or nobody cares :-)
I totally understand that most core developers have little time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me adjust my workflow.
By the way, I try to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
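As a sketch of what that refleak check measures (this degrades gracefully on a release build, where sys.gettotalrefcount() doesn't exist):

```python
import sys

# On a --with-pydebug build, sys.gettotalrefcount() exposes the
# interpreter's total reference count; a leak shows up as a count that
# keeps growing across repeated runs of the same test. The function is
# absent on a release build, so fall back to None there.
def total_refs():
    get = getattr(sys, "gettotalrefcount", None)
    return get() if get is not None else None

before = total_refs()
data = [object() for _ in range(1000)]
del data
after = total_refs()
print(before, after)
```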
- One other reason for the lack of review comments is the enthusiasm and fervor surrounding the patches. I feel like there is a cost to questioning whether the patches should be done or how they are done, like I am burning a little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it has a very long history, and each file has a very long list of "authors". I guess that you mean more of a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me so at the Facebook sprint last September. I didn't expect the random module to have a maintainer; I thought that any core developer was allowed to modify the code.
Moreover, since I open an issue for most of my changes, maintainers get an opportunity to review the changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentioned 3 different changes; let me reply to each.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code had the consequence that the current time and process identifier were used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Few developers instantiate random.Random themselves, random.Random must not be used for security, etc. I let others decide whether this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, nobody was listening to anyone else, and two security experts "rage-quit" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that security could be handled in two different, incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for one bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review for 19 days (2016-08-18 to 2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, I already trust all core developers anyway, but I mean that I trusted him even more than usual :-))
Since the change has a big impact on security, I would have preferred to get a review from more developers, especially our security experts... but as I wrote, two security experts "rage-quit". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having seen you in any recent discussions or issues related to os.urandom(), whereas a lot of enhancements and changes were made over the last 2 years. I made many changes to support new OS functions like getentropy() and getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) crashes. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good at spotting bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looks ok to me.
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong, since sorted() accepted iterable as a keyword whereas list.sort() doesn't.
So, well, I let you guess whether a review would have spotted this bug in such a large change.
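For the record, a quick check of today's behavior (assuming CPython 3.8+, where the first parameter of sorted() is positional-only again, matching list.sort()):

```python
# The first parameter of sorted() is positional-only (the "/" in its
# signature), so passing it by keyword raises TypeError instead of
# crashing.
assert sorted([3, 1, 2]) == [1, 2, 3]

raised = False
try:
    sorted(iterable=[3, 1, 2])
except TypeError:
    raised = True
print("sorted(iterable=...) raises TypeError:", raised)
```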
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to keep the same code base for Python 3.5, 3.6 and 3.7. He also mentioned an external test suite which was broken by recent _decimal changes (I'm not sure whether my specific change was the cause), but I wasn't aware of it.
To be honest, I didn't even notice that I had modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review, if I recall correctly.
Stefan and I handled this issue privately (he reverted my change); I'm not sure that it's worth saying more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such speedup is significant even on a microbenchmark. Can you elaborate what are your criteria to decide if an optimization is worth it?
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spent much time on explaing the FASTCALL design nor documenting my changes. It's partially deliberate to make everything related to FASTCALL private. Since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stop moving too fast to take time to explain my work and document it.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword keys). It's a new calling convention which allows to avoid a temporary tuple to pass positional arguments and avoids temporary dictionary to pass keyworkd arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention
PyObject_Call() can replaced with _PyObject_FastCallKeywords() or _PyObject_FastCallDict() (when we still get kwargs as a dict) in such conversion
Many existing C functions were optimized internally to use FASCALL, so even if you don't modify your code, you will benefit of it (speedup). Typical example: PyFunction_CallFunctionObjArgs().
The most massive change were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit of FASTCALL, code should be modified. I'm trying to restrict such changes to Python internals, especially the most used functions.
I expected that the required changes were straightforward enough, it looks like I was wrong, but I don't recall anyone, before you recently, asking for an explanation.
- It was no long ago that you fought tooth-and-nail against a single line patch optimization I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide if an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide it.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my bencmark was correct, but it was a first step to take a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provide a script to reproduce my benchmark and the benchmark results. The speedup is usually betwen 10% and 20% faster.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also as AC is being applied, the variable names are being changed. I never really liked the "mp" that used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes to convert functions to AC must not change the API (type of arguments: positional only/keyword/..., default values, etc.) nor provide a worse docstring.
There is an active on-going work to enhance AC to fix issues that you reported, like the default value of positional-only parameters which should not be rendered in the function signature (I created the issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in AC changes, I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is an hard task since required changes are usually large and move a lot of code. We need more eyes to look at these changes!
For the specific case of sorted(), the name of first parameter is already documented in the docstring and documentation in Python 2.7: "iterable". So I guess that you mean that it is now possible to use it as a keyword argument. Well, see the issue #29327 for the long story. This issue is a regression, it was already fixed, and I didn't introduce the API change.
Oh by the way, when I read your comment, I understand that I'm responsible of all regressions. It's true that I introduced regressions, that's where I said "shit happens" (or more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributor in CPython, I'm not surprised of being the one who introduce many (most?) regressions :-) I'm trying to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list" which has been clear since the day it was released. It is some of the new strings that are causing problems with users (my students are frequently tripped up by the / notation, for example; no one seems to be able to intuit what it means without it being explained first).
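[Editor's note: for readers unfamiliar with the "/" notation Raymond mentions, it marks the parameters before it as positional-only. A minimal sketch, using the Python 3.8+ syntax for "/" in a def; the function name here is hypothetical:]

```python
def clamp(value, low, high, /):
    """All three parameters are positional-only because of the trailing "/"."""
    return max(low, min(value, high))

print(clamp(5, 0, 10))    # positional call works

try:
    clamp(value=5, low=0, high=10)
except TypeError as exc:
    print("keyword call rejected:", exc)
```

This is the same convention that AC-generated docstrings use to render positional-only builtins such as sorted().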
Good news, it seems like you have a good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us to enhance docstrings, function signatures and documentation ;-)
See the good part of the AC on-going work: it's a nice opportunity to also enhance documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
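[Editor's note: as a sketch of what those tools gain, once a builtin has been converted to AC its signature becomes introspectable; the exact rendering varies by Python version:]

```python
import inspect

# Argument Clinic gives builtins real, introspectable signatures.
sig = inspect.signature(sorted)
print(sig)                   # e.g. (iterable, /, *, key=None, reverse=False)
print(list(sig.parameters))  # parameter names available to IDEs and checkers
```

Before AC, inspect.signature() raised ValueError for most C-implemented callables, since there was nothing to introspect.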
Conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for parsing arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree, altering core parts of the language for function calls, new opcodes, the memory allocators, etc. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode in 16-bit units instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit on the implementation and I pushed the final change, but all credit goes to Demur and Serhiy Storchaka! By the way, Serhiy made further efficiency enhancements to the bytecode of CALL_FUNCTION instructions.
About memory allocations, I guess that you are referring to my change on PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for approval of my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-André Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward-incompatible change (the GIL is now required to call PyMem_Malloc()).
Fortunately, it seems like nobody noticed this subtle change (the GIL now being required): I didn't see any bug report. By the way, I fixed a misuse of PyMem_Malloc() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the addition of complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because it lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long-existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you to understand how I work and how I make my decisions.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
On Jan 22, 2017, at 12:22 PM, Brian Curtin wrote:
I've been on the sidelines for a while myself for a number of reasons, but the shift to GitHub will pull me back in for sure, at least in terms of code review. I look forward to actually contributing code again soon, but easier tooling on reviews—rather, a shiny new one, as I'm aware of Rietveld—is enough of a carrot to bring me back in.
I feel exactly the same way. I'm very excited about the move to git and GitHub and look forward to ramping my contributions back up. Thank you Brett and everyone else working so hard to make this as smooth and timely a transition as possible.
Cheers, -Barry
On 22 January 2017 at 21:59, Barry Warsaw <barry@python.org> wrote:
On Jan 22, 2017, at 12:22 PM, Brian Curtin wrote:
I've been on the sidelines for a while myself for a number of reasons, but the shift to GitHub will pull me back in for sure, at least in terms of code review. I look forward to actually contributing code again soon, but easier tooling on reviews—rather, a shiny new one, as I'm aware of Rietveld—is enough of a carrot to bring me back in.
I feel exactly the same way. I'm very excited about the move to git and GitHub and look forward to ramping my contributions back up. Thank you Brett and everyone else working so hard to make this as smooth and timely a transition as possible.
One question (and apologies if this has been discussed on another list somewhere) - my biggest bottleneck is the sheer number of python-bugs emails, and the difficulty of identifying ones I can contribute to. Will we be moving to the github issue tracker, and/or are there any likely changes to more closely classify issues (ideally in a way that can be handled via mail filters)? Specifically, I'm interested in being able to restrict issue traffic by:
- Pure Python, C, or "not code" (docs, etc).
- Windows/Unix
- Relevant stdlib module
(Or at least have some means of scanning issue emails to quickly spot which ones fall into which classification).
That's a long way beyond simply "switching to github which is a workflow I'm more familiar with" and while I hope github will help me to contribute more, I do think that ultimately the issue is simply that Python is a large and complex system, and people like me have limited time - and too much of it gets wasted playing "spot something I can work on", but that's inherent in the nature of a system this size.
Paul
PS I know there's searches and labels. But the "push" nature of email has its own benefits for me, so there's still a trade off there.
On Jan 22, 2017, at 10:26 PM, Paul Moore wrote:
Will we be moving to the github issue tracker
No. I think a lot of us still prefer Roundup and a significant amount of work is being done to integrate our Roundup instance with GH.
(Or at least have some means of scanning issue emails to quickly spot which ones fall into which classification).
I love Roundup's nosy feature, so I subscribe to the new-bugs-announce list, and yes, you see the first message of any new bug, but I only nosy myself to the issues I know I care about right then (it's always easy to nosy in on a message later).
That's a long way beyond simply "switching to github which is a workflow I'm more familiar with" and while I hope github will help me to contribute more, I do think that ultimately the issue is simply that Python is a large and complex system, and people like me have limited time - and too much of it gets wasted playing "spot something I can work on", but that's inherent in the nature of a system this size.
I think that's true. I plan to start with issues I've already filed or nosied on.
Cheers, -Barry
On Sun, 22 Jan 2017 at 14:26 Paul Moore <p.f.moore@gmail.com> wrote:
On 22 January 2017 at 21:59, Barry Warsaw <barry@python.org> wrote:
On Jan 22, 2017, at 12:22 PM, Brian Curtin wrote:
I've been on the sidelines for a while myself for a number of reasons, but the shift to GitHub will pull me back in for sure, at least in terms of code review. I look forward to actually contributing code again soon, but easier tooling on reviews—rather, a shiny new one, as I'm aware of Rietveld—is enough of a carrot to bring me back in.
I feel exactly the same way. I'm very excited about the move to git and GitHub and look forward to ramping my contributions back up. Thank you Brett and everyone else working so hard to make this as smooth and timely a transition as possible.
One question (and apologies if this has been discussed on another list somewhere) - my biggest bottleneck is the sheer number of python-bugs emails, and the difficulty of identifying ones I can contribute to. Will we be moving to the github issue tracker, and/or are there any likely changes to more closely classify issues (ideally in a way that can be handled via mail filters)?
As Barry pointed out we are not moving the issue tracker. I thought about it but I instantly got pushback on the idea and I only have so much time and patience. :)
Specifically, I'm interested in being able to restrict issue traffic by:
- Pure Python, C, or "not code" (docs, etc).
- Windows/Unix
- Relevant stdlib module
Do you want this for searching issues, or PRs? Since the migration has not happened yet, no practice has emerged for what will be labeled on PRs and what will be kept on bpo (e.g. we will more than likely label which versions of Python a PR should be applied to, but should we also add the types of labels you mention above?).
Someone could also write a bot to help with this, e.g. automatically add people to a PR for a review if a module mentioned in https://cpython-devguide.readthedocs.io/en/latest/experts.html#stdlib is a part of the PR.
(Or at least have some means of scanning issue emails to quickly spot which ones fall into which classification).
As Barry said, you can always follow new-bugs-announce or have a saved search on bpo that sorts open issues by creation date and you check that regularly (I do the latter and look at issues created the day previously and just glance at their titles to decide if I should have a look).
That's a long way beyond simply "switching to github which is a workflow I'm more familiar with" and while I hope github will help me to contribute more, I do think that ultimately the issue is simply that Python is a large and complex system, and people like me have limited time - and too much of it gets wasted playing "spot something I can work on", but that's inherent in the nature of a system this size.
Sure, but there are also things we can try to do to minimize the burden (and anything that can be automated is best :) .
-Brett
Paul
PS I know there's searches and labels. But the "push" nature of email has its own benefits for me, so there's still a trade off there.
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
On 23 January 2017 at 19:31, Brett Cannon <brett@python.org> wrote:
Do you want this to search issues or PRs by? Since the migration has not happened yet there isn't any emerged practice yet of what will be labeled on PRs and what will be kept on bpo (e.g. we will more than likely label what versions of Python a PR should be applied to, but should we also add the type of labels you mention above?).
Hmm, issues and PRs on separate trackers? That might be interesting... (Although I'm sure you've thought it through and it'll be fine).
I think the real issue here is that I really don't work well with systems where I have to go and look for stuff (I get distracted, or don't bother). Email is a huge win for me because I'm always looking at it, so things arriving by email get my attention. But of course, the converse of that is that too *much* traffic swamps me and I just start ignoring that folder (which is basically what happens with "Python bugs" and "Python checkins"). So I need to manage that, and bpo traffic is far and away the highest-traffic thing I receive, so I haven't really evolved strategies for dealing well with that level of traffic.
Someone could also write a bot to help with this, e.g. automatically add people to a PR for a review if a module mentioned in https://cpython-devguide.readthedocs.io/en/latest/experts.html#stdlib is a part of the PR.
If PRs will come on github, I guess that as a member of the python-dev group I'll get the emails by default (like with pip, or other projects I contribute to). I'd probably just start by seeing how well skimming those emails would work (I imagine traffic on actual PRs would be notably less than on bugs as a whole).
The trouble is, it's not really the experts list that matters to me. I get Windows issues from bpo at the moment, and while I'm interested in most of them, I don't often have much to contribute in terms of actual code reviews (because they tend to be C issues). I've no idea what facilities github has for anything in between "get everything" and "get nothing except what I subscribe to explicitly" as I've never yet needed to use anything but the former. And while a custom bot might be interesting, it's not going to pick up the sorts of things I get by skimming stuff.
As Barry said, you can always follow new-bugs-announce or have a saved search on bpo that sorts open issues by creation date and you check that regularly (I do the latter and look at issues created the day previously and just glance at their titles to decide if I should have a look).
new-bugs-announce might be a better list for me than the full bpo stream. I might try that. IIRC, I joined the bpo and python checkins lists because the "guidelines for new core devs" said I should. Maybe there should be a qualification in there that the traffic is high, and if you have limited time, lower-traffic options might be better (or maybe there is and I ignored it :-))?
Paul
On Mon, 23 Jan 2017 at 11:50 Paul Moore <p.f.moore@gmail.com> wrote:
On 23 January 2017 at 19:31, Brett Cannon <brett@python.org> wrote:
Do you want this to search issues or PRs by? Since the migration has not happened yet there isn't any emerged practice yet of what will be labeled on PRs and what will be kept on bpo (e.g. we will more than likely label what versions of Python a PR should be applied to, but should we also add the type of labels you mention above?).
Hmm, issues and PRs on separate trackers? That might be interesting... (Although I'm sure you've thought it through and it'll be fine).
You can look at the test issue at https://bugs.python.org/issue2771 to see what's there so far (specifically the Pull Requests section and the latest PR listed there).
I think the real issue here is that I really don't work well with systems where I have to go and look for stuff (I get distracted, or don't bother). Email is a huge win for me because I'm always looking at it, so things arriving by email get my attention. But of course, the converse of that is that too *much* traffic swamps me and I just start ignoring that folder (which is basically what happens with "Python bugs" and "Python checkins"). So I need to manage that, and bpo traffic is far and away the highest-traffic thing I receive, so I haven't really evolved strategies for dealing well with that level of traffic.
Someone could also write a bot to help with this, e.g. automatically add people to a PR for a review if a module mentioned in https://cpython-devguide.readthedocs.io/en/latest/experts.html#stdlib is a part of the PR.
If PRs will come on github, I guess that as a member of the python-dev group I'll get the emails by default (like with pip, or other projects I contribute to).
You will need to have email notifications turned on with GitHub and watch the cpython repository once we migrate. I think that will be enough if you want it through GitHub.
I'd probably just start by seeing how well skimming those emails would work (I imagine traffic on actual PRs would be notably less than on bugs as a whole).
The trouble is, it's not really the experts list that matters to me. I get Windows issues from bpo at the moment, and while I'm interested in most of them, I don't often have much to contribute in terms of actual code reviews (because they tend to be C issues). I've no idea what facilities github has for anything in between "get everything" and "get nothing except what I subscribe to explicitly" as I've never yet needed to use anything but the former. And while a custom bot might be interesting, it's not going to pick up the sorts of things I get by skimming stuff.
You could try training a deep neural network to pick up on the things you care about <half-wink>.
As Barry said, you can always follow new-bugs-announce or have a saved search on bpo that sorts open issues by creation date and you check that regularly (I do the latter and look at issues created the day previously and just glance at their titles to decide if I should have a look).
new-bugs-announce might be a better list for me than the full bpo stream. I might try that. IIRC, I joined the bpo and python checkins lists because the "guidelines for new core devs" said I should. Maybe there should be a qualification in there that the traffic is high, and if you have limited time, lower-traffic options might be better (or maybe there is and I ignored it :-))?
It actually says you can choose: https://cpython-devguide.readthedocs.io/en/latest/coredev.html?highlight=new...
-Brett
Paul
On 22 Jan 2017 11:26 pm, "Paul Moore" <p.f.moore@gmail.com> wrote:
One question (and apologies if this has been discussed on another list somewhere) - my biggest bottleneck is the sheer number of python-bugs emails, and the difficulty of identifying ones I can contribute to. Will we be moving to the github issue tracker, and/or are there any likely changes to more closely classify issues (ideally in a way that can be handled via mail filters)?
The experts index in the developer guide is excellent for this - I *don't* follow every new issue that gets filed, but I pay close attention to the ones where I get added to the nosy list, and that's usually via the experts index module and topic entries. It also gives me an incentive to keep my entries in that file up to date (e.g. I was following tempfile when doing a lot of filesystem manipulation tasks at work, but stopped doing so quite some time ago).
We have some docs in the triager section of the guide about that, but perhaps it would help to emphasise it more in the core dev section?
The move to GitHub will also make it easier to follow new PRs, independently of the issue tracker.
Cheers, Nick.
On Mon, 23 Jan 2017 at 22:42 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 22 Jan 2017 11:26 pm, "Paul Moore" <p.f.moore@gmail.com> wrote:
One question (and apologies if this has been discussed on another list somewhere) - my biggest bottleneck is the sheer number of python-bugs emails, and the difficulty of identifying ones I can contribute to. Will we be moving to the github issue tracker, and/or are there any likely changes to more closely classify issues (ideally in a way that can be handled via mail filters)?
The experts index in the developer guide is excellent for this - I *don't* follow every new issue that gets filed, but I pay close attention to the ones where I get added to the nosy list, and that's usually via the experts index module and topic entries. It also gives me an incentive to keep my entries in that file up to date (e.g. I was following tempfile when doing a lot of filesystem manipulation tasks at work, but stopped doing so quite some time ago).
We have some docs in the triager section of the guide about that, but perhaps it would help to emphasise it more in the core dev section?
The move to GitHub will also make it easier to follow new PRs, independently of the issue tracker.
And with the appropriate mapping we could set up a bot to automatically add people as reviewers or subscribe to the PR discussion if that's what people want.
Excerpts from Brett Cannon's message of 2017-01-21 19:51:48 +0000:
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
So the common theme here regardless of whether you agree with Raymond or Victor's approach to development is that we are not getting enough code reviews to go around. To me that's what the systemic issue is that this email is bringing up.
Now I think most of us don't think the solution to the lack of reviews is to lower our standard of what it takes to become a core developer (this doesn't mean we shouldn't do a better job of identifying potential candidates, just that we shouldn't give people commit privileges after a single patch like some projects do). To me that means we need to address why out of 79 core developers only 39 have a single commit over the past year, 30/79 have more than 12 commits over that same time scale, 15/79 people have more than 52 commits, and 2/79 people have over 365 commits ( https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c for the stats).
Some of you have said you're waiting for the GitHub migration before you start contributing again, which I can understand (I'm going to be sending an email with an update on that after this email to python-dev & core-workflow). But for those that have not told me that I don't know what it will take to get you involved again. For instance, not to pick on Andrew but he hasn't committed anything but he obviously still cares about the project. So what would it take to get Andrew to help review patches again so that the next time something involving random comes through he feels like taking a quick look?
As I have said before, the reason I took on the GitHub migration is for us core developers. I want our workflow to be as easy as possible so that we can be as productive as possible. But the unspoken goal I have long-term is to get to the point that even dormant core devs want to contribute again, and to the point that everyone reviews a patch/month and more people reviewing a patch/week (although I'll take a patch/year to start). I want to get to the point that every person with commit privileges takes 30 minutes a month to help with reviews and that the majority of folks take 30 minutes a week to review (and please don't think this as a hard rule and if you don't the privileges go away, view this as an aspirational goal). Even if people who don't have time to review the kind of patches Victor is producing which triggered this thread, reviewing documentation patches can be done without deep knowledge of things and without taking much time. That way people who have time to review the bigger, more difficult patches can actually spend their time on those reviews and not worrying about patches fixing a spelling mistake or adding a new test to raise test coverage.
All of this is so that I hope one day we get to the point where all patches require a review no matter who proposed the code change. Now I think we're quite a ways off from being there, but that's my moonshot goal for our workflow: that we have enough quality reviews coming in that we feel that even patches from fellow core developers are worth requiring the extra code check and disbursement of knowledge without feeling like a terrible drag on productivity.
Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS problem and then automate Misc/ACKS. After that, though, I hope we can take
I would be happy to help with both of those tasks. I have experience with both within the OpenStack project.
And put me on the list of "waiting for github" contributors. I should have more time freeing up in a couple of months as I change some of my responsibilities at work. I would like to help with the migration and eventually regular reviews.
Doug
the time to have a hard look at what in our workflow prevents people from making even occasional code reviews so that everyone wants to help out again (and if any of this interests you then please subscribe to core-workflow).
On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
Raymond Hettinger used a regression that I introduced in the builtin sorted() function (in Python 3.6.0) to give me his feedback on my FASTCALL work, but also on Argument Clinic.
Context: http://bugs.python.org/issue29327#msg285848
Since the reported issues is wider than just FASTCALL, including how I contribute to CPython, I decided to discuss the topic with a wider audience. I continue the discussion on python-committers to get the opinion of the other core developers.
Sorry for my very long answer! I tried to answer to each issues reported by Raymond.
Inaccurate summary: I'm a strong supporter of "it's better to ask forgiveness than permission", whereas Raymond considers that I introduced too many regressions with my workflow.
Raymond Hettinger added the comment:
A few random thoughts that may or may not be helpful:
- We now have two seasoned developers and one new core developer that collectively are creating many non-trivial patches to core parts of Python at an unprecedented rate of change. The patches are coming in much faster than they can reasonably be reviewed and carefully considered, especially by devs such as myself who have very limited time available. IMO, taken as whole, these changes are destabilizing the language. Python is so successful and widely adopted that we can't afford a "shit happens" attitude. Perhaps that works in corners of the language, infrequently used modules, but it makes less sense when touching the critical paths that have had slow and careful evolution over 26 years.
- Besides the volume of patches, one other reason that reviews are hard to come by is that they apply new APIs that I don't fully understand yet. There are perhaps two people on the planet who could currently give thoughtful, correct, and critical evaluation of all those patches. Everyone else is just watching them all fly by and hoping that something good is happening.
Since one or maybe even two years, I noticed that many of my issues were blocked by the lack of reviews. As you wrote, only few developer have the knowledge and background to be able to provide a good review (not only "tests pass, so LGTM") on my changes modifying the Python core.
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide to wait for a review or not.
For each patch that I write, I estimate the risk of regression. You may know that any regression is something unexpected, so such estimation is tricky. Here is my heuristic:
(*) If the patch is trivial (short, non-controversial), I push it immediately.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps to review the code myself (find bugs, or more generally enhance the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch series, and so doesn't block me from moving further, I try to keep the issue open for around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issue. Recently, I started to use local branches in a Git repository. It helps a lot to work in parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches every time. It's painful, especially when I have to use multiple computers, download published patches again, etc. Maybe it will run smoother once CPython moves to Git and GitHub.
By the way, it's painful to squash a long patch series into a giant patch, which is much harder to review, where changes don't make sense at all at first look. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
Not supporting patch series in our reviewing tool also explains why I prefer to push rather than wait for a review. Rebasing long patch series stored as giant .patch files by hand is complicated.
(*) If the patch changes an API or a core component, I wait for at least one review from a core reviewer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch nor on the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that nobody understands my change or nobody cares :-)
I totally understand that most core developers have little time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments is the enthusiasm and fervor surrounding the patches. I feel like there is a cost to questioning whether the patches should be done or how they are done, like I am burning a little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was a creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it had a very long history, and each file has a very long list of "authors". I guess that you mean more a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me that at the Facebook sprint last September. I didn't expect that the random module had a maintainer; I thought that any core developer was allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives an opportunity to maintainers to review changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentioned 3 different changes; let me reply to each.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code had the consequence that the current time and process identifier were used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only a few developers instantiate random.Random themselves, random.Random must not be used for security, etc. I let others decide if this bug was catastrophic or not.
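A rough Python-level illustration of why the seeding source matters (this is a hypothetical sketch, not the actual C code): two generators seeded from the same time/pid pair produce identical streams, whereas os.urandom()-based seeding makes such collisions vanishingly unlikely.

```python
import os
import random
import time

# Sketch of the buggy path: seeding from current time + process id.
# Two generators created in the same tick of the same process get
# the same seed, hence identical streams.
weak_seed = int(time.time()) ^ os.getpid()
r1 = random.Random(weak_seed)
r2 = random.Random(weak_seed)
assert [r1.random() for _ in range(3)] == [r2.random() for _ in range(3)]

# Sketch of the intended path: seeding from 16 bytes of os.urandom(),
# so two fresh generators are effectively independent.
r3 = random.Random(int.from_bytes(os.urandom(16), "big"))
r4 = random.Random(int.from_bytes(os.urandom(16), "big"))
assert r3.random() != r4.random()
```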
Since we are talking about the development process, let me see how the change was made.
Context: PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, nobody was listening to each other, and two security experts "rage-quit" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for one bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review during 19 days (2016-08-18 to 2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, anyway I already trust all core developers, but I mean that I trusted him even more than usual :-))
Since the change has a big impact on security, I would have preferred to get reviews from more developers, especially our security experts... but as I wrote, two security experts "rage-quit". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having seen you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were made in the last 2 years. I made many changes to support new OS functions like getentropy() and getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) crashed. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good at spotting bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looked ok to me.
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong since sorted() accepted iterable as a keyword, whereas list.sort() doesn't.
So well, I let you guess whether a review would have spotted this bug in such a large change.
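A Python-level sketch of the difference between the two C idioms (an analogy only, not the actual C code): slicing always yields a fresh, possibly empty tuple, while taking a pointer to item 1 silently assumes at least one positional argument follows.

```python
def remaining_args_slice(args):
    # Analogous to PyTuple_GetSlice(args, 1, len(args)):
    # always returns a tuple, empty when nothing follows item 0.
    return args[1:]

def remaining_args_pointer(args):
    # Analogous to &PyTuple_GET_ITEM(args, 1): only valid when a
    # second item actually exists in the args tuple.
    if len(args) < 2:
        raise IndexError("no positional arguments after the first")
    return args[1]

# sorted(iterable=[]) corresponds to an args tuple with zero
# positional arguments: exactly the corner case the slice
# handled for free and the raw pointer does not.
assert remaining_args_slice(()) == ()
```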
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
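The substitution itself can be sketched with a Python regular expression (a simplified, illustrative version of the sed pattern, not the exact command I ran):

```python
import re

# Simplified version of the sed substitution applied to the .c files:
#   PyObject_CallObject(x, NULL)  ->  _PyObject_CallNoArg(x)
pattern = re.compile(r"PyObject_CallObject\((\w+), NULL\)")

line = "res = PyObject_CallObject(callable, NULL);"
converted = pattern.sub(r"_PyObject_CallNoArg(\1)", line)
print(converted)  # res = _PyObject_CallNoArg(callable);
```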
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentioned an external test suite which was broken by recent _decimal changes (not sure whether my specific change was the cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and I handled this issue privately (he reverted my change); I'm not sure that it's worth saying more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, about "no actual problem being solved":
Since the goal of FASTCALL is to optimize Python, I guess you consider that the speedup doesn't justify the change. I gave numbers in issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such a speedup is significant even on a microbenchmark. Can you elaborate on your criteria for deciding whether an optimization is worth it?
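For readers who don't have the third-party perf module installed, the stdlib timeit module gives a rough version of the same measurement (numbers vary by machine; perf adds calibration, multiple processes and proper statistics on top of this):

```python
import timeit

seq = list(range(10))

# Rough stdlib equivalent of the perf microbenchmarks above.
t_plain = timeit.timeit("sorted(seq)", globals={"seq": seq}, number=100_000)
t_keyed = timeit.timeit(
    "sorted(seq, key=k)",
    globals={"seq": seq, "k": lambda x: x},
    number=100_000,
)
print(f"sorted(seq):        {t_plain:.3f}s per 100k calls")
print(f"sorted(seq, key=k): {t_keyed:.3f}s per 100k calls")
```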
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather than sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code: the random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spend much time explaining the FASTCALL design or documenting my changes. Keeping everything related to FASTCALL private was partially deliberate: since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stopped moving so fast before taking the time to explain and document my work.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword names). It's a new calling convention which avoids a temporary tuple for positional arguments and a temporary dictionary for keyword arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention.
PyObject_Call() can be replaced with _PyObject_FastCallKeywords() or _PyObject_FastCallDict() (when we still get kwargs as a dict) in such a conversion.
Many existing C functions were optimized internally to use FASTCALL, so even if you don't modify your code, you will benefit from the speedup. Typical example: PyObject_CallFunctionObjArgs().
The most massive changes were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit from FASTCALL, code must be modified. I'm trying to restrict such changes to Python internals, especially the most used functions.
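A Python-level model of the two calling conventions may help (an analogy only; the real thing passes a C array of PyObject* pointers, not a Python list):

```python
def call_old(func, args, kwargs=None):
    # Old convention: a tuple of positional arguments plus an
    # optional dict of keyword arguments, both allocated per call.
    return func(*args, **(kwargs or {}))

def call_fastcall(func, stack, nargs, kwnames=()):
    # FASTCALL: one flat sequence of values; the first nargs items
    # are positional, the remaining items pair up with the names
    # listed in kwnames. No temporary tuple or dict is needed at
    # the C level.
    positional = stack[:nargs]
    keywords = dict(zip(kwnames, stack[nargs:]))
    return func(*positional, **keywords)

assert call_old(sorted, ([3, 1, 2],), {"reverse": True}) == [3, 2, 1]
assert call_fastcall(sorted, [[3, 1, 2], True], 1, ("reverse",)) == [3, 2, 1]
```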
I expected the required changes to be straightforward enough; it looks like I was wrong. But I don't recall anyone, before you recently, asking for an explanation.
- It was not long ago that you fought tooth-and-nail against a single line patch optimization I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide whether an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide one.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my benchmark was correct, but it was a first step towards taking a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provide a script to reproduce my benchmark along with the benchmark results. The speedup is usually between 10% and 20%.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also as AC is being applied, the variable names are being changed. I never really liked the "mp" that is used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes converting functions to AC must not change the API (types of arguments: positional-only/keyword/..., default values, etc.) nor produce a worse docstring.
There is active ongoing work to enhance AC to fix issues that you reported, like the default value of positional-only parameters which should not be rendered in the function signature (I created issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support for *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in AC changes; I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is a hard task since the required changes are usually large and move a lot of code. We need more eyes to look at these changes!
For the specific case of sorted(), the name of the first parameter is already documented in the docstring and documentation in Python 2.7: "iterable". So I guess you mean that it is now possible to pass it as a keyword argument. Well, see issue #29327 for the long story. This issue is a regression, it was already fixed, and I didn't introduce the API change.
Oh by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's where I said "shit happens" (or more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributors to CPython, I'm not surprised to be the one who introduces many (most?) regressions :-) I'm trying to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list" which has been clear since the day it was released. It is some of the new strings that are causing problems with users (my students frequently are tripped up by the / notation, for example; no one seems to be able to intuit what it means without it being explained first).
Good news, it seems like you have good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us enhance docstrings, function signatures and documentation ;-)
See the good part of the ongoing AC work: it's a nice opportunity to also enhance the documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
Conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for parsing arguments, so it makes the code shorter. But well, this is just my opinion.
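For what it's worth, the / marker that trips up students simply separates positional-only parameters from the ones that may also be passed by keyword. In modern Python (3.8+), pure-Python functions can use the same notation; here is a sketch with a hypothetical clamp() helper:

```python
def clamp(value, /, low=0.0, high=1.0):
    # 'value' is positional-only (it sits left of '/');
    # 'low' and 'high' may be passed by position or by keyword.
    return max(low, min(high, value))

assert clamp(1.5) == 1.0          # clipped to the default high bound
assert clamp(-2, low=-1) == -1    # keyword use of a regular parameter

try:
    clamp(value=1.5)              # rejected: 'value' is positional-only
except TypeError:
    pass
```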
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree and altering core parts of the language for function calls, to new opcodes, the memory allocators, etc. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode in 16-bit units instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit on the implementation and I pushed the final change, but all credits go to Demur and Serhiy Storchaka! By the way, Serhiy made further efficient enhancements to the bytecode of CALL_FUNCTION instructions.
About memory allocations, I guess that you are referring to my change to the PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for the approval of my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-Andre Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward incompatible change (the GIL is now required to call PyMem_Malloc).
Hopefully, it seems like nobody noticed this subtle change (GIL now required): I didn't see any bug report. By the way, I fixed a misuse of PyMem_Malloc() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the additional complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because that lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you to understand how I work and how I take my decisions.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
On Mon, 23 Jan 2017 at 11:15 Doug Hellmann <doug@doughellmann.com> wrote:
Excerpts from Brett Cannon's message of 2017-01-21 19:51:48 +0000:
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
So the common theme here regardless of whether you agree with Raymond or Victor's approach to development is that we are not getting enough code reviews to go around. To me that's what the systemic issue is that this email is bringing up.
Now I think most of us don't think the solution to the lack of reviews is to lower our standard of what it takes to become a core developer (this doesn't mean we shouldn't do a better job of identifying potential candidates, just that we shouldn't give people commit privileges after a single patch like some projects do). To me that means we need to address why, out of 79 core developers, only 39 have made even a single commit over the past year, 30/79 have more than 12 commits over that same time scale, 15/79 have more than 52 commits, and 2/79 have over 365 commits (see https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c for the stats).
Some of you have said you're waiting for the GitHub migration before you start contributing again, which I can understand (I'm going to be sending an email with an update on that after this email to python-dev & core-workflow). But for those that have not told me that, I don't know what it will take to get you involved again. For instance, not to pick on Andrew, but he hasn't committed anything recently even though he obviously still cares about the project. So what would it take to get Andrew to help review patches again so that the next time something involving random comes through he feels like taking a quick look?
As I have said before, the reason I took on the GitHub migration is for us core developers. I want our workflow to be as easy as possible so that we can be as productive as possible. But the unspoken goal I have long-term is to get to the point that even dormant core devs want to contribute again, and to the point that everyone reviews a patch/month and more people review a patch/week (although I'll take a patch/year to start). I want to get to the point that every person with commit privileges takes 30 minutes a month to help with reviews, and that the majority of folks take 30 minutes a week to review (and please don't think of this as a hard rule where the privileges go away if you don't; view this as an aspirational goal). Even people who don't have time to review the kind of patches Victor is producing which triggered this thread can review documentation patches without deep knowledge of things and without taking much time. That way people who have time to review the bigger, more difficult patches can actually spend their time on those reviews and not worry about patches fixing a spelling mistake or adding a new test to raise test coverage.
All of this is so that I hope one day we get to the point where all patches require a review no matter who proposed the code change. Now I think we're quite a ways off from being there, but that's my moonshot goal for our workflow: that we have enough quality reviews coming in that we feel even patches from fellow core developers are worth requiring the extra code check and disbursement of knowledge, without feeling like a terrible drag on productivity.
Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS problem and then automate Misc/ACKS. After that, though, I hope we can take the time to have a hard look at what in our workflow prevents people from making even occasional code reviews, so that everyone wants to help out again (and if any of this interests you then please subscribe to core-workflow).
I would be happy to help with both of those tasks. I have experience with both within the OpenStack project.
Great! All of that will be discussed on the core-workflow mailing list and we track ideas at https://github.com/python/core-workflow/issues
And put me on the list of "waiting for github" contributors. I should have more time freeing up in a couple of months as I change some of my responsibilities at work. I would like to help with the migration and eventually regular reviews.
Doug
All great to hear!
-Brett
On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stinner@gmail.com> wrote:
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide whether to wait for a review.
For each patch that I write, I estimate the risk of regression. You may know that any regression is something unexpected, so such an estimation is tricky. Here is my heuristic:
(*) If the patch is trivial (short, non-controversial), I push it immediately.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps me review the code myself (find bugs, or more generally enhance the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch series, and so doesn't block me from moving further, I try to keep the issue open for around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issue. Recently, I started to use local branches in a Git repository. It helps a lot to work in parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches every time. It's painful, especially when I have to use multiple computers, download published patches again, etc. Maybe it will run smoother once CPython moves to Git and GitHub.
By the way, it's painful to squash a long patch series into a giant patch, which is much harder to review, where changes don't make sense at all at first look. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
Not supporting patch series in our reviewing tool also explains why I prefer to push rather than wait for a review. Rebasing long patch series stored as giant .patch files by hand is complicated.
(*) If the change modifies an API or a core component, I wait for at least one review from a core reviewer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch or the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that nobody understands my change or nobody cares :-)
I totally understand that most core developers have a limited amount of time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments in the enthusiasm and fervor surrounding the patches. I feel like there is a cost of questioning whether the patches should be done or how they are done, like I am burning little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it had a very long history, and each file has a very long list of "authors". I guess that you mean more a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me that at the Facebook sprint last september. I didn't expect that the random module had a maintainer, I thought that any core developer would be allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives an opportunity to maintainers to review changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentionned 3 different changes, let me reply.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code has the consequence that the current time and process identifier is used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only few developers instanciate random.Random themself, random.Random must not be used for security, etc. I let others decide if this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: The PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, and nobody was listening to each others, two security experts "rage-quitted" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for a bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review during 19 days (2016-08-18-2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, anyway I already trust all core developers, but I mean that I was trusting him even more than usual :-))
Since the change has a big impact on security, I had prefer to get a review of more developers, especially our security experts... but as I wrote, two security experts "rage- quitted". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having see you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were done last 2 years. I made many changes to support new OS functions like getentropy() an getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) does crash. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good to spot bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looks ok to me.
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong since sorted() accepted iterable as a keyword, whereas sort.list() doesn't.
So well, I let you guess if a review would have spot this bug in the large change.
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentionned an external test suite which was broken by recent _decimal changes (not sure if my specific change was in cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and me handled this issue privately (he reverted my change), I'm not sure that it's worth it to say more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in the issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v
Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such speedup is significant even on a microbenchmark. Can you elaborate what are your criteria to decide if an optimization is worth it?
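For readers without the third-party perf module installed, a rough stdlib-only equivalent of the microbenchmark above can be written with timeit (a sketch only: absolute timings vary by machine, and perf gives more stable numbers than this):

```python
# Rough stdlib-only version of the sorted() microbenchmark above.
# Timings are illustrative; use the "perf" module for reliable comparisons.
import timeit

seq = list(range(10))

# sorted() without a key function
t_plain = timeit.timeit("sorted(seq)", globals={"seq": seq}, number=100000)

# sorted() with an identity key function (exercises the callback path)
t_key = timeit.timeit("sorted(seq, key=k)",
                      globals={"seq": seq, "k": lambda x: x}, number=100000)

print(f"plain: {t_plain:.4f}s  with key: {t_key:.4f}s")
```

The key-function variant is slower because each element requires an extra function call, which is exactly the overhead FASTCALL targets.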
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather than sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spend much time on explaining the FASTCALL design nor documenting my changes. It's partially deliberate to keep everything related to FASTCALL private. Since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stop moving too fast before taking time to explain my work and document it.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword names). It's a new calling convention which avoids a temporary tuple for positional arguments and a temporary dictionary for keyword arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention.
In such a conversion, PyObject_Call() can be replaced with _PyObject_FastCallKeywords() or _PyObject_FastCallDict() (when we still get kwargs as a dict).
Many existing C functions were optimized internally to use FASTCALL, so even if you don't modify your code, you benefit from the speedup. Typical example: PyFunction_CallFunctionObjArgs().
The most massive changes were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit from FASTCALL, code should be modified. I'm trying to restrict such changes to Python internals, especially the most used functions.
I expected the required changes to be straightforward enough; it looks like I was wrong, but I don't recall anyone, before you recently, asking for an explanation.
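The convention above can be sketched in pure Python (a hypothetical illustration only: the function names here are invented, and the real implementation is C code operating on PyObject* arrays):

```python
# Hypothetical pure-Python sketch of the two calling conventions.
# The real CPython code is C; names here are illustrative only.

def call_tuple_dict(func, args, kwargs):
    # Old convention: a tuple and a dict object are allocated per call.
    return func(*args, **(kwargs or {}))

def call_fastcall(func, stack, nargs, kwnames):
    # FASTCALL: positional args live in a flat array (a list here),
    # keyword values follow them, with the names in a separate tuple.
    positional = stack[:nargs]
    keywords = dict(zip(kwnames, stack[nargs:]))
    return func(*positional, **keywords)

# Both produce the same call; FASTCALL just avoids the temporary
# tuple/dict allocations at the C level.
r1 = call_tuple_dict(sorted, ([3, 1, 2],), {"reverse": True})
r2 = call_fastcall(sorted, [[3, 1, 2], True], 1, ("reverse",))
assert r1 == r2 == [3, 2, 1]
```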
- It was not long ago that you fought tooth-and-nail against a single-line patch optimization I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide if an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide it.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my benchmark was correct, but it was a first step to take a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provide a script to reproduce my benchmark and the benchmark results. The speedup is usually between 10% and 20%.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also as AC is being applied, the variable names are being changed. I never really liked the "mp" that is used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes to convert functions to AC must not change the API (type of arguments: positional only/keyword/..., default values, etc.) nor provide a worse docstring.
There is an active on-going work to enhance AC to fix issues that you reported, like the default value of positional-only parameters which should not be rendered in the function signature (I created the issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in AC changes; I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is a hard task since the required changes are usually large and move a lot of code. We need more eyes to look at these changes!
For the specific case of sorted(), the name of the first parameter is already documented in the docstring and documentation in Python 2.7: "iterable". So I guess that you mean that it is now possible to use it as a keyword argument. Well, see the issue #29327 for the long story. This issue is a regression, it was already fixed, and I didn't introduce the API change.
Oh, by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's where I said "shit happens" (or, more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributors to CPython, I'm not surprised to be the one who introduces many (most?) regressions :-) I'm trying to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for multiple changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list" which has been clear since the day it was released. It is some of the new strings that are causing problems with users (my students are frequently tripped-up by the / notation for example; no one seems to be able to intuit what it means without it being explained first).
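For context, the "/" marker in a signature means the parameters before it are positional-only. A minimal sketch with a hypothetical function mimicking sorted()'s signature (the "/" syntax in def statements requires Python 3.8+):

```python
# Hypothetical wrapper illustrating what "/" in a signature means:
# "iterable" can only be passed positionally, never by keyword.
def my_sorted(iterable, /, *, key=None, reverse=False):
    return sorted(iterable, key=key, reverse=reverse)

assert my_sorted([3, 1, 2]) == [1, 2, 3]   # positional: accepted

try:
    my_sorted(iterable=[3, 1, 2])          # keyword: rejected
except TypeError:
    pass                                    # positional-only parameter
```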
Good news, it seems like you have a good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us to enhance docstrings, function signatures and documentation ;-)
See the good part of the AC ongoing work: it's a nice opportunity to also enhance documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
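For example, here is how a tool can consume a signature via the inspect module (shown with a plain Python function so the output doesn't depend on which builtins are AC-converted in a given version):

```python
# How tools (IDEs, static checkers) consume signatures programmatically.
import inspect

def sample(iterable, key=None, reverse=False):
    """sample(iterable, key=None, reverse=False) --> new sorted list"""
    return sorted(iterable, key=key, reverse=reverse)

sig = inspect.signature(sample)
assert str(sig) == "(iterable, key=None, reverse=False)"
assert list(sig.parameters) == ["iterable", "key", "reverse"]
```

AC-converted builtins expose the same Signature objects, which is what makes the conversion valuable beyond docstrings.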
Conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for parsing arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree, altering core parts of the language from function calls, to new opcodes, to the memory allocators, etc. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode in 16-bit units instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit on the implementation and I pushed the final change, but all credits go to Demur and Serhiy Storchaka! By the way, Serhiy made further efficiency enhancements in the bytecode of CALL_FUNCTION instructions.
About memory allocators, I guess that you are referring to my change to the PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for approval from my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-Andre Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward incompatible change (the GIL is now required to call PyMem_Malloc).
Fortunately, it seems like nobody noticed this subtle change (GIL now required): I didn't see any bug report. By the way, I fixed a misuse of PyMem_Malloc() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the additional complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because that lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you to understand how I work and how I take my decisions.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
On Sat, Jan 21, 2017 at 9:51 PM, Brett Cannon <brett@python.org> wrote:
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
So the common theme here regardless of whether you agree with Raymond or Victor's approach to development is that we are not getting enough code reviews to go around. To me that's what the systemic issue is that this email is bringing up.
Now I think most of us don't think the solution to the lack of reviews is to lower our standard of what it takes to become a core developer (this doesn't mean we shouldn't do a better job of identifying potential candidates, just that we shouldn't give people commit privileges after a single patch like some projects do). To me that means we need to address why, out of 79 core developers, only 39 have made even a single commit over the past year, 30/79 have more than 12 commits over that same time scale, 15/79 have more than 52 commits, and 2/79 have over 365 commits (https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c for the stats).
Some of you have said you're waiting for the GitHub migration before you start contributing again, which I can understand (I'm going to be sending an email with an update on that after this email to python-dev & core-workflow). But for those that have not told me that I don't know what it will take to get you involved again.
I can tell you what drove me away:
1) real life got in the way, time is limited;
2) I spend the available time working on the bug tracker rather than on core Python, since my time is likely more valuable if spent on bpo;
3) MLs are quite high traffic, and I gave up keeping up /for now/ [0] (I have 6.8k unread mails from python-checkins, 460 from python-issues, 320 from python-dev). This also means that: a) if a bug related to one of the modules I maintained is opened, I won't see it[1]; b) if a message explicitly asks for my opinion/review, I won't see it[1];
4) the GitHub migration (first I was against it, now I'm waiting for everything to settle before giving it a try; in future I hope it will become a reason to get me involved again);
5) the more time I spend away, the more difficult it becomes to follow all the new features that get introduced.
As for what would get me involved again, for my case there's not much else you can do -- I just need to find more free time, catch up with development (mails, tools, features, etc) and then try to keep up.
Best Regards, Ezio Melotti
[0]: FWIW one thing that helped me keep the mail traffic low(er) was following the weekly issue reports on python-dev and adding myself to nosy on the issues I cared about, rather than subscribing and following the other mailing lists (python-bugs-list and new-bugs-announce). I also used to keep an eye on #python-dev for new issues and replies, and I wish more core devs would hang around there, since it makes communication (including asking for opinions/reviews) much faster.
[1]: Feel free to reach to me directly, either on IRC or via direct mail.
For instance, not to pick on Andrew but he hasn't committed anything but he obviously still cares about the project. So what would it take to get Andrew to help review patches again so that the next time something involving random comes through he feels like taking a quick look?
As I have said before, the reason I took on the GitHub migration is for us core developers. I want our workflow to be as easy as possible so that we can be as productive as possible. But the unspoken goal I have long-term is to get to the point that even dormant core devs want to contribute again, and to the point that everyone reviews a patch/month and more people review a patch/week (although I'll take a patch/year to start). I want to get to the point that every person with commit privileges takes 30 minutes a month to help with reviews and that the majority of folks take 30 minutes a week to review (and please don't think of this as a hard rule where the privileges go away if you don't; view this as an aspirational goal). Even people who don't have time to review the kind of patches Victor is producing (which triggered this thread) can review documentation patches without deep knowledge of things and without taking much time. That way people who have time to review the bigger, more difficult patches can actually spend their time on those reviews and not worry about patches fixing a spelling mistake or adding a new test to raise test coverage.
All of this is so that I hope one day we get to the point where all patches require a review no matter who proposed the code change. Now I think we're quite a ways off from being there, but that's my moonshot goal for our workflow: that we have enough quality reviews coming in that we feel that even patches from fellow core developers are worth requiring the extra code check and dispersal of knowledge without feeling like a terrible drag on productivity.
Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS problem and then automate Misc/ACKS. After that, though, I hope we can take the time to have a hard look at what in our workflow prevents people from making even occasional code reviews so that everyone wants to help out again (and if any of this interests you then please subscribe to core-workflow).
On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stinner@gmail.com> wrote:
I also wanted to discuss this topic, but I didn't really know what to propose. Let's take this opportunity to explain how I contribute to CPython, especially how I decide to wait for a review or not.
For each patch that I write, I estimate the risk of regression. You may know that any regression is something unexpected, so such estimation is tricky. Here is my heuristic:
(*) If the patch is trivial (short, non-controversial), I push it immediately.
(*) If I'm less confident, I open an issue and attach the patch. I wait at least one day before pushing.
It's strange, but the process of opening an issue and attaching the patch usually helps me review the code myself (find bugs, or more generally enhance the patch). Maybe because it forces me to review the change one more time?
If the change is not part of a larger patch series, and so doesn't block me from moving further, I try to keep the issue open for around one week.
The truth is that too few of my patches get a review :-/ Maybe I should wait longer, but then it becomes harder for me to handle many patches.
Maybe it's a tooling issue. Recently, I started to use local branches in a Git repository. It helps a lot to work in parallel on large changes. Before, I only worked in a single directory (default/, the default Mercurial branch) and applied/reverted patches every time. It's painful, especially when I have to use multiple computers, download published patches again, etc. Maybe it will run smoother once CPython moves to Git and GitHub.
By the way, it's painful to squash a long patch series into a giant patch, which is much harder to review, where changes don't make sense at all at first look. Again, a better reviewing tool supporting patch series (GitHub) will help here too.
Not supporting patch series in our reviewing tool also explains why I prefer pushing over waiting for a review. Manually rebasing long patch series stored as giant .patch files is complicated.
(*) If the change modifies an API or touches a core component, I wait for at least one review from a core developer. Sometimes, I even send an email to python-dev. Again, sometimes I don't get any feedback on the patch nor the email after two weeks :-/ At least, I tried :-) Usually, I get feedback in less than one week, or no feedback at all. I understand that nobody understands my change or nobody cares :-)
I totally understand that most core developers have little time available to contribute to Python. I'm trying to find a compromise between the risk of introducing regressions and being stuck in my work. This email might help me to adjust my workflow.
By the way, I'm trying to always run the full test suite (./python -m test -rW -j0) before pushing any change. If I suspect that I may have introduced reference leaks, I also run "./python -m test -R 3:3 ..." on the tests related to the modified code to check for memory/reference leaks.
- One other reason for the lack of review comments is the enthusiasm and fervor surrounding the patches. I feel like there is a cost to questioning whether the patches should be done or how they are done, like I am burning a little karma every time. Sometimes it feels safest and most cordial to just say nothing and let you make hundreds of semi-reviewed changes to just about every critical part of the language.
"semi-reviewed". Let me be more accurate: yeah, I do push a lot of changes which were not reviewed by anyone (see above).
- Historically, if there was creator or maintainer of the code who was still active, that person would always be consulted and have a final say on whether a change should be applied. Now, we have code constantly being changed without consulting the original author (for example, the recent and catastrophic random initialization bug was due to application of a patch without consulting the author of _randommodule.c and the maintainer of random.py, or this change to sorted(), or the changes to decimal, etc).
What do you mean by "author"? As you wrote, Python is now 26 years old, so it had a very long history, and each file has a very long list of "authors". I guess that you mean more a "maintainer".
My problem is that I'm not aware of any explicit list of maintainers. I didn't know that you were the maintainer of the random module before you told me at the Facebook sprint last September. I didn't expect the random module to have a maintainer; I thought that any core developer was allowed to modify the code.
Moreover, since I open an issue for most of my changes, it gives an opportunity to maintainers to review changes. Maybe we need more components in the bug tracker to notify maintainers of pending changes?
You mentioned 3 different changes, let me reply.
(1) The random change: http://bugs.python.org/issue29085
I introduced a regression in random.Random.seed(): a typo in the C code had the consequence that the current time and process identifier were used, instead of os.urandom(16), to initialize the Mersenne Twister RNG.
IMHO the regression is not "catastrophic". Only a few developers instantiate random.Random themselves, random.Random must not be used for security, etc. I let others decide if this bug was catastrophic or not.
Since we are talking about the development process, let me see how the change was made.
Context: PEP 524 has a long and painful history... Something like more than 500 messages were sent on the bug tracker and python-dev, nobody was listening to each other, and two security experts "rage-quitted" Python because of this mess... I decided to try to fix this issue in a constructive way, so I wrote a PEP. Nick wrote a different PEP, since it was clear that it was possible to handle security in two different incompatible ways. A mailing list was even created just to discuss this bug! A mailing list just for a bug gives an idea of the size of the mess :-)
Well, about the change itself, it was done in http://bugs.python.org/issue27776
The patch was available for review during 19 days (2016-08-18 to 2016-09-06) and was reviewed by Nick Coghlan. Since Nick wrote a similar PEP, I trusted him to be able to review my change. (Well, anyway I already trust all core developers, but I mean that I was trusting him even more than usual :-))
Since the change has a big impact on security, I would have preferred to get a review from more developers, especially our security experts... but as I wrote, two security experts "rage-quitted". Again, this PEP has a long and sad story :-/
Note: you say that you are the maintainer of the random module, but I don't recall having seen you in any recent discussions and issues related to os.urandom(), whereas a lot of enhancements and changes were done in the last 2 years. I made many changes to support new OS functions like getentropy() and getrandom().
Oooookay, let's see the second change, "this change to sorted()", http://bugs.python.org/issue29327
(2) I introduced a bug in sorted(), last August: https://hg.python.org/cpython/rev/15eab21bf934/
Calling sorted(iterable=[]) did crash. To be honest, I didn't imagine that anyone would pass the iterable by keyword, but Serhiy is very good at spotting bugs in corner cases :-)
IMHO the regression is subtle.
When I optimized the code to use FASTCALL, I replaced PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I checked that all tests passed, so it looked ok to me.
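A pure-Python analogy may help show why the replacement broke the keyword-only call (an illustrative sketch only; the actual code is C):

```python
# PyTuple_GetSlice(args, 1, argc) behaves like args[1:]: it safely
# yields an empty tuple even when there is nothing past index 0.
# Taking &PyTuple_GET_ITEM(args, 1) behaves like args[1]: it assumes
# the item exists.
args = ("some_iterable",)   # normal call: one positional argument
assert args[1:] == ()       # both approaches agree here

args = ()                   # keyword-only call: sorted(iterable=[])
assert args[1:] == ()       # the old slice-based code still worked...
try:
    args[1]                 # ...but direct item access has no item 1
except IndexError:
    pass
```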
I didn't imagine that anyone would call sorted(iterable=[]), so I didn't notice that PyTuple_GetSlice() can create an empty tuple.
The previous code was wrong since sorted() accepted iterable as a keyword, whereas sort.list() doesn't.
So well, I let you guess if a review would have spot this bug in the large change.
(3) Recently, I ran sed to replace code patterns to use faster ways to call functions: https://hg.python.org/cpython/rev/54a89144ee1d
"Replace PyObject_CallObject(callable, NULL) with _PyObject_CallNoArg(callable)"
I recalled that I modified the _decimal module and that Stefan Krah complained, because he wants to have the same code base on Python 3.5, 3.6 and 3.7. He also mentionned an external test suite which was broken by recent _decimal changes (not sure if my specific change was in cause or not), but I wasn't aware of it.
To be honest, I didn't even notice that I modified _decimal when I ran sed on all .c files. Since the change was straightforward and (IMHO) made the code more readable, I didn't even wait for a review if I recall correctly.
Stefan and me handled this issue privately (he reverted my change), I'm not sure that it's worth it to say more about this "issue" (or even "non-issue").
To be clear, I don't consider that my change introduced a regression.
In general, Guido has been opposed to sweeping changes across the code base for only tiny benefits. Of late, that rule seems to have been lost.
The benefits of FASTCALL mainly apply to fine grained functions which only do a little work and tend to be called frequently in loops. For functions such as sorted(), the calling overhead is dominated by the cost of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and likely wasn't worth the complexity, or the actual bug, or any of the time we've now put in it. There was no actual problem being solved, just a desire to broadly apply new optimizations.
Ok, first, you qualify my FASTCALL changes as code churn. So let me show an example with sorted(): https://hg.python.org/cpython/rev/b34d2ef5c412
Can you elaborate how such change increases the complexity?
Second, "no actual problem being solved"
Since the goal of FASTCALL is to optimize Python, I guess that you consider that the speedup doesn't justify the change. I gave numbers in the issue #29327:
Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL):
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' 'sorted(seq)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: 1.12x faster (-11%)
haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 us: 1.26x faster (-21%)
IMHO such speedup is significant even on a microbenchmark. Can you elaborate what are your criteria to decide if an optimization is worth it?
- Historically, we've relied on core developers showing restraint. Not every idea that pops into their head is immediately turned into a patch accompanied by pressure to apply it. Devs tended to restrict themselves to parts of the code they knew best through long and careful study rather sweeping through modules and altering other people's carefully crafted code.
Should I understand that I should restrict myself to some files? Or not touch some specific parts of Python, like... "your" code like random, itertools and collections modules?
I replied to the 3 issues you mentioned previously and explained how I contribute to Python.
FWIW, I applaud your efforts to reduce call overhead -- that has long been a sore spot for the language.
Guido has long opposed optimizations that increase risk of bugs, introduce complexity, or that affect long-term maintainability. In some places, it looks like FASTCALL is increasing the complexity (replacing something simple and well-understood with a wordier, more intricate API that I don't yet fully understand and will affect my ability to maintain the surrounding code).
I'm sorry, I didn't spent much time on explaing the FASTCALL design nor documenting my changes. It's partially deliberate to make everything related to FASTCALL private. Since it's a huge project modifying a lot of code, I wanted to wait until the APIs and the code stop moving too fast to take time to explain my work and document it.
If you have specific questions, please go ahead.
Shortest summary:
FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C array, nargs: int, kwnames: tuple of keyword keys). It's a new calling convention which allows to avoid a temporary tuple to pass positional arguments and avoids temporary dictionary to pass keyworkd arguments.
To use FASTCALL, C functions should be converted to the new METH_FASTCALL calling convention
PyObject_Call() can replaced with _PyObject_FastCallKeywords() or _PyObject_FastCallDict() (when we still get kwargs as a dict) in such conversion
Many existing C functions were optimized internally to use FASCALL, so even if you don't modify your code, you will benefit of it (speedup). Typical example: PyFunction_CallFunctionObjArgs().
The most massive change were purely internal and don't affect the most famous C APIs at all. In some cases, to fully benefit of FASTCALL, code should be modified. I'm trying to restrict such changes to Python internals, especially the most used functions.
I expected the required changes to be straightforward enough; it looks like I was wrong. But I don't recall anyone, before you recently, asking for an explanation.
- It was not long ago that you fought tooth-and-nail against a single-line optimization patch I submitted. The code was clearly correct and had a simple disassembly to prove its benefit. Your opposition was based on "it increases the complexity of the code, introduces a maintenance cost, and increases the risk of bugs". In the end, your opposition killed the patch. But now, the AC and FASTCALL patches don't seem to mind any of these considerations.
Context: http://bugs.python.org/issue26201
It seems like we need more _explicit_ rules to decide if an optimization is worth it or not. For me, the de facto standard request for an optimization is to prove it with a benchmark. I requested a benchmark, but you refused to provide it.
So I ran my own benchmark and saw that your change made the modified code (PyList_Append()) 6% slower. I'm not sure that my benchmark was correct, but it was a first step toward taking a decision.
To come back to FASTCALL, your point is that it doesn't provide any speedup.
In most FASTCALL issues that I opened, I provided a script to reproduce my benchmark along with the benchmark results. The speedup is usually between 10% and 20%.
Should I understand that 6% slower is ok, whereas 10-20% faster is not good? Can you please elaborate?
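For illustration, a micro-benchmark of the kind discussed here can be written with the stdlib `timeit` module. This is not the actual script attached to the issues, just a sketch of the approach; the workload and sizes are arbitrary.

```python
import timeit

setup = "data = list(range(1000))"

# Repeat the timing and keep the best run: the minimum is less
# sensitive to system noise than the mean, which matters when the
# effect being measured is in the 5-20% range.
best = min(timeit.repeat("sorted(data)", setup=setup,
                         repeat=5, number=1000))
per_call_us = best / 1000 * 1e6
print(f"sorted() on 1000 ints: {per_call_us:.1f} us per call (best of 5)")
```

Comparing the same measurement before and after a patch is the "prove it with a benchmark" step mentioned above.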
- AC is supposed to be a CPython-only concept. But along the way APIs are being changed without discussion. I don't mind that sorted() now exposes *iterable* as a keyword argument, but it was originally left out on purpose (Tim opined that code would look worse with iterable as a keyword argument). That decision was reversed unilaterally without consulting the author and without a test. Also, as AC is being applied, the variable names are being changed. I never really liked the "mp" that is used in dicts and prefer the use of "self" better, but it is a gratuitous change that unilaterally reverses the decisions of the authors and makes the code not match any of the surrounding code that uses the prior conventions.
Ah, at least I concur with you on one point :-) Changes to convert functions to AC must not change the API (type of arguments: positional only/keyword/..., default values, etc.) nor provide a worse docstring.
There is active ongoing work to enhance AC to fix issues that you reported, like the default value of positional-only parameters which should not be rendered in the function signature (I created issue #29299 with a patch). Serhiy is also working on implementing the last major missing feature of AC: support for *args and **kwargs parameters (issue #20291).
FYI I wasn't involved in the AC changes; I only started to look at AC recently (1 or 2 months ago). Again, I agree that these changes should be carefully reviewed, which is a hard task since the required changes are usually large and move a lot of code. We need more eyes to look at these changes!
For the specific case of sorted(), the name of the first parameter is already documented in the docstring and documentation in Python 2.7: "iterable". So I guess that you mean that it is now possible to use it as a keyword argument. Well, see issue #29327 for the long story. This issue is a regression; it was already fixed, and I didn't introduce the API change.
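To make the sorted() case concrete: once the regression from issue #29327 is fixed, *iterable* is positional-only again, so passing it by keyword is rejected (in 3.6.0 the keyword call below was accidentally accepted):

```python
# Positional use: always fine.
print(sorted([3, 1, 2]))  # → [1, 2, 3]

# Keyword use of the first parameter: rejected on releases where the
# issue #29327 regression is fixed; it only succeeded on 3.6.0.
try:
    sorted(iterable=[3, 1, 2])
except TypeError as exc:
    print("rejected:", exc)
```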
Oh, by the way, when I read your comment, I understand that I'm responsible for all regressions. It's true that I introduced regressions; that's where I said "shit happens" (or, more politically correct: "it's better to ask forgiveness than permission" ;-)). Since I'm one of the most active contributors to CPython, I'm not surprised to be the one who introduces many (most?) regressions :-) I try to review my changes multiple times, test corner cases, etc. But I'm not perfect.
Sadly, to show its full power, FASTCALL requires changes at many levels of the code. It requires changing a lot of code, but I understood that the core developers approved the whole project. Maybe I was wrong? At least, I asked for permission for several changes, especially at the start.
- FWIW, the claim that the help is much better is specious. AFAICT, there has never been the slightest problem with "sorted(iterable, key=None, reverse=False) --> new sorted list", which has been clear since the day it was released. It is some of the new strings that are causing problems for users (my students are frequently tripped up by the / notation, for example; no one seems to be able to intuit what it means without it being explained first).
Good news: it seems like you have good experience in API design, documentation, etc. Join the "Argument Clinic" project to help us enhance docstrings, function signatures and documentation ;-)
See the good side of the ongoing AC work: it's a nice opportunity to also enhance the documentation, not only provide a signature.
By the way, to be honest, the main advantage of converting functions to AC is to get a signature. The signature is visible in docstrings, which is nice, but it is also very useful to a wide range of tools (IDEs, static checkers, etc.).
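For example, a builtin converted to AC gains a signature that the stdlib `inspect` module can introspect (the exact text shown is from a recent CPython; older versions may differ):

```python
import inspect

# AC-converted builtins expose a text signature that inspect can parse;
# before the conversion, inspect.signature() raised ValueError for them.
sig = inspect.signature(sorted)
print(sig)                    # e.g. (iterable, /, *, key=None, reverse=False)
print(list(sig.parameters))   # parameter names, usable by IDEs and checkers
```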
The conversion to FASTCALL is more of a nice side effect. At least, it is a good motivation for me to convert more and more code to AC :-)
AC moves the docstring closer to the list of parameters. IMHO it makes the C code simpler to read and understand. It also removes the boring code responsible for parsing arguments, so it makes the code shorter. But well, this is just my opinion.
- FWIW, I'm trying to be constructive and contribute where I can, but frankly I can't keep up with the volume of churn. Having seen bugs being introduced, it is not inappropriate to ask another dev to please be careful, especially when that dev has been prolific to an unprecedented degree and is altering core parts of the language, from function calls to new opcodes, the memory allocators, etc. Very few people on the planet are competent to review these changes and make reasonable assessments about whether the complexity and churn are worth it. And fewer still have the time to keep up with the volume of changes.
Hum, I wasn't involved in bytecode changes.
Well, I reviewed the very good work of Demur Rumed. I recall that you worked on a similar area, trying to fetch bytecode by 16-bit instead of 8-bit. Demur proposed a good design and I recall that the design was approved.
I helped a little bit on the implementation and I pushed the final change, but all credit goes to Demur and Serhiy Storchaka! By the way, Serhiy made further efficiency enhancements to the bytecode of the CALL_FUNCTION instructions.
About memory allocators, I guess that you are referring to my change to the PyMem_Malloc() allocator. I discussed the issue on python-dev and waited for the approval of my peers before pushing anything, since I know well that it's a critical part of Python: https://mail.python.org/pipermail/python-dev/2016-March/143467.html
I provided all the data requested by Marc-Andre Lemburg (testing the change with common projects: Django, Pillow, numpy) and made further changes (the PYTHONMALLOC=debug tool) to help handle this backward-incompatible change (the GIL is now required to call PyMem_Malloc).
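The PYTHONMALLOC=debug tool mentioned above is an environment variable (available since Python 3.6); a minimal sketch of its use:

```shell
# Enable debug hooks on Python's memory allocators for one run.
# C-level misuse -- e.g. calling PyMem_Malloc() without holding the
# GIL, or a buffer overflow -- is then detected and reported instead
# of silently corrupting memory.
PYTHONMALLOC=debug python3 -c 'print("allocator debug hooks active")'
```

This works on a regular release build, so extension authors can check their code without recompiling Python in debug mode.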
Fortunately, it seems like nobody noticed this subtle change (GIL now required): I didn't see any bug report. By the way, I fixed a misuse of PyMem_Malloc() in numpy.
- Please do continue your efforts to improve the language, but also please moderate the rate of change, mitigate the additional complexity, value stability over micro-optimizations, consult the authors and maintainers of code, take special care with code that hasn't been reviewed because that lacks a safety net, and remember that newer devs may be taking cues from you (do you want them making extensive changes to long-existing stable code without consulting the authors and with weak LGTM reviews?)
Ok, I will do it.
Thank you for your feedback, Raymond. I hope that my email helps you understand how I work and how I make my decisions.
Victor
python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/
On 2017-01-21, Brett Cannon wrote:
So the common theme here, regardless of whether you agree with Raymond's or Victor's approach to development, is that we are not getting enough code reviews to go around. To me, that's the systemic issue this email is bringing up.
I think there is another issue. What pace of evolution is appropriate for Python 3.x? Python 2.7.x represents an extreme position: bug fixes only. Evolution for 3.x is too fast for some users. It is natural that as Python matures, the average user is more interested in a stable language than in bleeding-edge features. Python 3.x should be compelling to those users if we want to smoothly end-of-life 2.x. If 3.x looks like a "change treadmill" then 2.7 will have a long life.
I don't know how to solve that. Having experimental and stable forks of 3.x seems bad. Incompatible language versions are a drain on the ecosystem. However, a total block of language evolution is also no good.
Maybe we could emulate the Linux kernel releases. I.e. have relatively fast-moving development but also choose releases to give long maintenance cycles. Ideally the long-term releases would be synchronized with OS distributions (e.g. Red Hat, Debian, Ubuntu).
The Linux kernel has a strict policy of not breaking user space programs. For Python, a similar rule would be not breaking Python programs. Until 2.x has finally died out, I think we should have an extra conservative policy on incompatible language changes.
Lack of patch reviewers is a major problem. Linux seems to have an advantage of more contributors and that many are paid to work on Linux. As far as I'm aware, there is no core Python developer who works mostly on Python and gets paid for it. So, we have to work with what we have or figure out some way to get people funded.
BTW, thanks for working on the GitHub migration Brett. I think it will help.
2017-01-24 21:46 GMT+01:00 Neil Schemenauer <nas-python@arctrix.com>:
Maybe we could emulate the Linux kernel releases. I.e. have relatively fast-moving development but also choose releases to give long maintenance cycles. Ideally the long-term releases would be synchronized with OS distributions (e.g. Red Hat, Debian, Ubuntu).
You should take a look at this old deferred PEP: https://www.python.org/dev/peps/pep-0407/
Victor
On 2017-01-24, Victor Stinner wrote:
You should take a look at this old deferred PEP: https://www.python.org/dev/peps/pep-0407/
Thanks, that's very close to what I was thinking. I would still add that we should be extra careful about incompatible language features until 2.7.x usage has mostly died off. That isn't logically part of the PEP but a general development philosophy I think we should adopt.
On Tue, Jan 24, 2017 at 1:26 PM, Neil Schemenauer <nas-python@arctrix.com> wrote:
On 2017-01-24, Victor Stinner wrote:
You should take a look at this old deferred PEP: https://www.python.org/dev/peps/pep-0407/
Thanks, that's very close to what I was thinking. I would still add that we should be extra careful about incompatible language features until 2.7.x usage has mostly died off. That isn't logically part of the PEP but a general development philosophy I think we should adopt.
I love that PEP. However, I don't think the best way to attract large masses of people away from 2.7 and onto 3.* is to avoid adding features to 3.* releases: such an approach would not offer compelling reasons to migrate, as needed to overcome natural inertia.
Rather, thinking of what arguments I could bring to add further support to a case for migration (wearing the "consultant's hat" which I haven't donned in 12 years, so I'm a bit rusty :-) I think: performance/scalability; stability; cool new features; whatever extra gizmos may help existing 2.7 code run fine under new shiny 3.whatever with only automated code transformation steps in the way (reverse migration, i.e. 3.foobar -> 2.7, nowhere near as important). A lot of this describes stuff that HAS been happening -- the "stability" point would be particularly well addressed by PEP 407 or some variant thereof...
Alex
On Tue, 24 Jan 2017 at 13:26 Neil Schemenauer <nas-python@arctrix.com> wrote:
On 2017-01-24, Victor Stinner wrote:
You should take a look at this old deferred PEP: https://www.python.org/dev/peps/pep-0407/
Thanks, that's very close to what I was thinking. I would still add that we should be extra careful about incompatible language features until 2.7.x usage has mostly died off. That isn't logically part of the PEP but a general development philosophy I think we should adopt.
In terms of the stdlib, we have already committed to not removing any code until after 2.7 hits EOL: https://www.python.org/dev/peps/pep-0004/#for-modules-existing-in-both-pytho... (but I suspect that will be stretched until we decide to release Py4k).
We also had a language change moratorium for Python 3.2: https://www.python.org/dev/peps/pep-3003/ . That was done in hopes that PyPy and other interpreters would use the time to catch up, but that never materialized.
But as Alex pointed out, part of the problem is people don't want to switch unless they have a compelling reason that they can materially point to. For instance, I have heard plenty of people say that async/await and f-strings are reasons enough for them to switch, not the fact that Python 3 has a more consistent design while now matching Python 2.7's overall performance.
I should also say that I think Python 3.6 is a rare release. With async opening up so many doors for consistency in the language it led to a lot of changes there. But then again Python 3.6 also had a whole PEP dedicated to adding a 'fold' argument to datetime. And in Victor's case a lot of stuff was performance stuff that's under the covers and not really exposed directly to the user anyway so no language moratorium would have changed things.
While I like the idea of PEP 407 (if we stop doing bugfix releases for non-LTS releases and everything is considered provisional until it lands in an LTS release), I think we will have to see whether GitHub helps or hurts our release process. If it makes it easier to cut releases then maybe it will be okay (that's something current and former RMs and their cohorts should be the ones to answer). But I'm also happy with waiting until 2020 to think about this -- which is 2 feature releases away -- or to make PEP 407 a Py4k thing.
On 24.01.2017 22:08, Victor Stinner wrote:
2017-01-24 21:46 GMT+01:00 Neil Schemenauer <nas-python@arctrix.com>:
Maybe we could emulate the Linux kernel releases. I.e. have relatively fast-moving development but also choose releases to give long maintenance cycles. Ideally the long-term releases would be synchronized with OS distributions (e.g. Red Hat, Debian, Ubuntu).
You should take a look at this old deferred PEP: https://www.python.org/dev/peps/pep-0407/
I don't understand the reasoning behind that PEP. With the proposed timing, those "LTS" releases would be no different than what we have today.
The only effect of the PEP would be to do releases more often, which then results in using up minor release version numbers much too fast to give the impression of a stable programming system.
FWIW: I don't consider a release which is supported for just two years a long-term support release. If we start using such a term it should match people's expectations following Ubuntu's LTS cycles, i.e. you get (at least) 5 years of support for the release.
All that said, I believe having a Python 2.7 style long support version for Python 3 would be nice and would have a stabilizing effect which our user base would appreciate.
I'd not make this based on any timing, though. We'd just say: this is our new LTS release (e.g. Python 3.7) and then move on until we're confident again that the feature set has stabilized enough to create a new LTS release.
Perhaps making the last minor version to a major release version as we did with 2.7 is a good approach, i.e. we'd switch the major number when cutting an LTS release and follow similar guidelines as we have for 2.7 for this 3.x LTS release.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jan 25 2017)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs :::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
On 25/01/2017 at 10:19, M.-A. Lemburg wrote:
You should take a look at this old deferred PEP: https://www.python.org/dev/peps/pep-0407/
I don't understand the reasoning behind that PEP. With the proposed timing, those "LTS" releases would be no different than what we have today.
The only effect of the PEP would be to do releases more often,
That's indeed the main motivation: allow shipping changes faster to the subset of users who are ok with a shorter support cycle (which has the side-effect of making those changes tested in the wild earlier).
which then results in using up minor release version numbers much too fast to give the impression of a stable programming system.
People with such psychological bias can simply use said "LTS" releases, which would provide the same level of stability as today's feature releases.
FWIW: I don't consider a release which is supported for just two years a long term support release.
The vocabulary can be changed if that's a concern :-)
All that said, I believe a having a Python 2.7 style long support version for Python 3 would be nice and have a stabilizing effect which our user base would appreciate.
Trying to enforce such a level of commitment (if we're talking 5+ years of bugfix maintenance on an increasingly divergent codebase) in the already controversial PEP 407 is probably not a good idea. A separate PEP is in order.
We'd just say: this is our new LTS release (e.g. Python 3.7) and then move on until we're confident again that the feature set has stabilized enough to create a new LTS release.
In practice you wouldn't just "move on" but would have to maintain that LTS release (which is the whole point). If we're talking about something past the 2-year timeframe, you can't just impose that on all core developers, so you need a subgroup of maintainers dedicated to that LTS release.
Regards
Antoine.
On 1/25/2017 7:30 AM, Antoine Pitrou wrote:
We'd just say: this is our new LTS release (e.g. Python 3.7) and then move on until we're confident again that the feature set has stabilized enough to create a new LTS release.
In practice you wouldn't just "move on" but would have to maintain that LTS release (which is the whole point). If we're talking about something past the 2-year timeframe, you can't just impose that on all core developers, so you need a subgroup of maintainers dedicated to that LTS release.
Channeling Nick, I'd say that LTS support should come from commercial third parties, such as OS vendors. I agree with Antoine: it's not something we want to impose on the core devs. How much fun is 2.7 maintenance?
Eric.
On 25 January 2017 at 12:45, Eric V. Smith <eric@trueblade.com> wrote:
Channeling Nick, I'd say that LTS support should come from commercial third parties, such as OS vendors. I agree with Antoine: it's not something we want to impose on the core devs. How much fun is 2.7 maintenance?
Agreed. When lack of reviews is an issue, imposing additional "not fun" work on core devs isn't sustainable. As someone who does (non-Python) software support for my day job, dealing with legacy issues in Python doesn't feel like a hobby, it feels like work (and unlike the rest of my day, I'm not getting paid for it).
Paul
On Wed, Jan 25, 2017 at 07:45:05AM -0500, Eric V. Smith wrote:
Channeling Nick, I'd say that LTS support should come from commercial third parties, such as OS vendors. I agree with Antoine: it's not something we want to impose on the core devs. How much fun is 2.7 maintenance?
Are there any core developers who have Patreon accounts for monthly donations? Examples from more Linux-y projects:
https://www.reddit.com/r/linux/duplicates/5omtvg/patreons_to_support_open_so...
I think this is the next frontier for Python maintenance; we need full-time core maintainers, no third parties are funding any such developers, and the PSF doesn't seem interested in pursuing that.
--amk
On Wed, 25 Jan 2017 at 06:11 A.M. Kuchling <amk@amk.ca> wrote:
On Wed, Jan 25, 2017 at 07:45:05AM -0500, Eric V. Smith wrote:
Channeling Nick, I'd say that LTS support should come from commercial third parties, such as OS vendors. I agree with Antoine: it's not something we want to impose on the core devs. How much fun is 2.7 maintenance?
Are there any core developers who have Patreon accounts for monthly donations? Examples from more Linux-y projects:
https://www.reddit.com/r/linux/duplicates/5omtvg/patreons_to_support_open_so...
I have yet to see anyone reach a level high enough to support themselves. The people that I have seen use these approaches and make a living are bloggers that also sell advertising and merchandise.
I think this is the next frontier for Python maintenance; we need full-time core maintainers, no third parties are funding any such developers, and the PSF doesn't seem interested in pursuing that.
I've talked to the PSF about this and it isn't a lack of desire, it's a lack of funds. The PSF wants to have at least a couple of years worth of salary in the bank before taking on any employee and I suspect the salary that would be required isn't a small sum.
On 2017-01-25, A.M. Kuchling wrote:
I think this is the next frontier for Python maintenance; we need full-time core maintainers, no third parties are funding any such developers, and the PSF doesn't seem interested in pursuing that.
IMHO, the PSF should be doing it. I don't know exactly how the Linux Foundation works, but my superficial understanding is that the LF gets its funding mostly from big companies and then directly pays some Linux developers. Most notably, Linus is paid by the LF.
Probably it is hard to attract and retain top talent with a Patreon-like donation system. Software hackers are an eclectic group, so some of them would prefer that system. However, I would guess the majority would prefer more stable employment.
Regarding PEP 407, I agree the LTS periods need to be longer. Certainly we can't force core developers into doing maintenance. However, there should be a process for an officially blessed LTS release. As a project, we should decide that it is worthwhile and figure out how to help make it happen.
I guess we could decide that we don't want to do LTS releases. I would accept that. Personally I think it would be a huge mistake, especially at this point. Python 2.7 is poised to become the ultimate LTS release of Python. I fear if we can't get more people to upgrade in the next few years, Python will be forever fractured into two communities.
On 25 January 2017 at 19:28, Neil Schemenauer <nas-python@arctrix.com> wrote:
On 2017-01-25, A.M. Kuchling wrote:
I think this is the next frontier for Python maintenance; we need full-time core maintainers, no third parties are funding any such developers, and the PSF doesn't seem interested in pursuing that.
IMHO, the PSF should be doing it. I don't know exactly how the Linux Foundation works but my superficial understanding is that the LF gets funding mostly from big companies and then directly pays some Linux developers. Most notable, Linus is paid by the LF.
Right now, the PSF is more concerned by the state of PyPI and the packaging ecosystem than by CPython. Keep in mind that one of the main concerns being raised about CPython development is that the pace of change is already *too high* for the rest of the ecosystem to keep up with (judging just by those of us who have obtained individual agreements with our employers to spend part of our time on upstream contributions), whereas improvements in the packaging tools space (which provide a more immediate benefit to many more community members) are severely constrained by volunteer availability.
On that front, a funding proposal is being submitted to the Mozilla Grants program to finalize the sunsetting of the legacy web service at pypi.python.org, and migrating all operations over to pypi.org (those are currently running as parallel front ends to the same backing data store, but the new one is missing maintainer facing features that mean it isn't yet possible to shut down the old one).
Eric also correctly channeled me in that I think the right way for people to advocate for LTS CPython releases is:
- Pick a commercial Python redistributor
- Start paying them for support
- Advocate for *them* (through whatever channels they provide) to pursue a fully funded recurring LTS model in CPython upstream
That entirely avoids the "Is this an appropriate activity for a public interest charity to be funding?" question, and also gives commercial redistributors a clear practical benefit that they can pitch to their subscribers (i.e. getting fixes backported from the main line of development to LTS versions).
Cheers, Nick.
P.S. Since it's relevant to the conversation at hand, and we all collectively benefit from the upstream community maintaining strong ties with our commercial redistributors, I'll also point out that ActiveState, one of CPython's longest term commercial redistributors and one of the founding sponsors of the PSF, is currently hiring for a couple of key roles in their open source languages support and development team: http://www.activestate.com/company#careers
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jan 25, 2017, at 07:45 AM, Eric V. Smith wrote:
Channeling Nick, I'd say that LTS support should come from commercial third parties, such as OS vendors. I agree with Antoine: it's not something we want to impose on the core devs. How much fun is 2.7 maintenance?
Which effectively happens anyway, although to varying degrees of activity. If an OS vendor releases a version of Python in their LTS, that's also a commitment to support it for the length of the LTS (Python usually being within the sphere of LTS-supported packages; e.g. not everything in Ubuntu is guaranteed support for the 5 years of an LTS, just 'main' packages).
Cheers, -Barry
On 25.01.2017 13:30, Antoine Pitrou wrote:
On 25/01/2017 at 10:19, M.-A. Lemburg wrote:
All that said, I believe having a Python 2.7 style long support version for Python 3 would be nice and would have a stabilizing effect which our user base would appreciate.
Trying to enforce such a level of commitment (if we're talking 5+ years of bugfix maintenance on an increasingly divergent codebase) in the already controversial PEP 407 is probably not a good idea. A separate PEP is in order.
This is not about enforcing commitment, it's about opening up our release process to be able to apply fixes on such an LTS beyond what we'd normally do.
I'm pretty sure we can find core developers, or attract new ones, who'd be happy to be able to help fix issues in Python releases they use in their day job, rather than work on releases which they won't be able to use in their day job for at least another few years due to company policies.
I also believe that the PSF should start stepping up and invest some money into helping with Python support. A lot of money is being spent on community work, but hardly any on development work at the moment.
By doing so, the PSF could also attract more sponsors, since sponsoring would then have a real tangible benefit to the sponsors.
We'd just say: this is our new LTS release (e.g. Python 3.7) and then move on until we're confident again that the feature set has stabilized enough to create a new LTS release.
In practice you wouldn't just "move on" but have to maintain that LTS release (which is the whole point). If we're talking something past the 2 years timerange, you can't just impose that on all core developers, so you need a subgroup of maintainers dedicated to that LTS release.
See above; this is about opening doors and lifting restrictions.
On Wed, 25 Jan 2017 at 06:29 M.-A. Lemburg <mal@egenix.com> wrote:
On 25.01.2017 13:30, Antoine Pitrou wrote:
On 25/01/2017 at 10:19, M.-A. Lemburg wrote:
All that said, I believe having a Python 2.7 style long support version for Python 3 would be nice and would have a stabilizing effect which our user base would appreciate.
Trying to enforce such a level of commitment (if we're talking 5+ years of bugfix maintenance on an increasingly divergent codebase) in the already controversial PEP 407 is probably not a good idea. A separate PEP is in order.
This is not about enforcing commitment, it's about opening up our release process to be able to apply fixes on such an LTS beyond what we'd normally do.
I'm pretty sure we can find core developers, or attract new ones, who'd be happy to be able to help fix issues in Python releases they use in their day job, rather than work on releases which they won't be able to use in their day job for at least another few years due to company policies.
I don't think that's necessarily true. I'm sure core developers fix bugs in Python 2.7 if they run into them and they affect their work, but I don't know how many of us are actually helping to maintain Python 2.7 beyond that (I, for one, immediately ignore all issues relating only to 2.7 at this point). And attracting people to help is typically not the issue; it's getting enough core developers to do code reviews to keep up with the workload.
-Brett
On 25 January 2017 at 15:29, M.-A. Lemburg <mal@egenix.com> wrote:
I also believe that the PSF should start stepping up and invest some money into helping with Python support. A lot of money is being spent on community work, but hardly any on development work at the moment.
Software maintenance is a commercial support activity that is normally done for profit, so the PSF needs to be careful in how it approaches it to avoid getting in trouble with the IRS (public interest charities like the PSF operate under different taxation rules from trade associations like the Linux and OpenStack Foundations, and hence have a different set of constraints on their activities).
By doing so, the PSF could also attract more sponsors, since sponsoring would then have a real tangible benefit to the sponsors.
This is the aspect the PSF needs to be careful about, as it's treading very close to the line of activities that are better suited to a trade association or for-profit corporation.
Something along the lines of the Twisted or Django fellowship would likely be feasible - that would involve a grant request from the core development community to fund someone to focus full-time on issue triage and patch review for a designated period of time.
We'd need volunteers to help out with the review and selection process for applicants for the role, as well as a volunteer to actually write a grant proposal (including coming up with measurable objectives to help the PSF judge whether or not the grant was a success).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Jan 26, 2017 at 01:58:42PM +0100, Nick Coghlan wrote:
Software maintenance is a commercial support activity that is normally done for profit, so the PSF needs to be careful in how it approaches it to avoid getting in trouble with the IRS (public interest charities like the PSF operate under different taxation rules from trade associations like the Linux and OpenStack Foundations, and hence have a different set of constraints on their activities).
This only seems true if "software maintenance" in the above means something like "writing contracts and taking money to apply bugfixes within certain guaranteed SLAs".
If "software maintenance" = "paying someone a salary to apply bugfixes to an old branch", that activity seems to clearly fall into a 501c3's purview under the categorizations of "advancement of education or science" or "erecting or maintaining public buildings, monuments, or works". [*]
I mean, if funding software maintenance is illegal for a 501c3, then the FSF, Software Conservancy, Software in the Public Interest, Django Foundation, NumFOCUS, etc. are probably all illegal.
(At some point this thread probably needs to move to psf-discuss-public or whatever it is.)
--amk
[*] https://www.irs.gov/charities-non-profits/charitable-organizations/exempt-pu...
On 26.01.2017 17:34, A.M. Kuchling wrote:
This only seems true if "software maintenance" in the above means something like "writing contracts and taking money to apply bugfixes within certain guaranteed SLAs".
If "software maintenance" = "paying someone a salary to apply bugfixes to an old branch", that activity seems to clearly fall into a 501c3's purview under the categorizations of "advancement of education or science" or "erecting or maintaining public buildings, monuments, or works". [*]
I mean, if funding software maintenance is illegal for a 501c3, then the FSF, Software Conservancy, Software in the Public Interest, Django Foundation, NumFOCUS, etc. are probably all illegal.
I agree with Andrew: while software maintenance is a great business model for some companies, it's certainly not an activity that may only be done by for-profit entities (even though some of those companies might find that useful ;-)).
As long as we're not using software maintenance for some hidden political lobbying agenda, we're fine as PSF.
(At some point this thread probably needs to move to psf-discuss-public or whatever it is.)
Agreed as well, but for the PSF to get active in this area, we'd have to draft up a proposal.
Note that the PSF does have a fair amount of money in the bank, which would certainly be enough to contract developers for doing maintenance or development work, either for dedicated work (as was done in the past a couple of times), or continuously, provided the rates are reasonable.
The main problem the PSF has is not having enough human resources to provide project management or even oversight, so a lot of such funding activities depend on whether there are trusted people available who could take up these tasks on behalf of the PSF.
[*] https://www.irs.gov/charities-non-profits/charitable-organizations/exempt-pu...
On 26 January 2017 at 17:34, A.M. Kuchling <amk@amk.ca> wrote:
I mean, if funding software maintenance is illegal for a 501c3, then the FSF, Software Conservancy, Software in the Public Interest, Django Foundation, NumFOCUS, etc. are probably all illegal.
Note that I said the PSF needs to be careful in how it approaches the question, not that it can't do it at all.
Problematic: "Commercial and other institutional end users are worried about a lack of developer time being invested in maintaining long term support branches, so the PSF should commit to funding that directly" (this is the motivation where I believe the answer should be "Pay a vendor for commercial support and tell them to work more actively on the sustainability problem")
Entirely fine: "The core development community have submitted a grant request to fund a dedicated part-time contract role for X months to facilitate issue triage, patch reviews, and mentoring of potential new core developers"
The latter motivation is about supporting the community and facilitating its growth, which is entirely in line with the Foundation's public interest mission.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 January 2017 at 10:19, M.-A. Lemburg <mal@egenix.com> wrote:
Perhaps promoting the last minor version to a major release version, as we did with 2.7, is a good approach, i.e. we'd switch the major number when cutting an LTS release and follow guidelines similar to those we have for 2.7 for this 3.x LTS release.
It also takes about that long for *idiomatic* expectations to change - Python 2.7 code written by someone who learned Python 3 first is likely to look quite different from code written by someone who started with 1.5.2 or one of the earlier 2.x releases.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2017-01-21 20:51 GMT+01:00 Brett Cannon <brett@python.org>:
What I'm picking up from this is (as a gross oversimplification):
- Victor _wants_ code reviews
- Raymond thinks we _need_ code reviews
For a concrete example, I wrote a patch for a major regression in the datetime module on 2017-01-04: http://bugs.python.org/issue29100
I'm still waiting for a review. To be honest, I almost forgot this issue since I didn't get any feedback or any review. In the meantime, a duplicate was opened: http://bugs.python.org/issue29346
Usually, when this happens, I push my patch without review. But since some developers wrote that they want to review, please take a look at timestamp_limits.patch on http://bugs.python.org/issue29100 !
"timestamp_limits.patch is waiting for your review! This issue is a major regression of Python 3.6. I'm going to push the patch in one week if I don't get any review."
Victor
I read this whole thread. Here is my opinion.
- Lack of reviewers
It's a real problem. If a well-written patch is not reviewed and committed for a long time, the patch writer (a core developer, or a potential core developer) may lose motivation. There will be fewer and fewer motivated contributors.
I wrote the compact dict patch in June, before Python 3.6a3. Luckily, the patch was reviewed and committed in September, right before 3.6b1. But I realized Python badly lacks active reviewers. That's the main reason I wanted to become a core developer.
I was very lucky. But there might be contributors who lost motivation because of the lack of review.
- Stability
Stability is important. Review is valuable for stability. But even with careful review, regressions may happen.
Python 3.6 has many improvements. And many of them were committed right before 3.6b1, at the sprint. http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html
I feel that landing fundamental changes (like FASTCALL) at an early stage of the development process (alpha 1-3) is also valuable for the stability of the last beta, the RC, and the final release.
And I think that advocating running tests against the "nightly" Python on Travis CI may help stability, even though there are few beta users.
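For projects that want to do this, one common pattern is to add a nightly CPython entry to the Travis CI build matrix and mark it as an allowed failure, so breakage in pre-release Python surfaces early without blocking merges. A minimal sketch of such a .travis.yml (the install and test commands are illustrative; adapt them to the project):

```yaml
language: python
matrix:
  include:
    - python: "3.6"       # current stable release
    - python: "nightly"   # CPython built from the development branch
  allow_failures:
    - python: "nightly"   # report breakage, but don't fail the whole build
install:
  - pip install -e .
script:
  - python -m pytest
```

With this layout, regressions in an upcoming CPython release show up in the project's CI logs while there is still time to report them upstream.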
One regression was caused by compact dict (*), and it was one of the reasons 3.6rc2 was released. The regression was found by py.test.
*) An excuse: the issue did not originate in compact dict. It was an issue in the key-sharing dict. But compact dict increased the chance of hitting the issue, and py.test hit it.
- Domain expert
I agree that consulting the expert is good. But sometimes the expert may not be active.
And mechanical changes made by sed or Coccinelle may affect wide parts of the codebase. It's not realistic to consult all the experts.
We should think carefully about whether a change is worthwhile. But once some devs agree that it is, "ask forgiveness rather than permission" seems a realistic approach, as when Stefan reverted the _decimal change.
- Github
GitHub may ease some of the problems we have, especially reviewing large patches. https://github.com/blog/2123-more-code-review-tools
Python 3.4.6 and 3.5.3 have already been released, so the migration will happen in the near future? Anyway, my thanks to the people working on the migration.
Regards,
participants (21)
- A.M. Kuchling
- Alex Martelli
- Andrew Dalke
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Brian Curtin
- Doug Hellmann
- Eric V. Smith
- Ethan Furman
- Ezio Melotti
- INADA Naoki
- Lukasz Langa
- M.-A. Lemburg
- Neil Schemenauer
- Nick Coghlan
- Paul Moore
- Stefan Krah
- Tal Einat
- Victor Stinner
- Yury Selivanov