Python and the need for speed
Steven D'Aprano
steve at pearwood.info
Tue Apr 11 04:19:31 EDT 2017
On Sun, 09 Apr 2017 19:05:35 +1000, Chris Angelico wrote:
> When people talk about making a restricted optimizable subset of Python,
> the implication (if not the explicit declaration) is that it's done
> strictly by removing, not by rewriting.
Maybe *some* people have a naive, if not foolish, idea that all you need
to do to speed up CPython is to go through the C code and hit delete over
any functions you don't want. But that's hardly a technique that is going
to work: not only will you break features, but merely removing code
doesn't magically make the rest of the code base faster.
At the very least, you have to re-write the code that depends on the
features you have just removed.
If we're going to talk about speeding up Python, we ought to talk about
*serious* approaches to the problem, not the musings of random, ignorant
bloggers and other naive commentators. So let's look at what the experts
have done:
PyPy: a complete re-write, with a completely different approach (JIT
compilation).
WPython (alas, now more or less abandoned): a complete re-write of the
virtual machine, using 16-bit "wordcodes" instead of 8-bit "bytecodes";
this enabled a lot of good optimizations:
http://repository.root-me.org/Programmation/Python/EN%20-%20Beyond%20python%20bytecode.pdf
https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/wpython2/Cleanup%20and%20new%20optimizations%20in%20WPython%201.1.pdf
Unladen Swallow: among other changes, added new, specialist byte-codes to
speed up common operations.
http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html
Nuitka: a static optimizing compiler, and a complete re-write. Nuitka
implements a number of compile-time optimizations while still keeping
(claimed) 100% compatibility with CPython.
http://nuitka.net/doc/user-manual.html
FATPython adds compile-time name lookups, specialised functions, dead-
code elimination, constant folding and propagation to a CPython base.
https://haypo.github.io/fat-python-status-janv12-2016.html
Cython: a *superset*, not subset, of Python.
I could continue, but you get the point. Some of these are experimental,
like FATPython; some were production ready but abandoned because the
community had little interest (wpython); some are actively maintained and
used in production by many people (cython, PyPy).
The Python ecosystem is actually quite healthy: if you need to speed up
code there are lots of solutions, and some of them are even good
solutions. Nevertheless, it is interesting to discuss whether or not any
of these features will go mainstream or make it into CPython.
But anyway... it's not sufficient to just delete features to speed up a
language. You have to change implementation, and very likely add entirely
new internal APIs for the compiler.
Creating a restricted subset of Python, and giving up compatibility with
ordinary Python, is certainly an option. That's what Facebook did with
their version of PHP. But it's not the only approach.
> A couple more quotes from the article:
>
>> It should be possible to define a subset of the Python language,
>> uninspiredly dubbed “TurboPython”, that excludes those features that
>> stand in the way of high-performance JIT execution (or compilation).
>> Not using some of these features coincides with good design practices,
>> so it doesn’t necessarily have to be all bad.
> ...
>> Since TurboPython is a subset of Python, it will also run on Python
>> interpreters, albeit slower.
>
> Both of these imply that the standard library of TurboPython is
> *exactly* the standard library of CPython,
No. It doesn't say anything about the standard library. Maybe TurboPython
has no standard library at all. The library is independent of the
interpreter, and until somebody actually puts in the effort of writing it
(or copying it from one repo to another), we are equally justified in
saying any of:
- TurboPython's std lib will have 100% functional compatibility
with Python, only the implementation may change;
- TurboPython has no std lib, it only has the built-in types and modules,
if you want more, you can port it yourself;
- TurboPython's std lib is a subset of the Python std lib, just as
TurboPython itself is a subset of Python;
- TurboPython's std lib is a MILLION TIMES BETTER than Python's.
Since TurboPython doesn't exist, any of those statements are equally true.
But in a *practical* sense, for our mythical/hypothetical TurboPython to
have a chance at going mainstream, it needs a record/struct type, which
in Python is namedtuple. That does not necessarily mean it needs to
exec() arbitrary code. It just needs a way to create classes dynamically.
Even Java can do that.
http://stackoverflow.com/questions/2320404/
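To illustrate, here is a minimal sketch of a namedtuple-style record
built by passing a namespace to type(), with no exec() or compile()
anywhere. (make_record is a hypothetical helper for illustration, not
how CPython's namedtuple is actually implemented.)

    def make_record(typename, fieldnames):
        # Build the methods as ordinary closures over fieldnames.
        def __init__(self, *args):
            if len(args) != len(fieldnames):
                raise TypeError("expected %d arguments" % len(fieldnames))
            for name, value in zip(fieldnames, args):
                setattr(self, name, value)
        def __repr__(self):
            pairs = ", ".join("%s=%r" % (name, getattr(self, name))
                              for name in fieldnames)
            return "%s(%s)" % (typename, pairs)
        # type() creates the class dynamically; no source text is compiled.
        return type(typename, (object,), {
            "__slots__": tuple(fieldnames),
            "__init__": __init__,
            "__repr__": __repr__,
        })

    Point = make_record("Point", ["x", "y"])
    print(Point(1, 2))  # prints: Point(x=1, y=2)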
> minus the bits that aren't
> supported. We just cut a few of those nasty dynamic features out of the
> language, and voila! it becomes faster.
That's exactly what I'm talking about. Anyone who says "voila! it becomes
faster" and means it is too foolish to live :-)
Cutting out the dynamic features is just the first step in allowing you
to re-write the interpreter with more optimizations. A non-optimizing
compiler doesn't magically become optimizing just because you've deleted
some code from the interpreter.
> (Or in this case,
> JITable.) There's nothing suggested here about reimplementing existing
> features in a different way,
Except for common bloody sense :-P
[...]
> But again, you certainly COULD reimplement literal_eval, but then you
> have to keep your implementation accurate and up-to-date, else you risk
> bugs creeping in. It's a non-trivial task to rewrite these kinds of
> things and maintain parallel versions;
Sure, it's not thirty seconds' work, but it's not *hard* to write a parser
that accepts a small subset of Python expressions (those involving only
literals and a handful of built-in displays).
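As a sketch of roughly what that takes, leaning on the ast module for
the parsing and accepting only constants plus the tuple/list/set/dict
displays (the names here are illustrative, and ast.Constant assumes
Python 3.8+):

    import ast

    def eval_literal(source):
        # Parse in "eval" mode so only a single expression is accepted.
        return _convert(ast.parse(source, mode="eval").body)

    def _convert(node):
        # Whitelist: constants and the four built-in displays, nothing else.
        if isinstance(node, ast.Constant):  # numbers, strings, True/False/None
            return node.value
        if isinstance(node, ast.Tuple):
            return tuple(_convert(elt) for elt in node.elts)
        if isinstance(node, ast.List):
            return [_convert(elt) for elt in node.elts]
        if isinstance(node, ast.Set):
            return {_convert(elt) for elt in node.elts}
        if isinstance(node, ast.Dict):
            return {_convert(k): _convert(v)
                    for k, v in zip(node.keys, node.values)}
        raise ValueError("unsupported syntax: %s" % type(node).__name__)

    print(eval_literal("{'a': (1, 2), 'b': [3.0, None]}"))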
Are you aware that literal_eval is *not* the same code that the Python
interpreter uses for evaluating literals? They are kept in sync manually,
or not kept in sync as the case may be.
Python 2.7's literal_eval didn't support the set display syntax even
though the interpreter did:
>>> from ast import literal_eval
>>> {1, 2}
set([1, 2])
>>> literal_eval("{1, 2}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string
literal_eval also had a bug that allowed it to evaluate the + and -
operators (support which is needed for complex numbers):
http://stackoverflow.com/questions/40584417/why-does-ast-literal-eval5-7
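On the affected versions (the check was later tightened) the arithmetic
really was evaluated:

>>> literal_eval("5+7")
12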
The point being, while it's not *ideal* that support for evaluating
Python literals needs to be written twice (once in the real interpreter,
once in literal_eval), it's not an unusual or particularly
onerous requirement.
> by calling on compile(), the
> current implementation *for free* is kept up-to-date with changes in
> CPython's grammar.
That's incorrect. literal_eval() needs a whitelist of what's allowed, and
a blacklist of what is allowed by the whitelist but shouldn't be. There
may be other work needed as well. See the source code :-)
[...]
> Yes. As long as it can know when the more dynamic features are being
> used - and that's the hard part.
Hard, but not impossible, and not that hard. Look at Nuitka and FATPython.
The thing that people forget is that not all errors are equal. There are
two ways that the optimizing compiler can do the wrong thing:
- fail to apply an optimization when it is safe to do so;
- apply an optimization when it is unsafe to do so.
Those errors are not equivalent. In the first case, all that happens is
that you lose an opportunity to be a little bit faster, but the executed
code is still correct. (Or at least no less correct than what you wrote.)
If you even notice, which you might not, it might be tricky to work out
why the compiler isn't applying the optimization, but in principle it can
be done and then worked around, or the compiler made a bit smarter.
For example, it might be that the compiler applies an absolutely dead-
simple rule: *any* use of eval or exec disables *all* optimizations.
Don't like it? Then don't use eval or exec. If that's not an option for
you, patches are welcome.
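Such a rule is also cheap to implement. As a rough sketch (illustrative
only; a real compiler would do this check internally rather than in
user-level Python), a single walk over the AST is enough to spot the
names:

    import ast

    def uses_eval_or_exec(source):
        # Scan every node; any reference to the name eval or exec,
        # however it is used, disqualifies the module from optimization.
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Name) and node.id in ("eval", "exec"):
                return True
        return False

    print(uses_eval_or_exec("def f(x): return x + 1"))   # False: optimize
    print(uses_eval_or_exec("def g(s): return eval(s)")) # True: don't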
The second case is much worse. If you're lucky, the "optimized" code will
merely crash, and you'll know it is broken. But worse, it may be subtly
incorrect, and do the wrong thing. Calculate a *very slightly off*
result. Or transfer funds to the wrong account, delete the wrong files,
turn off the wrong life-support machine, that sort of thing.
The second case must be avoided at all costs. (Or at least, unless the
programmer consents to unsafe optimizations that may change behaviour,
such as many unsafe floating point optimizations.)
The first case merely indicates room for improvement. Patches are welcome.
> Remember, as soon as you have a single
> class implemented in Python, it could have a method injected into it
> without your knowledge. Can you detect that statically, or can you use
> the versioning of __dict__ to notice that something's been broken? What
> makes you fall back to the default implementation?
Ask Victor Stinner.
You ask these rhetorical questions as if they're major impediments, but
they actually are more-or-less solved problems. The tricky part is just
applying the solutions to *Python* itself. It's one thing to have a
general strategy for optimizations, it's another to apply it to a real
language, and especially a real implementation.
But effectively, yes, I believe that dict versioning can be used to do
this. Suppose you have code that says:
result = myinstance.method(99)
So you compile byte-code something like this pseudo-code:
lookup myinstance
if type(myinstance) is the expected type and neither the type
        nor instance __dict__ have changed:
    push 99 on the stack
    JUMP TO CODE AT 123456  # original location of method's code
    return
else:
    get myinstance
    lookup method in myinstance.__dict__
    if not found:
        lookup method in each of type(myinstance).__mro__ or fail
    wrap the found object in MethodType
    build a set of positional and keyword arguments
    pass those positional and keyword arguments to the method object,
        which unpacks them and assigns to parameters
    location = look up the method's code object
    set up the book-keeping needed
    JUMP TO CODE AT location  # probably 123456 as above
    tidy up book-keeping
    return
The first case avoids a HUGE amount of work, at the cost of checking a
couple of guards ahead of time. So the fast path becomes (say) 20%
faster, and the slow path becomes 2% slower. Even if you take the slow
path nine times as often as the fast path, it's still an overall win.
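Here is that idea as runnable (if highly simplified) Python. The version
counter is a hand-rolled stand-in for CPython's dict versioning, the
cache layout is hypothetical, and the guard on the instance __dict__ is
omitted for brevity:

    class VersionedType(type):
        def __new__(mcls, name, bases, ns):
            cls = super().__new__(mcls, name, bases, ns)
            cls._version = 0
            return cls
        def __setattr__(cls, name, value):
            super().__setattr__(name, value)
            if name != "_version":
                cls._version += 1  # any class mutation invalidates caches

    class CachedCallSite:
        def __init__(self):
            self.cached_type = None
            self.cached_version = None
            self.cached_func = None
        def call(self, obj, methname, *args):
            tp = type(obj)
            # Fast path: both guards hold, jump straight to cached code.
            if tp is self.cached_type and tp._version == self.cached_version:
                return self.cached_func(obj, *args)
            # Slow path: full lookup, then fill the cache for next time.
            func = getattr(tp, methname)
            self.cached_type = tp
            self.cached_version = tp._version
            self.cached_func = func
            return func(obj, *args)

    class Greeter(metaclass=VersionedType):
        def method(self, n):
            return "hello %d" % n

    site = CachedCallSite()
    g = Greeter()
    print(site.call(g, "method", 99))  # slow path, fills the cache
    print(site.call(g, "method", 99))  # fast path, guards hold
    Greeter.method = lambda self, n: "patched %d" % n
    print(site.call(g, "method", 99))  # guard fails, cache refilled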
But you don't get that just by selecting the C code for eval() and exec()
in your editor and pressing Delete.
--
Steve