On Sat, 9 Jan 2016 at 07:04 Neil Girdhar <mistersheik@gmail.com> wrote:
On Sat, Jan 9, 2016 at 8:42 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,

2016-01-09 13:48 GMT+01:00 Neil Girdhar <mistersheik@gmail.com>:
> How is this not just a poorer version of PyPy's optimizations?

This is a very good question :-) There are a lot of optimizers in the
wild, mostly JIT compilers. The problem is that most of them are
specific to numerical computations, and the remaining ones are generic
but not widely used. The most advanced and complete fast
implementation of Python is obviously PyPy. I haven't heard of many
deployments with PyPy. For example, PyPy is not used to install
OpenStack (a very large project which has a large number of
dependencies). I'm not even sure that PyPy is the favorite
implementation of Python used to run Django, to give another example
of a popular Python application.

PyPy is just amazing in terms of performance, but for an unknown
reason, it hasn't replaced CPython yet. PyPy has some drawbacks: it
only supports Python 2.7 and 3.2 (CPython is at version 3.5), it
has bad performance on the C API and I heard that performance is
not as amazing as expected on some applications. PyPy also has a worse
startup time and uses more memory. IMHO the major issue of Python is
the backward compatibility on the C API.

In short, almost all users are stuck on CPython, and CPython implements
close to 0 optimizations (come on, constant folding and dead code
elimination are not what I would call an "optimization" ;-)).
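
For illustration, the constant folding mentioned above can be observed
from Python itself. This is only a small sketch, assuming CPython; the
source string and variable names are made up for the example.

```python
# Compile a tiny module whose right-hand side is a constant expression.
code = compile("x = 2 * 3", "<example>", "exec")

# On CPython the compiler folds 2 * 3 at compile time, so the result 6
# is stored among the code object's constants.
folded = 6 in code.co_consts
print(folded)
```

This is about as far as the compiler goes: it can fold literals, but it
cannot assume anything about names that may be rebound at runtime.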

My goal is to fill the hole between CPython (0 optimization) and PyPy
(the reference for best performance).

I wrote a whole website to explain the status of the Python optimizers
and why I want to write my own optimizer:
https://faster-cpython.readthedocs.org/index.html

I think this is admirable.  I also dream of faster Python.  However, we have a fundamental disagreement about how to get there.  You can spend your whole life adding one or two optimizations a year and Python may only end up twice as fast as it is now, which would still be dog slow. A meaningful speedup requires a JIT.  So, I question the value of this kind of change. 

Obviously a JIT can help, but even they can benefit from this. For instance, Pyjion could rely on this instead of creating our own guards for built-in and global namespaces if we wanted to inline calls to certain built-ins.
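
To make the guard idea concrete, here is a minimal sketch in pure Python.
Everything in it is an assumption for illustration: VersionedDict,
make_guarded_len and the attribute name "version" are hypothetical, and a
real implementation would live in C inside dict and cover every mutating
method, not just the two shown.

```python
import builtins


class VersionedDict(dict):
    """Hypothetical stand-in for a dict that carries a version counter."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._version = 0

    @property
    def version(self):
        # Read-only: the property has no setter.
        return self._version

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self._version += 1


def make_guarded_len(namespace):
    """Return a function that reuses a cached `len` while the guard holds."""
    cached_len = namespace["len"]
    expected_version = namespace.version

    def guarded_len(obj):
        if namespace.version == expected_version:
            # Guard holds: one integer comparison replaces the name lookup.
            return cached_len(obj)
        # Guard failed: fall back to the slow, always-correct lookup.
        return namespace["len"](obj)

    return guarded_len


namespace = VersionedDict(len=builtins.len)
fast_len = make_guarded_len(namespace)
result_before = fast_len("abc")    # guard holds, cached len is used

namespace["len"] = lambda obj: 42  # someone rebinds the name
result_after = fast_len("abc")     # guard fails, fresh lookup is used
```

The point of the sketch is only that checking "has this namespace
changed?" becomes a single comparison, which is what both a static
optimizer and a JIT need before inlining a built-in.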
 


> If what you want is optimization, it would be much better to devote time to a solution
> that can potentially yield orders of magnitude worth of speedup like PyPy
> rather than increasing language complexity for a minor payoff.

I disagree that my proposed changes increase the "language
complexity". According to early benchmarks, my changes have a
negligible impact on performance. I don't see how adding a read-only
__version__ property to dict makes the Python *language* more complex?


It makes it more complex because you're adding a user-facing property.  Every little property adds up in the cognitive load of a language.  It also means that all of the other Python implementations need to follow suit even if their optimizations work differently.

What is the point of making __version__ an exposed property?  Why can't it be a hidden variable in CPython's underlying implementation of dict?  If some code needs to query __version__ to see if it's changed then CPython should be the one trying to discover this pattern and automatically generate the right code.  Ultimately, this is just a piece of a JIT, which is the way this is going to end up.

My whole design is based on the idea that my optimizer will be
optional. You will be free to not use it ;-)

And sorry, I'm not interested in contributing to PyPy.

That's fine, but I think you are probably wasting your time then :)  The "hole between CPython and PyPy" disappears as soon as PyPy catches up to CPython 3.5 with numpy, and then all of this work goes with it.

That doesn't solve the C API compatibility problem, nor other issues some people have with PyPy deployments (e.g., inconsistent performance that can't necessarily be relied upon).