[Python-Dev] Opcode cache in ceval loop
Brett Cannon
brett at python.org
Mon Feb 1 15:08:16 EST 2016
On Mon, 1 Feb 2016 at 11:51 Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> Hi Brett,
>
> On 2016-02-01 2:30 PM, Brett Cannon wrote:
> >
> >
> > On Mon, 1 Feb 2016 at 11:11 Yury Selivanov <yselivanov.ml at gmail.com
> > <mailto:yselivanov.ml at gmail.com>> wrote:
> >
> > Hi,
> >
> [..]
> >
> > What's next?
> > ------------
> >
> > First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
> > [1]. It's a very straightforward optimization, the patch is small
> and
> > easy to review.
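
The point of LOAD_METHOD (per issue 26110) is to skip materializing a
temporary bound-method object on every `obj.meth()` call. A rough
Python-level analogue of the two paths, purely illustrative (the real
change is a pair of new opcodes in ceval, not Python code like this):

```python
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()

# Regular path: LOAD_ATTR builds a bound-method wrapper object,
# then CALL_FUNCTION invokes it.
bound = g.hello              # allocates a types.MethodType wrapper
result1 = bound()

# LOAD_METHOD/CALL_METHOD path, approximated: fetch the plain function
# from the type and pass the instance explicitly, skipping the wrapper.
func = type(g).__dict__["hello"]
result2 = func(g)

assert result1 == result2 == "hello"
```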
> >
> >
> > +1 from me.
> >
> >
> > Second, I'd like to merge the new opcode cache, see issue 26219 [5].
> > All unittests pass. Memory usage increase is very moderate (<1mb for
> > the entire test suite), and the performance increase is significant.
> > The only potential blocker for this is PEP 509 approval (which I'd be
> > happy to assist with).
> >
> >
> > Given that it improves performance across the board and eliminates
> > the various tricks people use to cache globals and built-ins, a big
> > +1 from me. I guess that means Victor needs to ask for pronouncement
> > on PEP 509.
>
> Great! AFAIK Victor still needs to update the PEP with some changes
> (globally unique ma_version). My patch includes the latest
> implementation of PEP 509, and it works fine (no regressions, no broken
> unittests). I can also assist with reviewing Victor's implementation if
> the PEP is accepted.
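
The PEP 509 version-guard idea can be sketched in pure Python.
`ma_version` is a C-level field and is not exposed to Python code, so
the `VersionedDict` and `GlobalLookupCache` names below are a toy
simulation of the mechanism, not CPython's implementation:

```python
class VersionedDict:
    """Toy dict that bumps a version counter on every mutation,
    simulating PEP 509's ma_version field."""
    def __init__(self):
        self._data = {}
        self.version = 0

    def __setitem__(self, key, value):
        self._data[key] = value
        self.version += 1

    def __getitem__(self, key):
        return self._data[key]

class GlobalLookupCache:
    """Caches one lookup result; it stays valid only while the
    dict's version stamp is unchanged."""
    def __init__(self):
        self.cached_version = None
        self.cached_value = None

    def load_global(self, d, name):
        if self.cached_version == d.version:
            return self.cached_value      # cache hit: no dict lookup
        value = d[name]                   # cache miss: do the real lookup
        self.cached_version = d.version
        self.cached_value = value
        return value
```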
>
> >
> > BTW, where does LOAD_ATTR fit into all of this?
>
> LOAD_ATTR optimization doesn't use any of PEP 509's new machinery (if
> I understand your question correctly).  It's based on the following
> assumptions (the same ones that make JITs work so well):
>
> 1. Most classes don't implement __getattribute__.
>
> 2. A lot of attributes are stored in objects' __dict__s.
>
> 3. Most attributes aren't shadowed by descriptors/getters-setters; most
> code just uses "self.attr".
>
> 4. An average method/function works on objects of the same type, which
> means that those objects were constructed in a very similar (if not
> identical) fashion.
>
> For instance:
>
> class F:
>     def __init__(self, name):
>         self.name = name
>     def say(self):
>         print(self.name)  # <- For all F instances,
>                           #    the offset of 'name' in `F().__dict__`
>                           #    will be the same
>
> If LOAD_ATTR gets too many cache misses (20 in my current patch), it
> gets deoptimized and the default implementation is used.  So if the
> code is very dynamic, there's no improvement, but no performance
> penalty either.
>
> In my patch, I use the cache to store (for LOAD_ATTR specifically):
>
> - pointer to object's type
> - type->tp_version_tag
> - the last successful __dict__ offset
>
> The first two fields are used to make sure that we have objects of the
> same type.  If the type changes, we deoptimize the opcode immediately.
> Then we try the offset.  If it succeeds, we have a cache hit.  If not,
> that's fine; we'll try a few more times before deoptimizing the opcode.
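
The machinery Yury describes can be roughed out in Python. This is an
illustrative sketch only: the real cache lives in C inside the ceval
loop, guards on `type->tp_version_tag` (not the type object itself,
which stands in for both guard fields here), and caches a `__dict__`
offset rather than doing a keyed lookup. `AttrCache`, `MAX_MISSES`, and
`_MISSING` are invented names:

```python
MAX_MISSES = 20        # miss budget before giving up, per the patch
_MISSING = object()    # sentinel distinguishing "absent" from any value

class AttrCache:
    """Per-opcode cache: remembers the last type seen and reads straight
    from the instance __dict__ while that type keeps matching."""
    def __init__(self):
        self.cached_type = None   # stand-in for type ptr + tp_version_tag
        self.misses = 0
        self.deoptimized = False

    def load_attr(self, obj, name):
        if self.deoptimized:
            return getattr(obj, name)     # generic path, caching disabled
        tp = type(obj)
        if self.cached_type is None:
            self.cached_type = tp         # first execution primes the cache
        elif tp is not self.cached_type:
            self.deoptimized = True       # type changed: deoptimize now
            return getattr(obj, name)
        value = obj.__dict__.get(name, _MISSING)
        if value is not _MISSING:
            return value                  # cache hit: fast path
        self.misses += 1                  # offset guess failed: count it
        if self.misses >= MAX_MISSES:
            self.deoptimized = True       # code is too dynamic: give up
        return getattr(obj, name)         # slow path resolves the attribute
```

Note how this matches the no-penalty claim above: highly dynamic code
pays only a bounded number of extra guard checks before the opcode
permanently falls back to the default implementation.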
>
So this is a third "next step" that has its own issue?
-Brett
>
> >
> > What do you think?
> >
> >
> > It all looks great to me!
>
> Thanks!
>
> Yury
>
>