[Python-Dev] Opcode cache in ceval loop
Brett Cannon
brett at python.org
Mon Feb 1 15:08:16 EST 2016
On Mon, 1 Feb 2016 at 11:51 Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> Hi Brett,
>
> On 2016-02-01 2:30 PM, Brett Cannon wrote:
> >
> >
> > On Mon, 1 Feb 2016 at 11:11 Yury Selivanov <yselivanov.ml at gmail.com
> > <mailto:yselivanov.ml at gmail.com>> wrote:
> >
> > Hi,
> >
> [..]
> >
> > What's next?
> > ------------
> >
> > First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
> > [1]. It's a very straightforward optimization, the patch is small
> and
> > easy to review.
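
The point of LOAD_METHOD (per issue 26110) is to skip materializing a
temporary bound-method object on every `obj.meth()` call. A rough
Python-level analogue of the two paths, purely illustrative (the real
change is a pair of new opcodes in ceval, not Python code like this):

```python
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()

# Regular path: LOAD_ATTR builds a bound-method wrapper object,
# then CALL_FUNCTION invokes it.
bound = g.hello              # allocates a types.MethodType wrapper
result1 = bound()

# LOAD_METHOD/CALL_METHOD path, approximated: fetch the plain function
# from the type and pass the instance explicitly, skipping the wrapper.
func = type(g).__dict__["hello"]
result2 = func(g)

assert result1 == result2 == "hello"
```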
> >
> >
> > +1 from me.
> >
> >
> > Second, I'd like to merge the new opcode cache, see issue 26219 [5].
> > All unittests pass. Memory usage increase is very moderate (<1mb for
> > the entire test suite), and the performance increase is significant.
> > The only potential blocker for this is PEP 509 approval (which I'd be
> > happy to assist with).
> >
> >
> > Given that it improves performance across the board and eliminates
> > the various tricks people use to cache globals and built-ins, a big
> > +1 from me. I guess that means Victor needs to ask for pronouncement
> > on PEP 509.
>
> Great! AFAIK Victor still needs to update the PEP with some changes
> (globally unique ma_version). My patch includes the latest
> implementation of PEP 509, and it works fine (no regressions, no broken
> unittests). I can also assist with reviewing Victor's implementation if
> the PEP is accepted.
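
The PEP 509 version-guard idea can be sketched in pure Python.
`ma_version` is a C-level field and is not exposed to Python code, so
the `VersionedDict` and `GlobalLookupCache` names below are a toy
simulation of the mechanism, not CPython's implementation:

```python
class VersionedDict:
    """Toy dict that bumps a version counter on every mutation,
    simulating PEP 509's ma_version field."""
    def __init__(self):
        self._data = {}
        self.version = 0

    def __setitem__(self, key, value):
        self._data[key] = value
        self.version += 1

    def __getitem__(self, key):
        return self._data[key]

class GlobalLookupCache:
    """Caches one lookup result; it stays valid only while the
    dict's version stamp is unchanged."""
    def __init__(self):
        self.cached_version = None
        self.cached_value = None

    def load_global(self, d, name):
        if self.cached_version == d.version:
            return self.cached_value      # cache hit: no dict lookup
        value = d[name]                   # cache miss: do the real lookup
        self.cached_version = d.version
        self.cached_value = value
        return value
```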
>
> >
> > BTW, where does LOAD_ATTR fit into all of this?
>
> LOAD_ATTR optimization doesn't use any of PEP 509's new machinery (if
> I understand your question correctly).  It's based on the following
> assumptions (the same ones that make JITs work so well):
>
> 1. Most classes don't implement __getattribute__.
>
> 2. A lot of attributes are stored in objects' __dict__s.
>
> 3. Most attributes aren't shadowed by descriptors/getters-setters; most
> code just uses "self.attr".
>
> 4. An average method/function works on objects of the same type, which
> means that those objects were constructed in a very similar (if not
> identical) fashion.
>
> For instance:
>
> class F:
>     def __init__(self, name):
>         self.name = name
>     def say(self):
>         print(self.name)  # <- For all F instances,
>                           #    the offset of 'name' in `F().__dict__`
>                           #    will be the same
>
> If LOAD_ATTR gets too many cache misses (20 in my current patch), it
> gets deoptimized and the default implementation is used.  So if the
> code is very dynamic, there's no improvement, but no performance
> penalty either.
>
> In my patch, I use the cache to store (for LOAD_ATTR specifically):
>
> - pointer to object's type
> - type->tp_version_tag
> - the last successful __dict__ offset
>
> The first two fields are used to make sure that we have objects of the
> same type.  If the type changes, we deoptimize the opcode immediately.
> Then we try the offset.  If it succeeds, we have a cache hit.  If not,
> that's fine; we'll try a few more times before deoptimizing the opcode.
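
The machinery Yury describes can be roughed out in Python. This is an
illustrative sketch only: the real cache lives in C inside the ceval
loop, guards on `type->tp_version_tag` (not the type object itself,
which stands in for both guard fields here), and caches a `__dict__`
offset rather than doing a keyed lookup. `AttrCache`, `MAX_MISSES`, and
`_MISSING` are invented names:

```python
MAX_MISSES = 20        # miss budget before giving up, per the patch
_MISSING = object()    # sentinel distinguishing "absent" from any value

class AttrCache:
    """Per-opcode cache: remembers the last type seen and reads straight
    from the instance __dict__ while that type keeps matching."""
    def __init__(self):
        self.cached_type = None   # stand-in for type ptr + tp_version_tag
        self.misses = 0
        self.deoptimized = False

    def load_attr(self, obj, name):
        if self.deoptimized:
            return getattr(obj, name)     # generic path, caching disabled
        tp = type(obj)
        if self.cached_type is None:
            self.cached_type = tp         # first execution primes the cache
        elif tp is not self.cached_type:
            self.deoptimized = True       # type changed: deoptimize now
            return getattr(obj, name)
        value = obj.__dict__.get(name, _MISSING)
        if value is not _MISSING:
            return value                  # cache hit: fast path
        self.misses += 1                  # offset guess failed: count it
        if self.misses >= MAX_MISSES:
            self.deoptimized = True       # code is too dynamic: give up
        return getattr(obj, name)         # slow path resolves the attribute
```

Note how this matches the no-penalty claim above: highly dynamic code
pays only a bounded number of extra guard checks before the opcode
permanently falls back to the default implementation.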
>
So this is a third "next step" that has its own issue?
-Brett
>
> >
> > What do you think?
> >
> >
> > It all looks great to me!
>
> Thanks!
>
> Yury
>
>