[Python-Dev] Tweak to PEP 523 for storing a tuple in co_extra

Sat Sep 3 20:19:44 EDT 2016

On Sat, 3 Sep 2016 at 16:43 Yury Selivanov <yselivanov.ml at gmail.com> wrote:

>
>
> On 2016-09-03 4:15 PM, Christian Heimes wrote:
> > On 2016-09-04 00:03, Yury Selivanov wrote:
> >>
> >> On 2016-09-03 12:27 PM, Brett Cannon wrote:
> >>> Below is the `co_extra` section of PEP 523 with the update saying that
> >>> users are expected to put a tuple in the field for easier simultaneous
> >>> use of the field.
> >>>
> >>> Since the `co_extra` discussions do not affect CPython itself I'm
> >>> planning on landing the changes stemming from the PEP probably on
> >>> Monday.
> >> Tuples are immutable.  If you have multiple co_extra users then they
> >> will have to either mutate the tuple (which isn't always possible, for
> >> instance, you can't increase its size), or replace it with another tuple.
> >>
> >> Creating lists is a bit more expensive, but item access speed should be
> >> in the same ballpark.
> >>
> >> Another question -- sorry if this was discussed before -- why do we want
> >> a PyObject* there at all?  I.e. why don't we create a dedicated struct
> >> CoExtraContainer to manage the stuff in co_extra?  My understanding is
> >> that the users of co_extra are C-level python optimizers and profilers,
> >> which don't need the overhead of CPython API.
>

As Chris pointed out in another email, the overhead is only in the
allocation, not the iteration/access: if you use the PyTuple macros to get
the size and index into the tuple, the overhead is negligible.

> >>
> >> This way my work to add an extra caching layer (which I'm very much
> >> willing to continue to work on) wouldn't require another set of extra
> >> fields for code objects.
> > Quick idea before I go to bed:
> >
> > You could adopt a similar API to OpenSSL's CRYPTO_get_ex_new_index()
> > API,
> >
> > https://www.openssl.org/docs/manmaster/crypto/CRYPTO_get_ex_new_index.html
> >
> >
> > static int code_index = 0;
> >
> > int PyCodeObject_NewIndex() {
> >      return code_index++;
> > }
> >
> > A library like Pyjion has to acquire an index first. In further calls it
> > uses the index as offset into the new co_extra field. Libraries don't
> > have to hard-code their offset and two libraries will never conflict.
> > PyCode_New() can pre-populate co_extra with a PyTuple of size
> > code_index. This avoids most resizes if you load Pyjion early. For
> > code_index == 0, leave the field NULL.
>
> Sounds like a very good idea!
>

The problem with this is the pre-population. If you don't get your index
assigned before the very first code object is allocated then you still have
to manage the size of the tuple in co_extra. So what this would do is avoid
the iteration but not the allocation overhead.
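
To make the pre-population gap concrete, here is a toy Python model of the
index-registry idea. It is only a sketch: FakeCode and new_extra_index are
made-up stand-ins for PyCodeObject and the proposed PyCodeObject_NewIndex(),
not CPython code.

```python
# Toy model of the index-registry idea; all names are hypothetical.
_next_extra_index = 0  # plays the role of the static code_index counter


def new_extra_index():
    """Hand out the next free slot, like the proposed PyCodeObject_NewIndex()."""
    global _next_extra_index
    index = _next_extra_index
    _next_extra_index += 1
    return index


class FakeCode:
    """Stand-in for a code object; co_extra is pre-populated at creation."""

    def __init__(self):
        # Mirrors "PyCode_New() can pre-populate co_extra": None (NULL) when
        # no index has been handed out yet, else a tuple with that many slots.
        n = _next_extra_index
        self.co_extra = (None,) * n if n else None


early = FakeCode()        # created before any tool registered
slot = new_extra_index()  # a tool such as a JIT registers late
late = FakeCode()         # created after registration

# A code object needs its tuple created or grown if the slot does not fit.
needs_resize_early = early.co_extra is None or len(early.co_extra) <= slot
needs_resize_late = late.co_extra is None or len(late.co_extra) <= slot
```

In this model only the code object created after registration has room for
the slot; the earlier one still needs its tuple allocated on first use.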

If we open up the can of worms in terms of custom functions for this (which
I was trying to avoid), then you end up with three functions:

  Py_ssize_t _PyCode_ExtraIndex()
  PyObject *_PyCode_GetExtra(PyCodeObject *code, Py_ssize_t index)
  int _PyCode_SetExtra(PyCodeObject *code, Py_ssize_t index, PyObject *data)

which do all the right things for creating or resizing the tuple as
necessary and which I think mostly match what Nick had proposed earlier.
But the pseudo-code for _PyCode_GetExtra() would be::

  if co_extra is None:
      co_extra = (None,) * _next_extra_index
      return None
  elif len(co_extra) <= index:
      ... pad out tuple
      return None
  else:
      return co_extra[index]
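
For anyone who wants to play with the idea, a runnable Python model of the
three helpers might look like the following. This is a sketch under
assumptions, not the C implementation: FakeCode, extra_index, get_extra,
set_extra and _padded are hypothetical names, and None stands in for NULL.

```python
# Hypothetical Python model of the proposed helpers; not the CPython C API.
_next_extra_index = 0


def extra_index():
    """Model of _PyCode_ExtraIndex(): reserve and return a new slot."""
    global _next_extra_index
    index = _next_extra_index
    _next_extra_index += 1
    return index


class FakeCode:
    """Stand-in for PyCodeObject with only the co_extra field."""
    co_extra = None


def _padded(code, size):
    """Return co_extra as a tuple of at least `size` slots, padded with None."""
    current = code.co_extra or ()
    return current + (None,) * max(0, size - len(current))


def get_extra(code, index):
    """Model of _PyCode_GetExtra(): missing slots read as None."""
    # Tuples are immutable, so padding means storing a replacement tuple.
    code.co_extra = _padded(code, max(_next_extra_index, index + 1))
    return code.co_extra[index]


def set_extra(code, index, data):
    """Model of _PyCode_SetExtra(): store a new tuple with slot `index` set."""
    slots = list(_padded(code, max(_next_extra_index, index + 1)))
    slots[index] = data
    code.co_extra = tuple(slots)
    return 0  # the C signature returns int; 0 stands for success here


code = FakeCode()
a = extra_index()
b = extra_index()
first_read = get_extra(code, b)      # nothing stored yet
status = set_extra(code, b, "jit data")
second_read = get_extra(code, b)
```

Every set allocates a fresh tuple in this model, which is the allocation
cost discussed above; reads after the first padding are plain indexing.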

Is that going to save us enough to want to have a custom API for this?

-Brett

>
> Yury
>