[Python-ideas] Move optional data out of pyc files

Daniel Moisset dmoisset at machinalis.com
Thu Apr 12 10:16:31 EDT 2018


One implementation difficulty specific to annotations is that they are quite
hard to find and extract from code objects. Docstrings and lnotab each live
in a specific field of the code object for their function/class/module.
Annotations (assuming PEP 563) are instead spread across individual
constants, which are loaded by separate LOAD_CONST instructions before the
function object is created, and that can happen in the middle of the
bytecode of the higher-level object (the module or class containing the
function definition). So the change needed to achieve this will be more
significant than just "add a couple of descriptors to function objects and
change the module marshalling code".
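
To illustrate where the pieces end up, here is a minimal sketch that compiles
a small module and looks at the constants of the module code object and of
the function code object. The source text, the file name "<example>" and the
helper flatten_consts are made up for this example. The helper descends into
tuples because, depending on the interpreter version, the annotation strings
may be separate constants or packed together in one tuple.

```python
import types

# Hypothetical module source, using PEP 563 string annotations.
SOURCE = '''
from __future__ import annotations

def foo(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b
'''


def flatten_consts(consts):
    """Yield constants, descending into tuples but not into code objects."""
    for const in consts:
        if isinstance(const, tuple):
            yield from flatten_consts(const)
        else:
            yield const


module_code = compile(SOURCE, "<example>", "exec")
# The function body is itself a constant of the module code object.
func_code = next(
    c for c in module_code.co_consts if isinstance(c, types.CodeType)
)

module_level = list(flatten_consts(module_code.co_consts))
function_level = list(flatten_consts(func_code.co_consts))

# The docstring travels with the function's own code object...
print("docstring in function code:", "Add two numbers." in function_level)
# ...but the annotation strings belong to the enclosing module's code.
print("'int' in function code:", "int" in function_level)
print("'int' in module code:", "int" in module_level)
```

Running dis.dis(module_code) on the same object shows the instructions that
load these constants right before the function object is built.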

Making annotations fit into a single structure that can live in co_consts
would probably make this change easier. It would also make startup of
annotated modules faster, because you load a single constant instead of one
per argument. That might be a valuable change by itself.
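
As a rough sketch of what such a single structure could look like, assume a
flat tuple of alternating names and annotation strings. The layout and the
helper names pack_annotations and unpack_annotations are only assumptions for
illustration, not an existing format.

```python
def pack_annotations(annotations):
    """Flatten {"a": "int", ...} into ("a", "int", ...), one constant."""
    flat = []
    for name, text in annotations.items():
        flat.extend((name, text))
    return tuple(flat)


def unpack_annotations(packed):
    """Rebuild the dict from the flat tuple; could be deferred until needed."""
    return dict(zip(packed[::2], packed[1::2]))


annotations = {"a": "int", "b": "int", "return": "int"}
packed = pack_annotations(annotations)
restored = unpack_annotations(packed)
print(packed)
print(restored == annotations)
```

A tuple of strings is immutable and marshals as one object, so it can sit in
co_consts next to the function's code object and be located in one place.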



On 12 April 2018 at 11:48, INADA Naoki <songofacandy at gmail.com> wrote:

> > Finally, loading docstrings and other optional components can be made lazy.
> > This was not in my original idea, and this will significantly complicate the
> > implementation, but in principle it is possible. This will require larger
> > changes in the marshal format and bytecode.
>
> I'm +1 on this idea.
>
> * The new pyc format has a code section (same as the current one) and a
>   text section. The text section stores UTF-8 strings and is not loaded
>   at import time.
> * Function annotations (only when PEP 563 is used) and docstrings are
>   stored as integers that point to offsets in the text section.
> * When type.__doc__, PyFunction.__doc__, or PyFunction.__annotations__ is
>   an integer, the text is loaded from the text section lazily.
>
> PEP 563 will reduce startup time somewhat, but __annotations__ is still a
> dict.  Its memory overhead is not negligible.
>
> In [1]: def foo(a: int, b: int) -> int:
>    ...:     return a + b
>    ...:
>    ...:
>
> In [2]: import sys
> In [3]: sys.getsizeof(foo)
> Out[3]: 136
>
> In [4]: sys.getsizeof(foo.__annotations__)
> Out[4]: 240
>
> When PEP 563 is used, there are no side effects while building the
> annotations, so they can be serialized as text, like
> {"a":"int","b":"int","return":"int"}.
>
> This change will require a new pyc format, and descriptors for
> PyFunction.__doc__, PyFunction.__annotations__, and type.__doc__.
>
> Regards,
>
> --
> INADA Naoki  <songofacandy at gmail.com>
>
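
For reference, the lazy-loading idea quoted above could be prototyped in pure
Python roughly as follows. Everything here is an assumption made for the
sketch: the text section layout (a 4-byte little-endian length followed by
UTF-8 bytes), the helpers build_text_section and read_text, the LazyText
descriptor, and the FakeFunction stand-in for a real function object. A real
implementation would live in C and in the marshal format.

```python
import json
import struct


def build_text_section(strings):
    """Pack strings as <uint32 length><UTF-8 bytes>; return (blob, offsets)."""
    blob = bytearray()
    offsets = []
    for text in strings:
        data = text.encode("utf-8")
        offsets.append(len(blob))
        blob += struct.pack("<I", len(data)) + data
    return bytes(blob), offsets


def read_text(blob, offset):
    """Decode the single string stored at the given offset."""
    (length,) = struct.unpack_from("<I", blob, offset)
    start = offset + 4
    return blob[start:start + length].decode("utf-8")


class LazyText:
    """Descriptor that keeps an int offset until the first read."""

    def __init__(self, decode=lambda text: text):
        self.decode = decode

    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = getattr(obj, self.slot)
        if isinstance(value, int):
            # Still an offset: load from the text section and cache it.
            value = self.decode(read_text(obj.text_section, value))
            setattr(obj, self.slot, value)
        return value

    def __set__(self, obj, value):
        setattr(obj, self.slot, value)


class FakeFunction:
    """Hypothetical stand-in for a function object with lazy fields."""

    doc = LazyText()
    annotations = LazyText(decode=json.loads)

    def __init__(self, text_section, doc_offset, annotations_offset):
        self.text_section = text_section
        self.doc = doc_offset
        self.annotations = annotations_offset


blob, (doc_off, ann_off) = build_text_section([
    "Add two numbers.",
    '{"a":"int","b":"int","return":"int"}',
])
foo = FakeFunction(blob, doc_off, ann_off)
```

Until foo.doc or foo.annotations is read, the object only holds two integers;
the first read decodes the text and replaces the offset with the result.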



-- 
Daniel F. Moisset - UK Country Manager - Machinalis Limited