[Python-ideas] LOAD_NAME/LOAD_GLOBAL should be use getattr()

INADA Naoki songofacandy at gmail.com
Tue Sep 12 23:24:31 EDT 2017


I'm quite worried about performance.

Since Python 3.6, dict has ma_version_tag (PEP 509), which is intended for
future optimizations, including global caching.
Adding another abstraction layer may make those optimizations difficult.
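
To make the concern concrete, here is a toy pure-Python model of
version-guarded caching. It is only a sketch: the real version tag lives in
the C struct of dict and is not exposed like this, and the names
VersionedDict and CachedGlobal are hypothetical, invented for illustration.

```python
class VersionedDict(dict):
    """Toy dict that bumps a counter on mutation (models a version tag).

    Only __setitem__ and __delitem__ are tracked here; update(), pop(),
    etc. are ignored to keep the sketch short.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedGlobal:
    """Caches one name lookup and revalidates it with the version counter."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.cached_version = None  # no lookup performed yet
        self.cached_value = None
        self.misses = 0

    def load(self):
        # Redo the dict lookup only if the namespace changed since last time.
        if self.cached_version != self.namespace.version:
            self.cached_value = self.namespace[self.name]
            self.cached_version = self.namespace.version
            self.misses += 1
        return self.cached_value


ns = VersionedDict(x=1)
x_loader = CachedGlobal(ns, "x")
x_loader.load()   # first load: real lookup
x_loader.load()   # second load: served from the cache
ns["x"] = 2       # mutation invalidates the cached entry
x_loader.load()   # real lookup again
```

A guard like this only works if every global comes from the dict itself. If
a lookup could be answered by getattr() on the module, the dict version alone
would no longer say whether the cached value is still valid.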


When considering lazy loading, the big problem is backward compatibility.
For example, see
https://github.com/python/cpython/blob/master/Lib/concurrent/futures/__init__.py

from concurrent.futures._base import (FIRST_COMPLETED,
                                      FIRST_EXCEPTION,
                                      ALL_COMPLETED,
                                      CancelledError,
                                      TimeoutError,
                                      Future,
                                      Executor,
                                      wait,
                                      as_completed)
from concurrent.futures.process import ProcessPoolExecutor
from concurrent.futures.thread import ThreadPoolExecutor


asyncio must import concurrent.futures.Future for compatibility between
asyncio.Future and concurrent.futures.Future.

But not all asyncio applications need ProcessPoolExecutor.
They may use only ThreadPoolExecutor.

Currently, they are forced to import concurrent.futures.process, which imports
multiprocessing.  This creates a large import dependency tree.

Hooking LOAD_GLOBAL is not necessary to solve this kind of problem.

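# in concurrent/futures/__init__.py

def __getattr__(name):
    if name == 'ProcessPoolExecutor':
        global ProcessPoolExecutor
        # Import on first access; the global statement caches the class in
        # the module namespace, so __getattr__ is not called for it again.
        from .process import ProcessPoolExecutor
        return ProcessPoolExecutor
    # Any other missing name must still fail as usual.
    raise AttributeError(name)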

# The following code should call __getattr__

from concurrent.futures import ProcessPoolExecutor  # eager loading

import concurrent.futures as futures
executor = futures.ProcessPoolExecutor()  # lazy loading
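
The idea above can be tried in a self-contained way. This is a minimal
sketch, assuming an interpreter that supports module-level __getattr__
(PEP 562, Python 3.7 and later). The module name lazy_demo, the helper
_lazy_getattr, and the use of json.JSONDecoder as a stand-in for a heavy
import are all hypothetical choices for illustration.

```python
import sys
import types

# "lazy_demo" is a hypothetical module name used only for this sketch.
demo = types.ModuleType("lazy_demo")
calls = []  # records every name that falls through to __getattr__


def _lazy_getattr(name):
    calls.append(name)
    if name == "JSONDecoder":
        from json import JSONDecoder  # stand-in for a heavy import
        demo.JSONDecoder = JSONDecoder  # cache in the module namespace
        return JSONDecoder
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


# Storing the function in the module namespace makes it the module's
# __getattr__ hook; registering in sys.modules makes it importable.
demo.__getattr__ = _lazy_getattr
sys.modules["lazy_demo"] = demo

import lazy_demo

first = lazy_demo.JSONDecoder   # not in the namespace yet: __getattr__ runs
second = lazy_demo.JSONDecoder  # now cached: ordinary attribute lookup
```

After the first access the name lives in the module namespace, so later
lookups pay no extra cost.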


On the other hand, lazily loading a global is easier than the above.
For example, linecache imports tokenize, and tokenize is relatively heavy.
https://github.com/python/cpython/blob/master/Lib/linecache.py#L11

tokenize is used in only one place (linecache.updatecache()).
So importing it lazily just means moving `import tokenize` into that function.

    try:
        import tokenize
        with tokenize.open(fullname) as fp:
            lines = fp.readlines()
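
As a runnable illustration of the same technique outside linecache, here is
a minimal sketch. The function name read_source_lines and the choice of
exceptions to catch are assumptions for this example, not the actual
linecache code.

```python
import os
import tempfile


def read_source_lines(path):
    """Return the lines of a Python source file, or [] if it is unreadable."""
    try:
        # Deferred import: the cost of importing tokenize is paid only when
        # this function is first called, not when this module is imported.
        import tokenize
        with tokenize.open(path) as fp:
            return fp.readlines()
    except (OSError, UnicodeDecodeError, SyntaxError):
        return []


# Usage: write a small source file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    sample = os.path.join(tmpdir, "sample.py")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("x = 1\ny = 2\n")
    lines = read_source_lines(sample)
    missing = read_source_lines(os.path.join(tmpdir, "missing.py"))
```

Repeated calls are cheap because the import statement finds the module in
sys.modules after the first call.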

I want lazy loading only for heavy and rarely used modules.
Lazily loading many modules may make execution order unpredictable.
So the manual lazy loading technique is almost enough for me.


So, what is the real-world requirement for an abstraction layer on LOAD_GLOBAL?

Regards,
INADA Naoki  <songofacandy at gmail.com>


On Wed, Sep 13, 2017 at 1:17 AM, Neil Schemenauer
<nas-python-ideas at arctrix.com> wrote:
> This is my idea of making module properties work.  It is necessary
> for various lazy-loading module ideas and it cleans up the language
> IMHO.  I think it may be possible to do it with minimal backwards
> compatibility problems and performance regression.
>
> To me, the main issue with module properties (or module __getattr__)
> is that you introduce another level of indirection on global
> variable access.  Anywhere the module.__dict__ is used as the
> globals for code execution, changing LOAD_NAME/LOAD_GLOBAL to have
> another level of indirection is necessary.  That seems inescapable.
>
> Introducing another special feature of modules to make this work is
> not the solution, IMHO.  We should make module namespaces be more
> like instance namespaces.  We already have a mechanism and it is
> getattr on objects.
>
> I have a very early prototype of this idea.  See:
>
>     https://github.com/nascheme/cpython/tree/exec_mod
>
> Issues to be resolved:
>
> - __namespace__ entry in the __dict__ creates a reference cycle.
>   Maybe could use a weakref somehow to avoid it.  Maybe we just
>   explicitly break it.
>
> - getattr() on the module may return things that LOAD_NAME and
>   LOAD_GLOBAL don't expect (e.g. things from the module type).  I
>   need to investigate that.
>
> - Need to fix STORE_* opcodes to do setattr() rather than
>   __setitem__.
>
> - Need to optimize the implementation.  Maybe the module instance
>   can know if any properties or __getattr__ are defined.  If no,
>   have __getattribute__ grab the variable directly from md_dict.
>
> - Need to fix eval() to allow module as well as dict.
>
> - Need to change logic where global dict is passed around.  Pass the
>   module instead so we don't have to keep retrieving __namespace__.
>   For backwards compatibility, need to keep functions that take
>   'globals' as dict and use PyModule_GetDict() on public APIs that
>   return globals as a dict.
>
> - interp->builtins should be a module, not a dict.
>
> - module shutdown procedure needs to be investigated and fixed.  I
>   think it may get simpler.
>
> - importlib needs to be fixed to pass modules to exec() and not
>   dicts.  From my initial experiments, it looks like importlib gets
>   a lot simpler.  Right now we pass around dicts in a lot of places
>   and then have to grub around in sys.modules to get the module
>   object, which is what importlib usually wants.
>
> I have requested help in writing a PEP for this idea but so far no
> one is foolish enough to join my crazy endeavor. ;-)
>
> Regards,
>
>   Neil

