[Python-ideas] Make `import a.b.c as m` is equivalent to `m = sys.modules['a.b.c']`

Nick Coghlan ncoghlan at gmail.com
Sat Apr 8 02:47:34 EDT 2017


On 8 April 2017 at 15:54, Victor Varvariuc <victor.varvariuc at gmail.com> wrote:
> Hi there.
>
> I asked a question on Stackoverflow:
>
> (Pdb) import brain.utils.mail
>
> (Pdb) import brain.utils.mail as mail_utils
>
> *** AttributeError: module 'brain.utils' has no attribute 'mail'
>
> I always thought that import a.b.c as m is roughly equivalent to m =
> sys.modules['a.b.c']. Why AttributeError? Python 3.6
>
> It was pointed out to me that this is a somewhat weird behavior of Python:
>
> The statement is not quite true, as evidenced by the corner case you met,
> namely if the required modules already exist in sys.modules but are yet
> uninitialized. The import ... as requires that the module foo.bar is
> injected in foo namespace as the attribute bar, in addition to being in
> sys.modules, whereas the from ... import ... as looks for foo.bar in
> sys.modules.
>
>
> Why would `import a.b.c` work when `a.b.c` is not yet fully imported, but
> `import a.b.c as my_c` would not? I thought it would be vice versa.

It has to do with when and how the attribute lookup for the submodule
occurs. Disassembling the 3 examples (plus another variant):

    >>> import dis
    >>> dis.dis("import a.b.c")
      1           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (None)
                  4 IMPORT_NAME              0 (a.b.c)
                  6 STORE_NAME               1 (a)
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE

In the first case, we see that the name bound in the local namespace
is *a*: we won't actually try to access the full "a.b.c" attribute
reference until somewhere later in the module.

That lazy lookup means the import system is satisfied based on the
sys.modules entry, and the module code will be happy as long as "a.b"
has a "c" attribute by the time the module needs it.
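
As a rough way to check which name each statement binds without
importing anything, the compiled bytecode can be inspected
programmatically. This is only a sketch; "stored_names" is a
hypothetical helper written for this illustration, not part of the
dis module:

```python
import dis


def stored_names(source):
    """Return the names that STORE_NAME binds when `source` is compiled."""
    return [
        instruction.argval
        for instruction in dis.get_instructions(source)
        if instruction.opname == "STORE_NAME"
    ]


# The plain form binds only the top-level package name.
print(stored_names("import a.b.c"))
# The "as" form binds only the alias.
print(stored_names("import a.b.c as m"))
```

The first call reports only "a" and the second only "m", which matches
the STORE_NAME targets in the disassembly listings.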

    >>> dis.dis("import a.b.c as m")
      1           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (None)
                  4 IMPORT_NAME              0 (a.b.c)
                  6 LOAD_ATTR                1 (b)
                  8 LOAD_ATTR                2 (c)
                 10 STORE_NAME               3 (m)
                 12 LOAD_CONST               1 (None)
                 14 RETURN_VALUE

In the second case, we see that "import a.b.c as m" produces almost
exactly the same bytecode as "import a.b.c; m = a.b.c", skipping only
the store-and-reload of the "a" reference:

    >>> dis.dis("import a.b.c; m = a.b.c")
      1           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (None)
                  4 IMPORT_NAME              0 (a.b.c)
                  6 STORE_NAME               1 (a)
                  8 LOAD_NAME                1 (a)
                 10 LOAD_ATTR                2 (b)
                 12 LOAD_ATTR                3 (c)
                 14 STORE_NAME               4 (m)
                 16 LOAD_CONST               1 (None)
                 18 RETURN_VALUE

That eager lookup means that "a.b" *must* have a "c" attribute at the
time of the import, or the LOAD_ATTR operation will fail.
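
The partially initialised state from the original report can be
imitated by registering modules in sys.modules without setting them as
attributes on their parents. The module names below ("demo_pkg" and
friends) are made up for this sketch, and the empty module objects only
stand in for a real package that is still being imported:

```python
import sys
import types

# Hypothetical module names, registered only for this demonstration.
NAMES = ["demo_pkg", "demo_pkg.sub", "demo_pkg.sub.leaf"]


def run_demo():
    """Mimic a half-finished import: sys.modules is populated, attributes are not."""
    for name in NAMES:
        sys.modules[name] = types.ModuleType(name)
    try:
        # Satisfied from sys.modules alone; binds only "demo_pkg" locally.
        import demo_pkg.sub.leaf

        try:
            # The same attribute chain that "m = a.b.c" has to walk.
            demo_pkg.sub.leaf
        except AttributeError as exc:
            return str(exc)
        return None
    finally:
        # Remove the fake entries so the demo leaves no trace.
        for name in NAMES:
            sys.modules.pop(name, None)


error_message = run_demo()
print(error_message)
```

The plain import succeeds, while walking the attribute chain raises an
AttributeError because "demo_pkg" was never given a "sub" attribute.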

    >>> dis.dis("from a.b import c as m")
      1           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (('c',))
                  4 IMPORT_NAME              0 (a.b)
                  6 IMPORT_FROM              1 (c)
                  8 STORE_NAME               2 (m)
                 10 POP_TOP
                 12 LOAD_CONST               2 (None)
                 14 RETURN_VALUE

Finally, when we come to the "from" import case, we can see that the
bytecode changes significantly: the LOAD_ATTRs all disappear, replaced
by a single IMPORT_FROM, and the from_list for the IMPORT_NAME opcode
is a populated tuple, rather than None.
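
The practical effect of IMPORT_FROM can be seen with the same kind of
fake, unlinked modules. As far as I'm aware, on Python 3.5 and later
IMPORT_FROM falls back to sys.modules when the attribute lookup on the
parent fails, which is what the quoted answer describes. The names
below ("demo_from" and friends) are hypothetical and exist only for
this sketch:

```python
import sys
import types

# Hypothetical module names, registered only for this demonstration.
NAMES = ["demo_from", "demo_from.sub", "demo_from.sub.leaf"]


def run_demo():
    """Show a from-import resolving a submodule that is not yet an attribute."""
    for name in NAMES:
        sys.modules[name] = types.ModuleType(name)
    try:
        parent = sys.modules["demo_from.sub"]
        attribute_missing = not hasattr(parent, "leaf")

        # Resolved through sys.modules even though parent.leaf is missing.
        from demo_from.sub import leaf as m

        return attribute_missing, m is sys.modules["demo_from.sub.leaf"]
    finally:
        # Remove the fake entries so the demo leaves no trace.
        for name in NAMES:
            sys.modules.pop(name, None)


attribute_missing, resolved_from_sys_modules = run_demo()
print(attribute_missing, resolved_from_sys_modules)
```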

To address the next question that follows from looking at the
current compile-time behaviour: changing those semantics isn't as
simple as having the compiler reinterpret "import x.y.z as m" as
"from x.y import z as m". The two formulations make different
assertions about the nature of "z": in the first form it's expected
to be a submodule, while in the second it can be any attribute of
x.y. Such a reinterpretation would therefore either move the
discrepancy to having "import x.y.z as m" work in cases where
"import x.y.z" fails, or else require a way for the compiler to
indicate that the resolved attribute must *also* exist in
sys.modules, in addition to being an attribute on the parent module.
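
The difference in what the two forms assert about the final name can
be seen with a standard library attribute that is not a module. This
is a small sketch; "try_statement" is a hypothetical helper, and
os.sep is used only because it is a plain string attribute of os:

```python
def try_statement(source):
    """Execute an import statement; return the exception type name, or None."""
    namespace = {}
    try:
        exec(source, namespace)
    except ImportError as exc:
        return type(exc).__name__
    return None


# The "from" form accepts any attribute of the parent module.
print(try_statement("from os import sep as m"))
# The dotted form insists that the last component is a module.
print(try_statement("import os.sep"))
```

The first statement succeeds, while the second fails with an
ImportError subclass, since "os.sep" is not a module.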

That's not impossible to resolve, but it would be a lot of work just
to add an alternate spelling of "from a.b import c as m".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

