[Python-ideas] Implicit submodule imports

Thu Sep 25 07:07:29 CEST 2014

On Wed, Sep 24, 2014 at 7:10 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Agreed, it's a nice feature :-)
>
> I've been using this in our mx packages since 1999 using a module
> called LazyModule.py. See e.g.
> http://educommons.com/dev/browser/3.2/installers/windows/src/eduCommons/python/Lib/site-packages/mx/URL/LazyModule.py
>
> Regarding making module more class like: we've played with this
> a bit at PyCon UK and it's really easy to turn a module into a
> regular class (with all its features) by tweaking sys.modules -
> we even got .__getattr__() to work. With some more effort, we
> could have a main() function automatically called upon direct
> import from the command line.
>
> The whole thing is a huge hack, though, so I'll leave out the
> details :-)

Indeed. I can think of multiple places where there are compelling
reasons to want to hook module attribute lookup:

Lazy loading: as per above. E.g., ten years ago for whatever reason,
someone decided that 'import numpy' ought to automatically execute
'import numpy.testing' as well. So now backcompat means we're stuck
with it. 'import numpy.testing' is rather slow, to the point that it
can be a substantial part of the total overhead for launching
numpy-using scripts. We get bug reports about this, from people who
are irritated that their production code is spending all this time
loading unit-test harnesses and whatnot that it doesn't even use.
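
To make the lazy-loading idea concrete, here is a minimal sketch of
the kind of hack involved. It is not numpy's code: LazyModule, 'pkg'
and the choice of 'unittest' as the "expensive" import are all made
up for illustration, and it assumes you are willing to use a
types.ModuleType subclass in place of a plain module object.

```python
import importlib
import sys
import types


class LazyModule(types.ModuleType):
    """Module object that imports selected attributes on first access."""

    def __init__(self, name, lazy_attrs):
        super().__init__(name)
        # Maps attribute name -> fully qualified module name to import.
        self._lazy_attrs = dict(lazy_attrs)

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so attributes
        # that are already loaded never reach this method.
        target = self._lazy_attrs.get(attr)
        if target is None:
            raise AttributeError(
                "module {!r} has no attribute {!r}".format(self.__name__, attr)
            )
        module = importlib.import_module(target)
        setattr(self, attr, module)  # cache so later lookups are direct
        return module


# Hypothetical stand-in for a package with an expensive submodule.
pkg = LazyModule("pkg", {"testing": "unittest"})
loaded_before = "testing" in vars(pkg)  # nothing imported via pkg yet
testing = pkg.testing                   # first access triggers the import
loaded_after = "testing" in vars(pkg)   # now cached on the module object
```

The fragile part is not this class but getting such an object into
sys.modules in place of the real module without confusing the import
machinery.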

Module attribute deprecation: For reasons that are even more lost in
the mists of time, numpy re-exports some objects from the __builtins__
namespace (e.g., numpy.float exists but is __builtins__.float; if you
want the default numpy floating-point type you have to write
numpy.float_). As you can probably imagine this is massively confusing
to everyone, but if we just removed these re-exports then it would
break existing working code (e.g., 'numpy.array([1, 2, 3],
dtype=numpy.float)' works and does the right thing right now), so
according to our deprecation policy we have to spend a few releases
issuing warnings every time someone writes 'numpy.float'. Which
requires executing arbitrary code at attribute lookup time.

I think both of these use cases arise very commonly in long-lived
projects, but right now the only ways to accomplish either of these
things involve massive disgusting hacks. They are really really hard
to do cleanly, and you risk all kinds of breakage in edge cases (e.g.,
try reload()'ing a module that's been replaced by an object). So, we
haven't dared release anything like this in production, and the above
problems just hang around indefinitely.

What I'd really like is for module attribute lookup to start
supporting the descriptor protocol. This would be super-easy to work
with and fast (you only pay the extra overhead for the attributes
which have been hooked).
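
As a rough approximation of what that could look like: descriptors
are already honored when they live on the module's *type*, so the
effect can be sketched with a types.ModuleType subclass. This is only
an illustration under that assumption, not the proposed mechanism
itself (which would honor descriptors stored in the module's own
namespace). DeprecatedAttr, HookedModule and 'fakenumpy' are
hypothetical names.

```python
import builtins
import types
import warnings


class DeprecatedAttr:
    """Descriptor that warns on every read, then returns the wrapped value."""

    def __init__(self, value, message):
        self.value = value
        self.message = message

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself, not on a module
        # stacklevel=2 attributes the warning to the code doing the lookup.
        warnings.warn(self.message, DeprecationWarning, stacklevel=2)
        return self.value


class HookedModule(types.ModuleType):
    # Only this one attribute is hooked; every other attribute goes
    # through the ordinary module lookup path.
    float = DeprecatedAttr(
        builtins.float,
        "fakenumpy.float is just the builtin float; use float instead",
    )


fakenumpy = HookedModule("fakenumpy")
fakenumpy.pi = 3.14159  # ordinary attribute, no hook involved

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hooked_value = fakenumpy.float  # triggers the deprecation warning
    plain_value = fakenumpy.pi      # no warning
```

Note that DeprecatedAttr defines no __set__, so an entry named
'float' in the module's own __dict__ would shadow it.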

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org