From ericsnowcurrently at gmail.com  Tue Apr  1 00:26:32 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 31 Mar 2014 16:26:32 -0600
Subject: [Import-SIG] making it feasible to rely on loaders for reading
 intra-package data files
In-Reply-To: <CAP1=2W73B9=rfe=Rk1Au4==wedmk8PgMfjTie_o5iD=_v5wi+g@mail.gmail.com>
References: <CAP1=2W73B9=rfe=Rk1Au4==wedmk8PgMfjTie_o5iD=_v5wi+g@mail.gmail.com>
Message-ID: <CALFfu7BUJECRpr=9ejsy0ahd-9G01QRJH8qoBw2NXCHkBcUd9Q@mail.gmail.com>

On Sat, Feb 1, 2014 at 11:44 AM, Brett Cannon <brett at python.org> wrote:
> Over on distutils-sig it came up that getting people to not simply assume
> that __file__ points to an actual file and thus avoid using open() directly
> to read intra-package files is an issue. In order to make using a loader's
> get_data reasonable (let alone set_data), there needs to be a clear
> specification of how things are expected to work and make sure that
> everything that people need is available.
>
> The docs for importlib.ResourceLoader.get_data
> (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data)
> say that things are expected to be based off of __file__, and with Python
> 3.4 using only absolute paths (except for __main__) that means all paths
> would be absolute by default. As long as people stick to pathlib/os.path and
> don't use non-standard path separators then this should just work.
>
> But what if people don't do that? I honestly say that it should either be
> explicitly undefined or that it's an IOError. IOW either we say "use
> absolute paths or else you're on your own" or "use absolute paths, period".
> That prevents having to make a decision as to whether a relative path is
> relative to the module the loader is attached to or relative to the package
> (e.g. easier for per-module loaders or per-package loaders, respectively).
> The former is more backwards-compatible so I say the docs get updated to say
> that relative paths are undefined behaviour.

It should definitely be up to the loader associated with the module.
Some loaders, including relevant ones in importlib.machinery, are
unique to individual modules and store __file__.  In that case I'd
expect a relative path to mean relative to __file__.  Some loaders
could also track __path__, in which case a relative path would be
relative to the path entries there.

However, the general API for get_/set_data() cannot rely on such
loader state without that state being part of the relevant ABCs.
Otherwise the path passed to get_/set_data() would have to be
absolute.

Furthermore, for loaders that handle non-file locations, "path" may
not be a filesystem path at all, as PJE pointed out, so a general
requirement regarding absolute/relative paths wouldn't work.  __file__
is an unfortunate name in those cases, and PEP 451 resolved this for
specs by calling it "origin" (along with has_location and
submodule_search_locations).

It may be worth adding a resolve_location() method to loaders, to
address any ambiguity.
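
As a rough sketch of what such a hook might do for a file-based,
per-module loader: the function below is hypothetical (no
resolve_location() exists on loaders today) and assumes that
module.__file__ is an absolute filesystem path.  Relative locations
are taken as relative to the directory holding __file__; absolute
ones pass through untouched.

```python
import os.path
import types


def resolve_location(module, location):
    """Hypothetical helper: turn a location into an absolute path.

    Assumes a file-based module whose __file__ is an absolute path.
    """
    if os.path.isabs(location):
        return location
    base = os.path.dirname(module.__file__)
    return os.path.join(base, location)


# A stand-in for an imported module; only __file__ matters here.
fake_module = types.SimpleNamespace(
    __file__=os.path.join(os.sep, "pkg", "mod.py"))
resolved = resolve_location(fake_module, "data.txt")
```

A loader for non-file locations would need its own notion of
"relative", which is why this could not be a blanket rule.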

>
> The second issue is whether get_data/set_data are enough or if something
> else is needed, e.g. a listdir-like method. Since this is meant for handling
> intra-package data my assumption is that it isn't really necessary as
> chances are you know what files you included in your distribution (or at
> least what the possible names are).

Sounds useful to me as long as the API wasn't strictly focused on
file-based resources.  It sounds like there may be other
resource-related methods worth adding at the same time.

> I know some have asked for a
> listdir-like API to help discover what modules are available so as to
> provide a plugin API, but I view that as a separate thing and potentially
> more appropriate on finders. Remember, the smaller the API surface for the
> common case the better for the stdlib.

I think this would be nice, but only worth it if we anticipate a good use case.

-eric

From ericsnowcurrently at gmail.com  Tue Apr  1 00:28:28 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 31 Mar 2014 16:28:28 -0600
Subject: [Import-SIG] A ModuleData API (was: Re: making it feasible to rely
 on loaders for reading intra-package data files)
Message-ID: <CALFfu7Ao8XO00JYgqyZiXYzTVxnSggikRwsDAK4_d7Wvp9cbrg@mail.gmail.com>

On Sat, Feb 1, 2014 at 11:44 AM, Brett Cannon <brett at python.org> wrote:
> Over on distutils-sig it came up that getting people to not simply assume
> that __file__ points to an actual file and thus avoid using open() directly
> to read intra-package files is an issue. In order to make using a loader's
> get_data reasonable (let alone set_data), there needs to be a clear
> specification of how things are expected to work and make sure that
> everything that people need is available.
>
> The docs for importlib.ResourceLoader.get_data
> (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data)
> say that things are expected to be based off of __file__, and with Python
> 3.4 using only absolute paths (except for __main__) that means all paths
> would be absolute by default. As long as people stick to pathlib/os.path and
> don't use non-standard path separators then this should just work.
>
> But what if people don't do that? I honestly say that it should either be
> explicitly undefined or that it's an IOError. IOW either we say "use
> absolute paths or else you're on your own" or "use absolute paths, period".
> That prevents having to make a decision as to whether a relative path is
> relative to the module the loader is attached to or relative to the package
> (e.g. easier for per-module loaders or per-package loaders, respectively).
> The former is more backwards-compatible so I say the docs get updated to say
> that relative paths are undefined behaviour.
>
> The second issue is whether get_data/set_data are enough or if something
> else is needed, e.g. a listdir-like method. Since this is meant for handling
> intra-package data my assumption is that it isn't really necessary as
> chances are you know what files you included in your distribution (or at
> least what the possible names are). I know some have asked for a
> listdir-like API to help discover what modules are available so as to
> provide a plugin API, but I view that as a separate thing and potentially
> more appropriate on finders. Remember, the smaller the API surface for the
> common case the better for the stdlib.

Here's a rough idea that helps consolidate behavior and move the focus
of the data APIs away from loaders and toward modules.

In my mind loader methods are low-level and meant particularly for
consumption by the import system.  It would be nice to have
higher-level APIs for everyone else to use.  It would make sense to
wrap that up in a class.

From what I can tell, use cases for the data-related loader API are
module-centric, so it would make sense to have the high-level API
focus on modules, rather than loaders:

class ModuleData:
    def __init__(self, module):
        self.module = module
        self.loader = module.__loader__
    def get_data(self, location):
        return self.loader.get_data(location)
    def set_data(self, location, data):
        return self.loader.set_data(location, data)
    ...

This gives us the ability to generalize standard data-related behavior
across all loaders (kind of like PEP 451 did for loading).  It would
also make customization simpler.

File-based modules/loaders are the common case.  It would be nice to
provide default implementations for them.  I see two approaches:

* Subclass ModuleData (e.g. FileModuleData).
* Add a boolean "filebased" attr to loaders that ModuleData could use
to trigger customized behavior.

In either case, it would make sense to add a method to loaders (e.g.
get_data_api(module)) that returns a ModuleData instance, thus
allowing each loader to pick the type returned.
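
To make the subclass approach concrete, here is a minimal sketch.
FileModuleData, DictLoader and get_data_api() are all illustrative
names, not existing importlib API; DictLoader is an in-memory
stand-in for a real file-based loader, and the choice to resolve
relative locations against the directory of __file__ is an
assumption.

```python
import os.path
import types


class ModuleData:
    def __init__(self, module):
        self.module = module
        self.loader = module.__loader__

    def get_data(self, location):
        return self.loader.get_data(location)


class FileModuleData(ModuleData):
    """Resolves relative locations against the module's directory."""

    def get_data(self, location):
        if not os.path.isabs(location):
            base = os.path.dirname(self.module.__file__)
            location = os.path.join(base, location)
        return super().get_data(location)


class DictLoader:
    """Hypothetical loader serving bytes from a dict keyed by path."""

    def __init__(self, files):
        self.files = files

    def get_data(self, path):
        try:
            return self.files[path]
        except KeyError:
            raise OSError(path) from None

    def get_data_api(self, module):
        # Each loader picks the ModuleData type it wants to hand out.
        return FileModuleData(module)


pkg_dir = os.path.join(os.sep, "site", "pkg")
loader = DictLoader({os.path.join(pkg_dir, "defaults.cfg"): b"answer = 42\n"})

module = types.ModuleType("pkg.mod")
module.__file__ = os.path.join(pkg_dir, "mod.py")
module.__loader__ = loader

data = loader.get_data_api(module).get_data("defaults.cfg")
```

Callers only ever touch the object returned by get_data_api(), so a
zip or database loader could return a different ModuleData type
without changing the calling code.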

ModuleData.__init__ implies having the module (already imported).  The
current loader API does not require any module, so that low-level API
would still be useful if someone wanted to avoid loading the module
first.  Alternately there could be a mechanism for building a
ModuleData object from a loader without needing to load the module
first.

(I had brought up something similar with PEP 451, but it was too out
of scope to pursue there.)

-eric

From bcannon at gmail.com  Fri Apr  4 20:57:52 2014
From: bcannon at gmail.com (Brett Cannon)
Date: Fri, 4 Apr 2014 14:57:52 -0400
Subject: [Import-SIG] How best to replace imp.load_module()?
Message-ID: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>

I've been thinking about what it takes to replace imp and I realized that
imp.load_module() is the hardest to replace for two reasons. One issue is
that importlib.abc.Loader.create_module() can -- and does -- return None. This
means that if someone wanted to replicate the
imp.find_module()/imp.load_module() dance in a PEP 451 world it takes::

  import importlib.util
  import types

  spec = importlib.util.find_spec(name)
  try:
    module = spec.loader.create_module(spec)
  except AttributeError:
    # The loader does not define create_module() at all.
    module = None
  if module is None:
    module = types.ModuleType(spec.name)
  # No clear way to set import-related attributes.
  spec.loader.exec_module(module)

It took 6 lines to get a module. That seems a bit excessive and ripe for
either an importlib.util function that handles this all correctly or
simply making create_module() a required method on a loader and having
importlib.abc.Loader.create_module() return types.ModuleType(spec.name).
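
For illustration, the first option could look roughly like the sketch
below. new_module() is a hypothetical name, not an existing
importlib.util function, and the attribute setup is a hand-rolled
approximation of what import does internally rather than a copy of
it. SourceStringLoader is a toy loader used only to exercise the
helper.

```python
import importlib.util
import types


def new_module(spec):
    """Hypothetical helper: build a module object for a spec."""
    module = None
    create = getattr(spec.loader, "create_module", None)
    if create is not None:
        module = create(spec)  # May legitimately return None.
    if module is None:
        module = types.ModuleType(spec.name)
    # Approximate import-related attribute setup (an assumption about
    # which attributes matter; the real logic is more involved).
    module.__spec__ = spec
    module.__loader__ = spec.loader
    module.__package__ = spec.parent
    if spec.has_location:
        module.__file__ = spec.origin
    if spec.submodule_search_locations is not None:
        module.__path__ = spec.submodule_search_locations
    return module


class SourceStringLoader:
    """Toy loader that executes a source string; has no create_module()."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


loader = SourceStringLoader("answer = 6 * 7")
spec = importlib.util.spec_from_loader("demo", loader)
module = new_module(spec)
spec.loader.exec_module(module)
```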

The second annoyance is that we have not exposed
_SpecMethods.init_module_attrs() in any way. That can either come through in
importlib.util or we can make types.ModuleType grow a method that takes a
spec and then sets all the appropriate attributes. If we go with the former
we should make sure that in importlib we always prefer __spec__ over any
module-level values and that one can pass in a spec to types.ModuleType to
set __spec__ so that in the distant Python 4 future we can deprecate all
module-level attributes and just work off of __spec__.
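
A sketch of what "pass in a spec to types.ModuleType" might mean,
written as a subclass so it runs today. SpecModule and its spec
keyword are hypothetical; types.ModuleType itself accepts no such
argument, and which attributes get derived from the spec is an
assumption here.

```python
import importlib.machinery
import types


class SpecModule(types.ModuleType):
    """Hypothetical ModuleType variant that accepts a spec."""

    def __init__(self, name, doc=None, *, spec=None):
        super().__init__(name, doc)
        if spec is not None:
            self.__spec__ = spec
            # Derived values, so they cannot drift from the spec at
            # construction time.
            self.__loader__ = spec.loader
            self.__package__ = spec.parent


spec = importlib.machinery.ModuleSpec("demo.sub", loader=None)
mod = SpecModule(spec.name, spec=spec)
```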

I think if we can get these two bits cleaned up we can tell people who use
imp.load_module() directly that they can::

  # Assume proper loader chosen.
  spec = importlib.util.spec_from_loader(name, loader)
  module = some_newfangled_way_of_doing_this()
  somehow_init_module_attrs(module)
  loader.exec_module(module)

which isn't that bad for something they probably should be avoiding as much
as possible to begin with.

Or we take the easiest option and simply ignore all of these issues and
just say that working outside of import is not something we want to worry
about in the stdlib and let PyPI come up with some utility code that does
all of this for you if you really want it. The code maintainer in me is
liking this idea + making it easier to set __spec__ on a module through its
constructor.

From bcannon at gmail.com  Fri Apr  4 21:07:02 2014
From: bcannon at gmail.com (Brett Cannon)
Date: Fri, 4 Apr 2014 15:07:02 -0400
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>
References: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>
Message-ID: <CAP1=2W4OwKYHwjmEkuRPfrF==vqWNsRbX0A7avs0MwrEwSMT5w@mail.gmail.com>

After writing this, I realized the reason imp existed was to work around
the limitation of import being in C. Now that import is in Python and very
well exposed, there's really nothing to say that we need importlib to grow
to some fat package that tries to solve all the issues that imp did. If we
keep importlib lean in terms of only providing things that make import
work, make providing custom importers easy, or things that are truly tough
to do right, then stuff like imp.load_module() is really outside of its
purview as that can be done without a terrible amount of effort by a
project on PyPI (beyond what I put in below all they would need to do is
copy _SpecMethods.init_module_attrs()).

I'm really starting to like the idea of not trying to contort ourselves to
replacing imp.find_module/load_module directly. I would still like to fix
types.ModuleType to do the right thing for __spec__ in its constructor and
make sure importlib does as well, but otherwise I'm happy with relying more
on the community to pick up some of the higher-level API stuff for us.


On Fri, Apr 4, 2014 at 2:57 PM, Brett Cannon <bcannon at gmail.com> wrote:

> I've been thinking about what it takes to replace imp and I realized that
> imp.load_module() is the hardest to replace for two reasons. One issue is
> that importlib.abc.Loader.create_module() can -- and does -- return None. This
> means that if someone wanted to replicate the
> imp.find_module()/imp.load_module() dance in a PEP 451 world it takes::
>
>   spec = importlib.util.find_spec(name)
>   try:
>     module = spec.loader.create_module(spec)
>   except AttributeError:
>     module = None
>   if module is None:
>     module = types.ModuleType(spec.name)
>   # No clear way to set import-related attributes.
>   spec.loader.exec_module(module)
>
> It took 6 lines to get a module. That seems a bit excessive and ripe for
> either an importlib.util function that handles this all correctly or
> simply making create_module() a required method on a loader and having
> importlib.abc.Loader.create_module() return types.ModuleType(spec.name).
>
> The second annoyance is that we have not exposed
> _SpecMethods.init_module_attrs() in any way. That can either come through in
> importlib.util or we can make types.ModuleType grow a method that takes a
> spec and then sets all the appropriate attributes. If we go with the former
> we should make sure that in importlib we always prefer __spec__ over any
> module-level values and that one can pass in a spec to types.ModuleType to
> set __spec__ so that in the distant Python 4 future we can deprecate all
> module-level attributes and just work off of __spec__.
>
> I think if we can get these two bits cleaned up we can tell people who use
> imp.load_module() directly that they can::
>
>   # Assume proper loader chosen.
>   spec = importlib.util.spec_from_loader(name, loader)
>   module = some_newfangled_way_of_doing_this()
>   somehow_init_module_attrs(module)
>   loader.exec_module(module)
>
> which isn't that bad for something they probably should be avoiding as
> much as possible to begin with.
>
> Or we take the easiest option and simply ignore all of these issues and
> just say that working outside of import is not something we want to worry
> about in the stdlib and let PyPI come up with some utility code that does
> all of this for you if you really want it. The code maintainer in me is
> liking this idea + making it easier to set __spec__ on a module through its
> constructor.
>

From ncoghlan at gmail.com  Sat Apr  5 02:07:50 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 5 Apr 2014 10:07:50 +1000
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To: <CAP1=2W4OwKYHwjmEkuRPfrF==vqWNsRbX0A7avs0MwrEwSMT5w@mail.gmail.com>
References: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>
 <CAP1=2W4OwKYHwjmEkuRPfrF==vqWNsRbX0A7avs0MwrEwSMT5w@mail.gmail.com>
Message-ID: <CADiSq7daEdUsY19zfuVc6rG6=b8rcB4QqLyNUeXAqM+1Cdmttw@mail.gmail.com>

Keep in mind I already need a fair bit of this kind of thing to make runpy
work properly.

Moving that infrastructure to importlib.util is worthwhile, because it
makes it easier to evolve the core import implementation with confidence
that we're not breaking even obscure use cases.

The migration of extension modules to PEP 451 should take place on the road
to 3.5, and we should take a close look at migrating pdb and friends to
runpy with a view to adding -m support (which may require new features in
runpy itself).

Moving zipimport to a frozen Python module may also be desirable.

I think that's a better use case driven path to follow, and we can hold off
on finalising the imp.load_module deprecation for the time being.

Cheers,
Nick.

From bcannon at gmail.com  Sat Apr  5 15:58:23 2014
From: bcannon at gmail.com (Brett Cannon)
Date: Sat, 5 Apr 2014 09:58:23 -0400
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To: <CADiSq7daEdUsY19zfuVc6rG6=b8rcB4QqLyNUeXAqM+1Cdmttw@mail.gmail.com>
References: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>
 <CAP1=2W4OwKYHwjmEkuRPfrF==vqWNsRbX0A7avs0MwrEwSMT5w@mail.gmail.com>
 <CADiSq7daEdUsY19zfuVc6rG6=b8rcB4QqLyNUeXAqM+1Cdmttw@mail.gmail.com>
Message-ID: <CAP1=2W6DkK44SK_iJDWnB+Db4+U2tPW9PnVqGwN_Y9NAC2iBBQ@mail.gmail.com>

On Fri, Apr 4, 2014 at 8:07 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Keep in mind I already need a fair bit of this kind of thing to make runpy
> work properly.
>

OK, that's good to know.


> Moving that infrastructure to importlib.util is worthwhile, because it
> makes it easier to evolve the core import implementation with confidence
> that we're not breaking even obscure use cases.
>
> The migration of extension modules to PEP 451 should take place on the
> road to 3.5, and we should take a close look at migrating pdb and friends
> to runpy with a view to adding -m support (which may require new features
> in runpy itself).
>

SGTM


> Moving zipimport to a frozen Python module may also be desirable.
>

I think the number of dependencies might make this more of a pain than it's
worth. I asked Greg and Thomas whether they thought it might be worth it
after all the headaches they went through for zipimport, and they didn't
think it necessarily was. While I would be quite happy if someone actually
tried to figure out the feasibility (maybe running zipfile through
modulefinder is enough to get an idea?), I just don't know if the level of
dependency will be so high that it will just get annoying short of freezing
the entire stdlib (which in and of itself might be an interesting exercise,
although I could see some people flipping out over the increased binary size).


> I think that's a better use case driven path to follow, and we can hold
> off on finalising the imp.load_module deprecation for the time being.
>

Well, it's been explicitly deprecated since Python 3.3 (3.3 had a
DeprecationWarning in the function, 3.4 has it implicitly through the
module-level deprecation). But actual removal won't happen until we do a
deprecation spring cleaning in the stdlib (e.g. Python 4 kind of thing).

Anyway, I'll wait until you're ready to work on runpy stuff to worry about
what exactly we want to support, so as to not go stabbing in the dark;
average use cases have been hard to come by (GitHub actually
suggests very few people use load_module() w/o find_module(), which makes
this easier to deal with).

-Brett


> Cheers,
> Nick.
>

From ericsnowcurrently at gmail.com  Sat Apr  5 23:45:02 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 5 Apr 2014 15:45:02 -0600
Subject: [Import-SIG] How best to replace imp.load_module()?
In-Reply-To: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>
References: <CAP1=2W490mB00Z8RTYn4DTaoZ9ozrtfYx6kUt8yVhY4_gFKbLw@mail.gmail.com>
Message-ID: <CALFfu7CZ40GCQH84sEVqsS-Suzq1F6ifQVr7vkCUSCgXFUnUDQ@mail.gmail.com>

On Apr 4, 2014 12:58 PM, "Brett Cannon" <bcannon at gmail.com> wrote:
>
> I've been thinking about what it takes to replace imp and I realized that
> imp.load_module() is the hardest to replace for two reasons. One issue is
> that importlib.abc.Loader.create_module() can -- and does -- return None. This
> means that if someone wanted to replicate the
> imp.find_module()/imp.load_module() dance in a PEP 451 world it takes::
>
>   spec = importlib.util.find_spec(name)
>   try:
>     module = spec.loader.create_module(spec)
>   except AttributeError:
>     module = None
>   if module is None:
>     module = types.ModuleType(spec.name)
>   # No clear way to set import-related attributes.
>   spec.loader.exec_module(module)
>
> It took 6 lines to get a module. That seems a bit excessive and ripe for
> either an importlib.util function that handles this all correctly or
> simply making create_module() a required method on a loader and having
> importlib.abc.Loader.create_module() return types.ModuleType(spec.name).

I agree with your later email about not needing to add a ton of API
unnecessarily.  I'm not sure imp.load_module() needs to live on in
importlib.

The key one I'd like to see is a replacement for direct calls to
loader.load_module().  We do this a bunch in the stdlib and each of those
places currently has to do a dance around _SpecMethods.  Doing so outside
the stdlib isn't really correct.  In both cases it would be nice to wrap
that in a util function (like "import_from_loader()") or a classmethod on
Loader.
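
Roughly what I have in mind, as a sketch only: import_from_loader()
does not exist anywhere yet, the signature is a guess, and the module
setup below is a simplified approximation of what _SpecMethods does
(it ignores create_module(), locking and legacy load_module()
loaders).  StringLoader is a toy loader for demonstration.

```python
import importlib.util
import sys
import types


def import_from_loader(name, loader):
    """Hypothetical helper: load `name` with `loader`, like import would."""
    spec = importlib.util.spec_from_loader(name, loader)
    module = types.ModuleType(spec.name)
    module.__spec__ = spec
    module.__loader__ = loader
    module.__package__ = spec.parent
    # Register before executing so recursive imports can find the module.
    sys.modules[name] = module
    try:
        loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind.
        sys.modules.pop(name, None)
        raise
    # The module may have replaced itself in sys.modules while executing.
    return sys.modules[name]


class StringLoader:
    """Toy loader that executes a fixed source string."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)


mod = import_from_loader("demo_from_loader", StringLoader("value = 'loaded'"))
```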

>
> The second annoyance is that we have not exposed
> _SpecMethods.init_module_attrs() in any way.

If we had an import_from_loader(), I'm not sure we'd need to worry about
it.  Would there be other use cases for setting those attrs?

Also, once I've wrapped up (either way) the OrderedDict-related stuff I'm
working on, my main goal is to propose a successor to PEP 406
(ImportEngine).  That would include exposing most of the _SpecMethods API
in some form.

FWIW, I'm still uncomfortable with exposing that API directly on
ModuleSpec, but would like to see it exposed publicly in some indirect way.

> That can either come through in importlib.util or we can make
> types.ModuleType grow a method that takes a spec and then sets all the
> appropriate attributes.

Maybe if it were a class-only method.  I'd hate to see something like that
exposed on module objects.

> If we go with the former we should make sure that in importlib we always
> prefer __spec__ over any module-level values and that one can pass in a
> spec to types.ModuleType to set __spec__ so that in the distant Python 4
> future we can deprecate all module-level attributes and just work off of
> __spec__.

This is tricky.  It depends on how useful it is to people to have module
attrs that vary from the spec (for which the module was originally loaded).

That aside, I've found it's still useful to have __name__ and __file__
rather than having to look them up on __spec__.  Maybe that's just because
I'm not used to it. :)  That would be a different matter if the common use
cases for the two were satisfied by other means.  It would be nice if we
could restrict __file__ to just modules that are FS-based (and drop it or
set it to None for all other modules).  The other attrs are probably used
uncommonly enough that they could get dropped from module objects.
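
For code that wants the "FS-based only" behavior today, something
like the following sketch works off of __spec__ first.
module_location() is a hypothetical helper name; the has_location and
origin attributes are the ones PEP 451 defines on specs, and the
fallback to __file__ for modules without a spec is an assumption
about what callers would want.

```python
import importlib.util
import os.path
import sys
import types


def module_location(module):
    """Hypothetical helper: a module's location, or None if it has none."""
    spec = getattr(module, "__spec__", None)
    if spec is not None:
        # origin is only a loadable location when has_location is true;
        # for built-ins it is just a descriptive string.
        return spec.origin if spec.has_location else None
    # Modules created by hand or by legacy loaders may lack a spec.
    return getattr(module, "__file__", None)


# Building the spec does not touch the filesystem; the path is illustrative.
spec = importlib.util.spec_from_file_location(
    "demo", os.path.join(os.sep, "tmp", "demo.py"))
file_module = types.ModuleType(spec.name)
file_module.__spec__ = spec

file_location = module_location(file_module)
builtin_location = module_location(sys)
```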

>
> I think if we can get these two bits cleaned up we can tell people who
> use imp.load_module() directly that they can::
>
>   # Assume proper loader chosen.
>   spec = importlib.util.spec_from_loader(name, loader)
>   module = some_newfangled_way_of_doing_this()
>   somehow_init_module_attrs(module)
>   loader.exec_module(module)

That's basically what import_from_loader() would do.

>
> which isn't that bad for something they probably should be avoiding as
> much as possible to begin with.

Maybe it would be worth getting a clear idea of why people use
imp.load_module() and loader.load_module() (directly).  My guess is that it
is to accomplish slight deviations from normal import behavior.  For
example, the last time I looked at the source, Salt used
imp.load_module() to do some trickery.  However, I expect that most cases
would be satisfied by use of proper importers.  Either way, it would be
nice to have a better picture of what unusual things people are doing like
this.  I'd benefit from that at least. :)

>
> Or we take the easiest option and simply ignore all of these issues and
> just say that working outside of import is not something we want to worry
> about in the stdlib and let PyPI come up with some utility code that does
> all of this for you if you really want it. The code maintainer in me is
> liking this idea + making it easier to set __spec__ on a module through its
> constructor.

:)

-eric