[Import-SIG] making it feasible to rely on loaders for reading intra-package data files

Eric Snow ericsnowcurrently at gmail.com
Tue Apr 1 00:26:32 CEST 2014


On Sat, Feb 1, 2014 at 11:44 AM, Brett Cannon <brett at python.org> wrote:
> Over on distutils-sig it came up that getting people to not simply assume
> that __file__ points to an actual file and thus avoid using open() directly
> to read intra-package files is an issue. In order to make using a loader's
> get_data reasonable (let alone set_data), there needs to be a clear
> specification of how things are expected to work, and we need to make sure that
> everything that people need is available.
>
> The docs for importlib.ResourceLoader.get_data
> (http://docs.python.org/3.4/library/importlib.html#importlib.abc.ResourceLoader.get_data)
> say that things are expected to be based off of __file__, and with Python
> 3.4 using only absolute paths (except for __main__) that means all paths
> would be absolute by default. As long as people stick to pathlib/os.path and
> don't use non-standard path separators then this should just work.
>
> But what if people don't do that? I honestly think it should either be
> explicitly undefined or raise an IOError. IOW either we say "use
> absolute paths or else you're on your own" or "use absolute paths, period".
> That prevents having to make a decision as to whether a relative path is
> relative to the module the loader is attached to or relative to the package
> (e.g. easier for per-module loaders or per-package loaders, respectively).
> The former is more backwards-compatible, so I say the docs should be updated
> to say that relative paths are undefined behaviour.

It should definitely be up to the loader associated with the module.
Some loaders, including relevant ones in importlib.machinery, are
unique to individual modules and store __file__.  In that case I'd
expect a relative path to mean relative to __file__.  A loader
could also track __path__, in which case a relative path would be
relative to the path entries there.
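As a sketch of the per-module case, assuming a loader built on importlib.machinery.SourceFileLoader (whose `path` attribute holds what becomes __file__), relative paths passed to get_data() could be resolved against that file's directory. The subclass name RelativeDataLoader is hypothetical; the stock FileLoader.get_data() simply opens whatever path it is given.

```python
import os
import importlib.machinery


class RelativeDataLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical per-module loader: a relative get_data() path is
    resolved against the directory of the module's own file."""

    def get_data(self, path):
        if not os.path.isabs(path):
            # self.path is the module's file path (what becomes __file__).
            path = os.path.join(os.path.dirname(self.path), path)
        # Absolute paths (including the loader's own source/bytecode
        # reads) fall through to the normal file-based implementation.
        return super().get_data(path)
```

Because the rule lives in the loader, it only holds for loaders that carry this per-module state, which is exactly why the general ABC cannot promise it.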

However, the general API for get_/set_data() cannot rely on such
loader state without that state being part of the relevant ABCs.
Otherwise the path passed to get_/set_data() would have to be
absolute.
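Under the current API a package can still avoid open() by building an absolute path from __file__ and handing it to its own loader. A minimal sketch, assuming the module was loaded by a loader that implements get_data() (the stock file and zip loaders do) and that __file__ is absolute, as it is for imported modules in 3.4; the helper name read_package_data is hypothetical.

```python
import importlib
import os
import sys
import tempfile


def read_package_data(module, relative_name):
    """Read a data file shipped alongside `module` via its loader.

    The path is made absolute by joining it onto the directory of
    __file__, so it does not depend on how a particular loader would
    interpret a relative path.
    """
    base = os.path.dirname(module.__file__)
    return module.__loader__.get_data(os.path.join(base, relative_name))


if __name__ == "__main__":
    # Build a throwaway package containing a data file, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, "demo_pkg")
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "data.txt"), "wb") as f:
            f.write(b"hello")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            print(read_package_data(pkg, "data.txt"))  # b'hello'
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)
```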

Furthermore, for loaders that handle non-file locations, "path" may
not be a filesystem path at all, as PJE pointed out, so a general
requirement regarding absolute/relative paths wouldn't work.  __file__
is an unfortunate name in those cases, and PEP 451 resolved this for
specs by calling it "origin" (along with has_location and
submodule_search_locations).
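The PEP 451 names are visible on any module spec. A quick way to compare a file-backed package with a module that has no location (describe_location is just a hypothetical helper):

```python
import importlib.util


def describe_location(name):
    """Return the PEP 451 location fields of a module's spec."""
    spec = importlib.util.find_spec(name)
    return spec.origin, spec.has_location, spec.submodule_search_locations


# A package loaded from the filesystem: origin is the path of its
# __init__ file and submodule_search_locations lists its directory.
print(describe_location("json"))
# A built-in module has no location: origin is the string 'built-in'.
print(describe_location("sys"))  # ('built-in', False, None)
```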

It may be worth adding a resolve_location() method to loaders, to
address any ambiguity.
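To make the idea concrete for a loader whose locations are not filesystem paths, here is a minimal sketch under stated assumptions: only the data-access half of a loader is shown, its locations are hypothetical "mem:"-prefixed keys, and resolve_location() is an illustrative name, not an existing importlib method.

```python
import posixpath


class InMemoryLoader:
    """Data-access half of a hypothetical loader whose locations are
    "mem:"-prefixed keys rather than filesystem paths."""

    SCHEME = "mem:"

    def __init__(self, origin, blobs):
        self.origin = origin  # e.g. "mem:pkg/__init__.py"
        self._blobs = blobs   # maps full locations to bytes

    def resolve_location(self, location):
        """Map a location onto this loader's namespace.

        Locations that already carry the scheme are returned unchanged;
        anything else is taken relative to the directory of self.origin,
        always with '/' separators, independent of the host OS.
        """
        if location.startswith(self.SCHEME):
            return location
        base = posixpath.dirname(self.origin[len(self.SCHEME):])
        return self.SCHEME + posixpath.normpath(posixpath.join(base, location))

    def get_data(self, path):
        try:
            return self._blobs[self.resolve_location(path)]
        except KeyError:
            raise OSError("no resource at {!r}".format(path)) from None
```

The point of the design is that callers never need to know whether "relative" means relative to a file, a package, or an archive member; the loader that owns the namespace decides.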

>
> The second issue is whether get_data/set_data are enough or if something
> else is needed, e.g. a listdir-like method. Since this is meant for handling
> intra-package data my assumption is that it isn't really necessary as
> chances are you know what files you included in your distribution (or at
> least what the possible names are).

Sounds useful to me as long as the API isn't strictly focused on
file-based resources.  It sounds like there may be other
resource-related methods worth adding at the same time.
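A minimal sketch of a listdir-like hook that is not tied to files: the ABC, its method name list_resources(), and both implementations are hypothetical, chosen only to show that one signature can serve a directory and a key/value store alike.

```python
import abc
import os


class ResourceLister(abc.ABC):
    """Hypothetical listdir-like hook for loaders (not an importlib ABC)."""

    @abc.abstractmethod
    def list_resources(self, location):
        """Return the sorted names of resources directly under `location`."""


class DirectoryLister(ResourceLister):
    """File-based implementation: `location` is a directory path."""

    def list_resources(self, location):
        return sorted(
            name for name in os.listdir(location)
            if os.path.isfile(os.path.join(location, name))
        )


class MappingLister(ResourceLister):
    """Non-file implementation: locations are '/'-separated keys, as they
    might be for an archive or a database-backed loader."""

    def __init__(self, blobs):
        self._blobs = blobs  # {"pkg/data.txt": b"...", ...}

    def list_resources(self, location):
        prefix = location.rstrip("/") + "/"
        return sorted(
            key[len(prefix):]
            for key in self._blobs
            if key.startswith(prefix) and "/" not in key[len(prefix):]
        )
```

For example, MappingLister({"pkg/a.txt": b"", "pkg/sub/b.txt": b""}).list_resources("pkg") returns ['a.txt']; nested entries are left to a second call on "pkg/sub".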

> I know some have asked for a
> listdir-like API to help discover what modules are available so as to
> provide a plugin API, but I view that as a separate thing and potentially
> more appropriate on finders. Remember, the smaller the API surface for the
> common case the better for the stdlib.

I think this would be nice, but only worth it if we anticipate a good use case.
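For the plugin-discovery case, the standard library's pkgutil.iter_modules() already covers much of this at the finder level. A minimal sketch; the helper name discover_plugins is hypothetical, and the .name attribute on the yielded ModuleInfo tuples needs Python 3.6+.

```python
import pkgutil


def discover_plugins(package):
    """Return the fully qualified names of `package`'s direct submodules.

    pkgutil.iter_modules() asks the finder for each entry in
    package.__path__ what it can import; the submodules themselves
    are not imported.
    """
    prefix = package.__name__ + "."
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__, prefix))
```

Called on a plugins package (or, say, json), it lists the submodules such as "json.decoder" without importing them.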

-eric

