[Import-SIG] Loading Resources From a Python Module/Package

Brett Cannon brett at python.org
Sat Jan 31 18:00:52 CET 2015


On Sat Jan 31 2015 at 11:43:55 AM Donald Stufft <donald at stufft.io> wrote:

> On Jan 31, 2015, at 11:31 AM, Brett Cannon <brett at python.org> wrote:
>
>
>
> On Sat Jan 31 2015 at 10:54:22 AM Paul Moore <p.f.moore at gmail.com> wrote:
>
>> On 31 January 2015 at 15:47, Donald Stufft <donald at stufft.io> wrote:
>> >> It's certainly possible to add a new API that loads resources based on
>> >> a relative name, but you'd have to specify relative to *what*.
>> >> get_data explicitly ducks out of making that decision.
>> >
>> > data = __loader__.get_bytes(__name__, "logo.gif")
>>
>> Quite possibly. It needs a bit of fleshing out to make sure it doesn't
>> prohibit sharing of loaders, etc, in the way Brett mentions.
>
>
> By specifying the package anchor point I don't think it does.
>
>
>> Also, the
>> fact that it needs __name__ in there feels wrong - a bit like the old
>> version of super() needing to be told which class it was being called
>> from.
>
>
> You can't avoid that. This is the entire reason why loader reuse is a
> pain; you **have** to specify what to work off of, else it's ambiguous and a
> specific feature of a specific loader.
>
> But this is only an issue when you are trying to access a file relative to
> the package/module you're in. Otherwise you're going to be specifying a
> string constant like 'foo.bar'.
>
>
>> But in principle I don't object to finding a suitable form of
>> this.
>>
>> And I like the name get_bytes - much more explicit in these Python 3
>> days of explicit str/bytes distinctions :-)
>
>
> One unfortunate side-effect from having a new method to return bytes from
> a data file is that it makes get_data() somewhat redundant. If we make it
> get_data_filename(package_name, path) then it can return an absolute path
> which can then be passed to get_data() to read the actual bytes. If we
> create importlib.resources as Donald has suggested then all of this can be
> hidden behind a function and users don't have to care about any of this,
> e.g. importlib.resources.read_data(module_anchor, path).
>
>
> I think we actually have to go the other way, because only some Loaders
> will be able to actually return a filename (returning a filename is
> basically an optimization to prevent needing to call get_data and write
> that out to a temporary directory) but pretty much any loader should
> theoretically be able to support get_data.
>

Why can only some loaders return a filename? As I have said, loaders can
return an opaque string to simulate a path if necessary.
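
To make the "opaque string" idea concrete, here is a minimal sketch, not an
existing API: get_data_filename() is the hypothetical method floated above,
and InMemoryLoader and the "mem://" token format are made up purely for
illustration. The point is only that the string handed back does not have to
be a real file system path, as long as the same loader's get_data() knows how
to interpret it.

```python
class InMemoryLoader:
    """Toy loader that keeps resources in a dict rather than on disk."""

    def __init__(self, resources):
        # Maps (package_name, resource_name) -> bytes.
        self._resources = dict(resources)

    def get_data_filename(self, package_name, path):
        # Hypothetical method: returns an opaque token, not a real file path.
        if (package_name, path) not in self._resources:
            raise FileNotFoundError(path)
        return "mem://{}/{}".format(package_name, path)

    def get_data(self, path):
        # Same signature as the existing get_data(); only this loader
        # knows how to interpret the string it handed out.
        prefix = "mem://"
        if not path.startswith(prefix):
            raise OSError("not a path from this loader: " + path)
        package_name, _, resource = path[len(prefix):].partition("/")
        try:
            return self._resources[(package_name, resource)]
        except KeyError:
            raise OSError(path)


loader = InMemoryLoader({("pkg", "logo.gif"): b"GIF89a"})
token = loader.get_data_filename("pkg", "logo.gif")
data = loader.get_data(token)
```

Callers treat the token as a black box and simply pass it back to get_data(),
which keeps the existing get_filename()/get_data() composition intact.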


>
> I think it is redundant but given that it’s a new API (passing module and
> a “resource path”) I think it makes sense. The old get_data API can be
> deprecated but left in for compatibility reasons if we want (sort of like
> Loader().load_module() -> Loader().exec_module()).
>

If we do that then there would have to be a way to specify how to read the
bytes for the module code itself since get_data() is used in the
implementation of import by coupling it with get_filename() (which is why
I'm trying not to have to drop get_filename()/get_data() and instead come up
with some new approach to reading bytes since the current approach is very
composable). So get_bytes() would need a way to signal that you don't want
some data file but the bytes for the module. Maybe if the path section is
unspecified then that's a signal that the module's bytes are wanted and not
some data file?
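
As a rough sketch of that signalling idea (everything here is hypothetical:
get_bytes() is only the proposed name, and DictLoader is a toy stand-in for a
real loader), leaving the path out could mean "give me the module's own
bytes":

```python
class DictLoader:
    """Toy loader holding module source and data files in memory."""

    def __init__(self, sources, data_files):
        self._sources = dict(sources)        # module name -> source bytes
        self._data_files = dict(data_files)  # (module name, path) -> bytes

    def get_bytes(self, name, path=None):
        # Hypothetical API: no path means "the bytes for the module itself".
        if path is None:
            return self._sources[name]
        return self._data_files[(name, path)]


loader = DictLoader(
    sources={"pkg": b"x = 1\n"},
    data_files={("pkg", "logo.gif"): b"GIF89a"},
)

# Import machinery would ask for the module's bytes and compile them.
module_bytes = loader.get_bytes("pkg")
namespace = {}
exec(compile(module_bytes, "<pkg>", "exec"), namespace)

# Everyone else passes a path and gets a data file.
logo = loader.get_bytes("pkg", "logo.gif")
```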


>
>
> One thing to consider is do we want to allow anything other than filenames
> for the path part? Thanks to namespace packages every directory is
> essentially a package, so we could say that the package anchor has to
> encapsulate the directory and the path bit can only be a filename. That
> gets us even farther away from having the concept of file paths being
> manipulated in relation to import-related APIs.
>
>
> I think we do want to allow directories; it’s not unusual to have
> something like:
>
> warehouse
> ├── __init__.py
> ├── templates
> │   ├── accounts
> │   │   └── profile.html
> │   └── hello.html
> ├── utils
> │   └── mapper.py
> └── wsgi.py
>
> Conceptually templates isn’t a package (even though with namespace
> packages it kinda is) and I’d want to load profile.html by doing something
> like:
>
> importlib.resources.get_bytes("warehouse",
> "templates/accounts/profile.html")
>

Where I would be fine with get_bytes('warehouse.templates.accounts',
'profile.html')  =)
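
The two spellings carry the same information. A small sketch of the mapping,
assuming every directory in the resource path is usable as a package name
(split_resource() is a made-up helper, not a proposed API):

```python
def split_resource(anchor, resource_path):
    """Fold directory parts of a '/'-separated resource path into the anchor."""
    *directories, filename = resource_path.split("/")
    if not filename or any(part in ("", ".", "..") for part in directories):
        raise ValueError("invalid resource path: " + repr(resource_path))
    return ".".join([anchor, *directories]), filename


anchor, filename = split_resource("warehouse", "templates/accounts/profile.html")
```

This only works when each directory name is a valid identifier; a directory
such as "static-files" could not be folded into the anchor this way.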


>
> In pkg_resources the second argument to that function is a “resource path”
> which is defined as relative to the given module/package, and it must use
> / as the separator. It explicitly says it’s not a file system path but a
> resource path. It may translate to a file system path (as is the case with
> the FileLoader) but it also may not (as is the case with a theoretical
> S3Loader or PostgreSQLLoader).
>

Yep, which is why I'm making sure if we have paths we minimize them as they
instantly make these alternative loader concepts a bigger pain to implement.


> How you turn a warehouse + a resource path into some data (or whatever
> other function we support) is an implementation detail of the Loader.
>
>
> And just so I don't forget it, I keep wanting to pass an actual module in
> so the code can extract the name that way, but that prevents the __name__
> trick as you would have to import yourself or grab the module from
> sys.modules.
>
>
> Is an actual module what gets passed into Loader().exec_module()?
>

Yes.


> If so I think it’s fine to pass that into the new Loader() functions and a
> new top level API in importlib.resources can do the things needed to turn a
> string into a module object. So instead of doing
> __loader__.get_bytes(__name__, "logo.gif") you’d do
> importlib.resources.get_bytes(__name__, "logo.gif").
>

If we go the route of importlib.resources then that seems like a reasonable
idea, although we will need to think through the ramifications to
exec_module() itself; I don't think there will be any issues.
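
A minimal sketch of what such a top-level function might look like, under
stated assumptions: this get_bytes() is hypothetical, the loader method
get_bytes(module, path) it looks for does not exist today, and the fallback
assumes a file-based loader whose get_data() accepts a real path. The demo
package name is invented for the example.

```python
import importlib
import os
import sys
import tempfile
import types


def get_bytes(anchor, path):
    """Hypothetical top-level helper: accept a module or its name."""
    if isinstance(anchor, types.ModuleType):
        module = anchor
    else:
        module = importlib.import_module(anchor)
    loader = module.__spec__.loader
    # Prefer a (hypothetical) loader.get_bytes(module, path) if it exists.
    if hasattr(loader, "get_bytes"):
        return loader.get_bytes(module, path)
    # Otherwise fall back to the existing get_data(), which needs a
    # loader-specific path; this fallback assumes a file-based loader.
    base = os.path.dirname(module.__file__)
    return loader.get_data(os.path.join(base, *path.split("/")))


# Demo: build a throwaway package on disk and read a resource from it.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "demo_pkg_for_get_bytes")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg_dir, "logo.gif"), "wb") as f:
        f.write(b"GIF89a")
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        by_name = get_bytes("demo_pkg_for_get_bytes", "logo.gif")
        by_module = get_bytes(sys.modules["demo_pkg_for_get_bytes"], "logo.gif")
    finally:
        sys.path.remove(root)
```

Accepting either form keeps the __name__ trick working for callers while
loaders only ever see a module object.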

And if we do go with importlib.resources I will probably want to make it
available on PyPI with appropriate imp/pkgutil fallbacks to help people
transitioning from Python 2 to 3.