[Import-SIG] Loading Resources From a Python Module/Package

Barry Warsaw barry at python.org
Sat Jan 31 18:31:46 CET 2015


On Jan 30, 2015, at 06:37 PM, Donald Stufft wrote:

>I think it would be a good idea to implement a pkgutil.get_data_filename
>function which would return a filename that can be accessed to get at that
>particular bit of package data.

+1

Of the pkg_resources methods that I use all the time, resource_string() (which
in Python 3 should really be called resource_bytes()) and resource_filename()
are the overwhelming favorites.  I do occasionally use resource_stream() and
even more rarely, resource_listdir().

Given that pkgutil.get_data() is essentially resource_bytes(), adopting (and
improving) equivalents for resource_filename() and resource_stream() would be
really nice.
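For comparison, here is a minimal sketch of pkgutil.get_data() playing the role of resource_bytes(). The package name demo_pkg and its bar.txt file are made up. They are created in a temporary directory so the snippet runs on its own:

```python
import importlib
import pathlib
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/__init__.py and demo_pkg/bar.txt.
# Both names are hypothetical, chosen only for this example.
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "bar.txt").write_bytes(b"hello from bar.txt\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# pkgutil.get_data() returns the resource contents as bytes, i.e. what
# resource_string() returns on Python 3.
data = pkgutil.get_data("demo_pkg", "bar.txt")
print(data)  # b'hello from bar.txt\n'
```

With setuptools installed, pkg_resources.resource_string('demo_pkg', 'bar.txt') returns the same bytes.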

>I have a few concerns, however. Currently Loader.get_data() requires you to
>pass the entire path of the file you want to open (like
>/usr/lib/python3.5/site-packages/foo/bar.txt or /data/foo.zip/bar.txt),
>whereas I've made Loader.get_data_filename() want a relative path (like
>bar.txt).
>
>I wonder if this difference is OK?

Depends on who you ask :).  Clearly, most users should never be confronted
with the difference.  The APIs they should use are the pkgutil ones and there,
everything's relative to a package namespace path, which is (well, modulo
perhaps some PEP 420 corners) unambiguous.
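To make the relative/absolute split concrete: the pkgutil layer turns a package-relative name into the full path that Loader.get_data() currently expects. The helper below is a simplified sketch of that translation. read_via_loader and path_demo are hypothetical names. The real pkgutil.get_data() also returns None for packages whose loader lacks get_data().

```python
import importlib
import os
import pathlib
import sys
import tempfile

def read_via_loader(package, resource):
    """Simplified sketch of the translation pkgutil.get_data() performs:
    a '/'-separated, package-relative name becomes a full path, and only
    that full path is handed to the loader's get_data()."""
    module = importlib.import_module(package)
    loader = module.__spec__.loader
    # Anchor the relative name at the directory holding the package's
    # __init__.py; splitting on '/' lets os.path.join use the OS separator.
    full_path = os.path.join(os.path.dirname(module.__file__), *resource.split("/"))
    return full_path, loader.get_data(full_path)

# Throwaway package for the demo (hypothetical names).
root = pathlib.Path(tempfile.mkdtemp())
(root / "path_demo").mkdir()
(root / "path_demo" / "__init__.py").write_text("")
(root / "path_demo" / "bar.txt").write_bytes(b"bar contents")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

full_path, data = read_via_loader("path_demo", "bar.txt")
print(full_path)  # e.g. /tmp/tmpXXXX/path_demo/bar.txt
print(data)       # b'bar contents'
```

The user only ever says "bar.txt". The absolute path exists purely to satisfy the loader interface.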

I don't particularly like the "feature" of get_data() allowing resource paths
with / in the name.  I'd much rather the resource either be a dotted module
path, or that subpaths simply not be allowed.  The difference imposes a
requirement on the layout of the package, e.g.

pkgutil.get_data('my.package.path', 'subpath/foo.dat')
pkgutil.get_data('my.package.path.subpath', 'foo.dat')

The latter requires that 'subpath' be a subpackage while the former does not.
Personally, that seems like a fine restriction to me, but that's how I always
lay out my in-package data anyway.
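A small sketch of the two layouts, assuming Python 3.3+, where a directory without __init__.py imports as a PEP 420 namespace package. layout_demo, subpath and plaindir are hypothetical names:

```python
import importlib
import pathlib
import pkgutil
import sys
import tempfile

# Hypothetical layout:
#   layout_demo/__init__.py
#   layout_demo/subpath/__init__.py   <- a real subpackage
#   layout_demo/subpath/foo.dat
#   layout_demo/plaindir/foo.dat      <- plain directory, no __init__.py
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "layout_demo"
(pkg / "subpath").mkdir(parents=True)
(pkg / "plaindir").mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "subpath" / "__init__.py").write_text("")
(pkg / "subpath" / "foo.dat").write_bytes(b"sub")
(pkg / "plaindir" / "foo.dat").write_bytes(b"plain")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Slash-separated resource names work whether or not the directory is a package.
slash_sub = pkgutil.get_data("layout_demo", "subpath/foo.dat")
slash_plain = pkgutil.get_data("layout_demo", "plaindir/foo.dat")

# The dotted form needs a regular subpackage. plaindir imports as a
# namespace package, whose loader has no get_data(), so the result is None.
dotted_sub = pkgutil.get_data("layout_demo.subpath", "foo.dat")
dotted_plain = pkgutil.get_data("layout_demo.plaindir", "foo.dat")

print(slash_sub, slash_plain, dotted_sub, dotted_plain)
# b'sub' b'plain' b'sub' None
```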

Loader implementers, OTOH, do care, but there are a lot fewer of them than users.

>If not I wonder if we can make Loader.get_data accept a relative path as
>well. I think this is a generally more useful way of using the function
>because it doesn't restrict loaders to file system only (which get_data
>currently is restricted to I believe) and it lets the Loader encapsulate the
>logic about how to translate a relative path to a chunk of data instead of
>needing the caller to do that.

+1
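A sketch of what a relative-path get_data() could look like. Both classes are hypothetical, not existing importlib loaders. Callers pass 'bar.txt' and each loader decides what that means, so one of them never touches the file system:

```python
import os
import pathlib
import tempfile

class InMemoryResourceLoader:
    """Hypothetical loader whose get_data() takes a package-relative name.
    Resources live in a dict, so no file system is involved."""

    def __init__(self, resources):
        self._resources = dict(resources)  # relative name -> bytes

    def get_data(self, relative_path):
        try:
            return self._resources[relative_path]
        except KeyError:
            # Mirror the existing convention of raising OSError on a miss.
            raise OSError(f"no resource named {relative_path!r}") from None


class DirectoryResourceLoader:
    """Hypothetical loader with the same relative-name interface, backed by
    a directory. Only the loader knows how 'bar.txt' maps to a real path."""

    def __init__(self, base_dir):
        self._base_dir = base_dir

    def get_data(self, relative_path):
        full_path = os.path.join(self._base_dir, *relative_path.split("/"))
        with open(full_path, "rb") as f:
            return f.read()


def load_resource(loader, relative_path):
    # The caller never builds a path itself; the loader encapsulates that.
    return loader.get_data(relative_path)


base = pathlib.Path(tempfile.mkdtemp())
(base / "bar.txt").write_bytes(b"same bytes")

mem = InMemoryResourceLoader({"bar.txt": b"same bytes"})
disk = DirectoryResourceLoader(str(base))
print(load_resource(mem, "bar.txt") == load_resource(disk, "bar.txt"))  # True
```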

>My other problem is that pkgutil.get_data doesn't currently work for the PEP
>420 namespace packages and due to the above I'm not sure how to actually make
>it work in a reasonable way without allowing get_data to accept relative
>paths as well.

Well, with the restriction on resource subpaths above, there's no problem,
right?

pkgutil.get_data('my.package.path.subpath', 'foo.dat')

Assuming subpath is contained within a namespace portion, it should be
unambiguous where it comes from.

pkgutil.get_data('my.package.path', 'foo.dat')

If 'my.package.path' is a namespace package then there *isn't* any portion
containing foo.dat, so this should return None because the namespace loader
won't have get_data() implemented on it.
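The namespace case can be checked directly. This sketch uses hypothetical names and assumes Python 3.3+. ns_demo has no __init__.py, and subpath is a regular package inside that portion. pkgutil.get_data() returns None for ns_demo itself, even if a stray foo.dat sits in its directory, because the namespace package's loader does not support get_data():

```python
import importlib
import pathlib
import pkgutil
import sys
import tempfile

# Hypothetical layout:
#   ns_demo/                       <- PEP 420 namespace portion (no __init__.py)
#   ns_demo/foo.dat                <- stray file, not reachable via get_data()
#   ns_demo/subpath/__init__.py    <- regular subpackage
#   ns_demo/subpath/foo.dat
root = pathlib.Path(tempfile.mkdtemp())
sub = root / "ns_demo" / "subpath"
sub.mkdir(parents=True)
(sub / "__init__.py").write_text("")
(sub / "foo.dat").write_bytes(b"from subpath")
(root / "ns_demo" / "foo.dat").write_bytes(b"unreachable")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from_sub = pkgutil.get_data("ns_demo.subpath", "foo.dat")
from_ns = pkgutil.get_data("ns_demo", "foo.dat")
print(from_sub)  # b'from subpath'
print(from_ns)   # None
```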

I understand that imposing this restriction is a backward compatibility break,
so it may not be adoptable.  There are ways to get around that (add a flag to
the API, implement a new pkgutil API with the restriction and deprecate
.get_data(), etc.).  However, for PEP 420 packages, you could impose this
restriction in .get_data() without the backward compatibility problem.  And
you can certainly impose this restriction in any new APIs, e.g.
.get_package_filename() a.k.a. resource_filename().
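A minimal sketch of such a new API with the restriction built in. get_data_filename borrows the name Donald proposed, but its signature and behavior here are assumptions. It rejects any resource containing a path separator and returns None for packages without a __file__ (such as namespace packages). It only handles packages installed as plain directories:

```python
import importlib
import os
import pathlib
import sys
import tempfile

def get_data_filename(package, resource):
    """Hypothetical API: return a real file name for `resource` in `package`.

    Enforces the no-subpaths restriction: `resource` must be a bare file
    name, and data in subdirectories must live in a subpackage.
    """
    if "/" in resource or os.sep in resource:
        raise ValueError(f"subpaths are not allowed: {resource!r}; "
                         "use a dotted subpackage name instead")
    module = importlib.import_module(package)
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # e.g. a PEP 420 namespace package: no single directory to search.
        return None
    path = os.path.join(os.path.dirname(module_file), resource)
    # Only plain-directory installs are handled; a zip-imported package has
    # no real file to point at, so this returns None there.
    return path if os.path.isfile(path) else None


# Throwaway package for the demo (hypothetical names).
root = pathlib.Path(tempfile.mkdtemp())
(root / "fname_demo").mkdir()
(root / "fname_demo" / "__init__.py").write_text("")
(root / "fname_demo" / "foo.dat").write_bytes(b"payload")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

filename = get_data_filename("fname_demo", "foo.dat")
with open(filename, "rb") as f:
    contents = f.read()

try:
    get_data_filename("fname_demo", "subpath/foo.dat")
except ValueError as exc:
    print(exc)  # subpaths are not allowed: ...
```

A complete version would need to extract zipped resources to a temporary file, which is exactly the part that makes resource_filename() tricky.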

I also think resource_stream() should be implemented, but maybe it
should be called `pkgutil.open(package, resource, mode, encoding)` ?
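One way the proposed pkgutil.open() could be layered on top of get_data(), sketched as a standalone function. open_resource is a hypothetical name and only read modes are handled. Defaulting to UTF-8 when no encoding is given is an assumption of this sketch:

```python
import importlib
import io
import pathlib
import pkgutil
import sys
import tempfile

def open_resource(package, resource, mode="r", encoding=None):
    """Hypothetical stand-in for the proposed
    pkgutil.open(package, resource, mode, encoding).
    Only the read modes 'r' and 'rb' are supported."""
    if mode not in ("r", "rb"):
        raise ValueError(f"unsupported mode: {mode!r}")
    data = pkgutil.get_data(package, resource)
    if data is None:
        # Package not found, or its loader has no get_data().
        raise FileNotFoundError(f"cannot load {resource!r} from package {package!r}")
    stream = io.BytesIO(data)
    if mode == "rb":
        return stream
    # Assumption: UTF-8 rather than the locale encoding when none is given.
    return io.TextIOWrapper(stream, encoding=encoding or "utf-8")


# Throwaway package for the demo (hypothetical names).
root = pathlib.Path(tempfile.mkdtemp())
(root / "open_demo").mkdir()
(root / "open_demo" / "__init__.py").write_text("")
(root / "open_demo" / "greeting.txt").write_bytes("héllo\n".encode("utf-8"))
sys.path.insert(0, str(root))
importlib.invalidate_caches()

with open_resource("open_demo", "greeting.txt", encoding="utf-8") as f:
    text = f.read()
with open_resource("open_demo", "greeting.txt", "rb") as f:
    raw = f.read()
```

Because it goes through get_data(), this also works for loaders that cannot hand out a real file name, such as zipimport, at the cost of reading the whole resource into memory.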

I can live without resource_listdir().

Cheers,
-Barry

