[Import-SIG] Loading Resources From a Python Module/Package

Donald Stufft donald at stufft.io
Sat Jan 31 00:37:44 CET 2015


It's often useful to be able to load a resource from a Python module or
package. Currently you can load the data into memory using pkgutil.get_data;
however, this doesn't help much if you need to pass that data into an API that
only accepts a file path. Code that needs to do this currently often does
something like os.path.join(os.path.dirname(__file__), "myfile.txt"), but
that doesn't work when the package is imported from a zip file.
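For anyone who wants to see the failure concretely, here is a small
standard-library-only demonstration. It builds a throwaway zip holding a
hypothetical package named demo_zipped_pkg (the name and file contents are
made up for illustration), imports it through zipimport, and compares the two
approaches.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway archive containing a hypothetical package with one data file.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zipped_pkg/__init__.py", "")
    zf.writestr("demo_zipped_pkg/myfile.txt", "hello from the zip\n")

# A zip file on sys.path is handled by zipimport.
sys.path.insert(0, archive)
importlib.invalidate_caches()
pkg = importlib.import_module("demo_zipped_pkg")

# The __file__ idiom yields a path that runs *through* the archive,
# e.g. .../bundle.zip/demo_zipped_pkg/myfile.txt, which the OS cannot open.
naive_path = os.path.join(os.path.dirname(pkg.__file__), "myfile.txt")
print(naive_path, os.path.exists(naive_path))

# pkgutil.get_data goes through the loader (a zipimporter here), so it works.
data = pkgutil.get_data("demo_zipped_pkg", "myfile.txt")
print(data)
```

The on-disk check reports False, while get_data returns the file's bytes.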

I think it would be a good idea to implement a pkgutil.get_data_filename
function that returns the name of a file containing that particular piece of
package data. In addition, I think it would be a good idea to add an optional
get_data_filename method to the Loader, which a loader can use to indicate
when a file *already* exists on the filesystem.
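A minimal sketch of what the optional hook might look like for a file-based
loader. DataFilenameSourceLoader is a hypothetical name, and the
get_data_filename(resource) signature is an assumption about the proposal (a
"/"-separated name relative to the package directory), not an existing
importlib API.

```python
import importlib.machinery
import os


class DataFilenameSourceLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader showing the proposed optional get_data_filename() hook.

    Because this loader reads modules straight from the filesystem, package
    data already has a real path, so it can simply hand that path back.
    """

    def get_data_filename(self, resource):
        # self.path is the module's file, e.g. .../foo/__init__.py for a package;
        # resource is assumed to be relative to that file's directory, "/"-separated.
        candidate = os.path.join(os.path.dirname(self.path), *resource.split("/"))
        # Returning None signals "no on-disk file", so the caller can fall back.
        return candidate if os.path.isfile(candidate) else None
```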

Essentially this boils down to the pkgutil.get_data_filename(package, resource)
function doing this:

1. Check whether the loader for the package implements a get_data_filename
   method; if it does and it returns a value that is not None, simply return
   that value. FileLoader can then have a simple get_data_filename that just
   returns the on-disk filename.

2. If the loader doesn't have a get_data_filename method, or it returns None,
   then call pkgutil.get_data; if that returns None, return None ourselves.
   Otherwise, save that data to a temporary file and return the path to that
   temporary file.
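Steps 1 and 2 could be sketched roughly like this. It is not the code from the
paste below; get_data_filename here is a hypothetical stand-in for the
proposed pkgutil function, and it assumes the loader hook takes a single
relative resource name.

```python
import importlib.util
import os
import pkgutil
import tempfile


def get_data_filename(package, resource):
    """Hypothetical sketch of the proposed pkgutil.get_data_filename().

    Returns a filesystem path for *resource* (a "/"-separated name relative to
    *package*), or None if the package's loader cannot supply the data.
    """
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None

    # Step 1: let the loader report a file that already exists on disk.
    hook = getattr(spec.loader, "get_data_filename", None)
    if hook is not None:
        filename = hook(resource)
        if filename is not None:
            return filename

    # Step 2: fall back to reading the bytes through the existing API.
    data = pkgutil.get_data(package, resource)
    if data is None:
        return None

    # Copy the bytes to a temporary file; the caller owns (and should delete) it.
    suffix = os.path.splitext(resource)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

Keeping the resource's extension on the temporary file is a small convenience
for APIs that sniff file types by suffix; cleaning up the file is left to the
caller in this sketch.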

I've implemented this (without tests); you can see it here: https://bpaste.net/show/2e51b0588dcd

I have a few concerns, however. Currently Loader.get_data() requires you to
pass the entire path of the file you want to open
(like /usr/lib/python3.5/site-packages/foo/bar.txt or /data/foo.zip/bar.txt),
whereas I've made Loader.get_data_filename() take a relative path
(like bar.txt).
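To make the difference concrete: today the caller does the relative-to-full
translation before calling Loader.get_data(). The helper below (a hypothetical
name) mirrors what pkgutil.get_data does internally, splitting the resource on
"/" and joining it onto the directory of the package's __file__.

```python
import os


def resource_to_get_data_path(package_file, resource):
    """Translate a "/"-separated resource name into the full path get_data() expects.

    The caller, not the loader, joins the resource onto the directory
    containing the package's __file__.
    """
    parts = resource.split("/")
    return os.path.join(os.path.dirname(package_file), *parts)
```

On POSIX, resource_to_get_data_path("/data/foo.zip/foo/__init__.py", "bar.txt")
gives "/data/foo.zip/foo/bar.txt", a path that only the zip loader knows how to
interpret.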

I wonder whether this difference is OK. If not, I wonder if we can make
Loader.get_data accept a relative path as well. I think this is a generally
more useful way to use the function, because it doesn't restrict loaders to
the file system (which I believe get_data currently is restricted to), and it
lets the Loader encapsulate the logic for translating a relative path into a
chunk of data instead of requiring the caller to do that.
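As a rough illustration of the relative-path alternative, here is a
hypothetical loader whose get_data() takes package-relative names and serves
them from a dict. InMemoryResourceLoader is invented for this example; the
point is only that nothing in such an interface requires the data to live on a
filesystem.

```python
class InMemoryResourceLoader:
    """Hypothetical loader whose get_data() accepts package-relative names.

    The resources live in a dict rather than on disk; a relative-path API
    allows this because callers never need to know a filesystem location.
    """

    def __init__(self, resources):
        # Maps "/"-separated relative names (e.g. "bar.txt") to bytes.
        self._resources = dict(resources)

    def get_data(self, relative_path):
        try:
            return self._resources[relative_path]
        except KeyError:
            # Follow the existing get_data convention of raising OSError when missing.
            raise FileNotFoundError(relative_path) from None
```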

My other problem is that pkgutil.get_data doesn't currently work for PEP 420
namespace packages, and given the above I'm not sure how to make it work in a
reasonable way without allowing get_data to accept relative paths as well.
Because my patch lets the Loader encapsulate turning a relative path into a
file path, both pkgutil.get_data_filename() and
_NamespaceLoader.get_data_filename work and support PEP 420 namespace
packages.
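One way to see the namespace package problem: a PEP 420 package has no single
directory and no usable __file__, so there is nothing to join a relative name
onto. The sketch below (find_namespace_resource is a hypothetical helper, not
the _NamespaceLoader code from the patch) instead tries the relative name
against each portion listed in the package's __path__.

```python
import importlib
import os


def find_namespace_resource(package, resource):
    """Sketch: locate a "/"-separated resource in a (possibly namespace) package.

    Each directory on the package's __path__ is tried in order; the first
    existing file wins. Returns None if no portion contains the resource.
    """
    module = importlib.import_module(package)
    for directory in module.__path__:
        candidate = os.path.join(directory, *resource.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None
```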

A. What do people think about pkgutil.get_data_filename and
   Loader.get_data_filename?

B. What do people think about modifying Loader.get_data so it can support
   relative filenames instead of the calling code needing to handle that?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


