Accessing package data files in wheels
Hi,

What is the recommended way to get access to a data file in my package if I'm using a wheel? pkg_resources seems like it's mainly useful because eggs might be used in a zipped form, but for wheels, I'm guaranteed that the data will be unpacked into a directory structure, right? In that case, should I just use __file__ to manually find the location of the file?

I'm assuming I can still use pkg_resources, but it feels like I'm adding an unnecessary dependency on setuptools, given that setuptools isn't needed during installation but only when building the wheel.

Thanks!
Justin
Justin Uang
What is the recommended way to get access to a data file in my package if I'm using a wheel? pkg_resources seems like it's mainly useful because eggs might be used in a zipped form, but for wheels, I'm guaranteed that the data will be unpacked into a directory structure right? In that case, should I just use __file__ to manually find the location of the file?
An advantage of querying package resources using the ‘pkg_resources’ API is that their location *doesn't* need to be known by your program. Let the installation process put them in a sensible location for the system, and discover that location at run-time using ‘pkg_resources’.
I'm assuming I can still use pkg_resources, but it feels like I'm adding an unnecessary dependency on setuptools, given that setuptools isn't needed during installation but only when building the wheel.
You'll be doing the recipient a favour by allowing the package resources to be in a different place from the executable files.

--
“Philosophy is questions that may never be answered. Religion is answers that may never be questioned.” (anonymous)
Ben Finney
On 27 June 2015 at 19:52, Justin Uang
What is the recommended way to get access to a data file in my package if I'm using a wheel? pkg_resources seems like it's mainly useful because eggs might be used in a zipped form, but for wheels, I'm guaranteed that the data will be unpacked into a directory structure right? In that case, should I just use __file__ to manually find the location of the file?
If you want to avoid a dependency on pkg_resources, you can use pkgutil.get_data (from the stdlib). It doesn't have as many features as pkg_resources, but it does the job in straightforward cases.

Regarding zipped form, you are never guaranteed that your code is on the filesystem. Wheels always install unzipped, as you say, but deployment utilities like pyzzer/zipapp, or py2exe/cx_Freeze, bundle your application into a zip file for execution. So unless you want to document that your package doesn't support such deployment methods, you shouldn't assume you're installed to the filesystem (and hence you should avoid using __file__ and doing path manipulation on it).

Paul
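A minimal sketch of the point above, assuming a throwaway package that is imported straight from a zip archive. The package name `demo_zipped_pkg`, the resource name `greeting.txt`, and the archive name `bundle.zip` are hypothetical and exist only for this illustration. It compares a naive `__file__`-based lookup with `pkgutil.get_data`.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

# Hypothetical names, used only for this demonstration.
PKG = "demo_zipped_pkg"
RESOURCE = "greeting.txt"


def build_zipped_package(directory):
    """Write a zip archive containing a tiny package plus one data file."""
    archive = os.path.join(directory, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(PKG + "/__init__.py", "")
        zf.writestr(PKG + "/" + RESOURCE, "hello from inside the zip\n")
    return archive


# Put the archive itself on sys.path, as a zip-based deployment tool would.
archive = build_zipped_package(tempfile.mkdtemp())
sys.path.insert(0, archive)
importlib.invalidate_caches()

module = importlib.import_module(PKG)

# Naive approach: derive a filesystem path from __file__.
# The resulting path points *inside* the zip file, so it is not a real file.
naive_path = os.path.join(os.path.dirname(module.__file__), RESOURCE)
file_based_works = os.path.exists(naive_path)

# Loader-based approach: ask the package's loader for the bytes.
data = pkgutil.get_data(PKG, RESOURCE)

print("path from __file__ exists:", file_based_works)
print("pkgutil.get_data returned:", data)
```

The same `pkgutil.get_data` call also works when the package is installed unzipped, which is why it is the safer default if the deployment method is not under your control.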
Hello, On Sun, 28 Jun 2015 13:02:40 +0100 Paul Moore
If you want to avoid a dependency on pkg_resources, you can use pkgutil.get_data (from the stdlib). It doesn't have as many features as pkg_resources, but it does the job in straightforward cases.
Which makes everyone in the audience wonder: how it happens that it's 2015, Python is at 3.5, but pkgutil.get_data() is in stdlib, while pkg_resources.resource_stream() isn't? An implementation of pkgutil.get_data() would be based on pkg_resources.resource_stream(), or would contain just the same code as the latter, so it could easily be exposed, and yet it isn't.

--
Best regards,
Paul mailto:pmiscml@gmail.com
On 29 June 2015 at 16:56, Paul Sokolovsky
Which makes everyone in the audience wonder: how it happens that it's 2015, Python is at 3.5, but pkgutil.get_data() is in stdlib, while pkg_resources.resource_stream() isn't? An implementation of pkgutil.get_data() would be based on pkg_resources.resource_stream(), or would contain just the same code as the latter, so it could easily be exposed, and yet it isn't.
This has the same 3 part answer as a lot of "Why isn't this in the standard library?" questions:

1. nobody has volunteered to drive the standardisation process
2. most of the folks who currently need this functionality want to support older versions of Python, so something pip-installable is actually more use to them than standard library support
3. the folks that *are* currently working on improving the out-of-the-box experience are working on other aspects of that problem

(1) is the actual reason, while (2) and (3) are then a couple of the second order factors contributing to (1).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
From: Paul Sokolovsky
Which makes everyone in the audience wonder: how it happens that it's 2015, Python is at 3.5, but pkgutil.get_data() is in stdlib, while pkg_resources.resource_stream() isn't? An implementation of pkgutil.get_data() would be based on pkg_resources.resource_stream(), or would contain just the same code as the latter, so it could easily be exposed, and yet it isn't.

Perhaps because it's not always that way around - pkg_resources.resource_stream, in relevant cases, returns a BytesIO which wraps a byte string. If you want a stream, you could just as easily do this yourself by calling io.BytesIO(pkgutil.get_data('foo.pkg', 'foo_resource')). In the case of file resources only, pkg_resources.resource_stream can open the file and return the stream directly. But this is an optimisation and is not necessarily available for all loader implementations - which is perhaps why a `pkgutil.get_data_stream` is not provided.

Regards,
Vinay Sajip
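The `io.BytesIO` wrapping described above can be sketched as follows. This is only an illustration: `get_data_stream` is a hypothetical helper (no such function exists in `pkgutil`), and the package `demo_stream_pkg` is created on the fly purely so the snippet is self-contained.

```python
import importlib
import io
import os
import pkgutil
import sys
import tempfile


def get_data_stream(package, resource):
    """Hypothetical helper: return a binary stream over a package resource.

    pkgutil.get_data returns None when the loader has no get_data method,
    so that case is passed through unchanged.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        return None
    return io.BytesIO(data)


# Build a throwaway package on disk (assumed names, for demonstration only).
PKG = "demo_stream_pkg"
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, PKG))
with open(os.path.join(root, PKG, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, PKG, "foo_resource"), "wb") as f:
    f.write(b"line one\nline two\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

# The stream behaves like any other binary file object.
stream = get_data_stream(PKG, "foo_resource")
first_line = stream.readline()
rest = stream.read()
print(first_line, rest)
```

Note that the whole resource is read into memory before the stream is created, which is the trade-off the message above alludes to: opening the file directly is an optimisation that only some loaders can offer.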
On 29 June 2015 at 07:56, Paul Sokolovsky
If you want to avoid a dependency on pkg_resources, you can use pkgutil.get_data (from the stdlib). It doesn't have as many features as pkg_resources, but it does the job in straightforward cases.
Which makes everyone in the audience wonder: how it happens that it's 2015, Python is at 3.5, but pkgutil.get_data() is in stdlib, while pkg_resources.resource_stream() isn't? An implementation of pkgutil.get_data() would be based on pkg_resources.resource_stream(), or would contain just the same code as the latter, so it could easily be exposed, and yet it isn't.
In addition to Nick's response, which covers the main reasons, there is also a more fundamental issue behind this. The PEP 302 definition of a loader only provides a get_data method, which corresponds directly to pkgutil.get_data. Any additional features provided by pkg_resources are not supported directly by the loader protocol, and so could not be guaranteed to be present for an arbitrary loader. pkg_resources provides the extended features (I believe) by special-casing filesystem and zip loaders, and providing an extension mechanism for other loaders to participate in the functionality, but that extension mechanism is not in the stdlib either.

So adding more resource loading features means extending the PEP 302 protocols, etc. That's the work that no-one is currently doing that blocks the process.

Having said all this, PEP 302 is pretty old now, and importlib makes all of this *much* easier, so (as long as you're only targeting recent Python versions, which stdlib support would be) it's a lot simpler to do this now than it was when we wrote PEP 302. And of course, in practical terms filesystem and zip loaders are the only significant ones that exist anyway...

Paul
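To make the loader protocol concrete, here is a rough sketch that calls a loader's `get_data` method directly, which is roughly what `pkgutil.get_data` does on your behalf. The package name `demo_loader_pkg`, the resource name, and the helper `load_resource` are assumptions made up for this example.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Build a throwaway package on disk (hypothetical names).
PKG = "demo_loader_pkg"
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, PKG)
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "payload.bin"), "wb") as f:
    f.write(b"\x00\x01\x02")
sys.path.insert(0, root)
importlib.invalidate_caches()


def load_resource(package, resource):
    """Fetch resource bytes through the package's loader, if it supports that.

    get_data is optional in the loader protocol, so its absence is reported
    as None instead of being treated as an error.
    """
    spec = importlib.util.find_spec(package)
    loader = spec.loader
    if not hasattr(loader, "get_data"):
        return None
    # For a package, spec.origin is the location of its __init__.py.
    base = os.path.dirname(spec.origin)
    return loader.get_data(os.path.join(base, resource))


payload = load_resource(PKG, "payload.bin")
print(payload)
```

Everything beyond "give me the bytes at this path" (streams, directory listings, real filenames) sits outside this interface, which is the gap the message describes.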
Hello, On Mon, 29 Jun 2015 09:15:57 +0100 Paul Moore
In addition to Nick's response, which are the main reasons, there is also a more fundamental issue behind this.
The PEP 302 definition of a loader only provides a get_data method, which corresponds directly to pkgutil.get_data.
Thanks for this clarification. I expected the reason to be not purely logistical ("nobody has got to it yet") but also a technical/API limitation ("the Python core/stdlib doesn't have the required prerequisites"). But then it's another level of the same question: we have the distutils-sig group, with people who oversee Python packaging needs (and related questions). They recently told us (the Python community) that, for example, we should stop using eggs and start using wheels, so there's a lot of active work happening in the area, and yet we're stuck with the old base PEPs, which overlooked providing a stream access protocol for package resources. Let that be a rhetorical question then, and let everyone assume that so far trading eggs for wheels was more important than closing a visible accidental gap in the stdlib/loader API.
Any additional features provided by pkg_resources are not supported directly by the loader protocol, and so could not be guaranteed to be present for an arbitrary loader. pkg_resources provides the extended features (I believe) by special-casing filesystem and zip loaders, and providing an extension mechanism for other loaders to participate in the functionality, but that extension mechanism is not in the stdlib either.
So adding more resource loading features means extending the PEP 302 protocols, etc. That's the work that no-one is currently doing that blocks the process.
Having said all this, PEP 302 is pretty old now, and importlib makes all of this *much* easier, so (as long as you're only targeting recent Python versions, which stdlib support would be) it's a lot simpler to do this now than it was when we wrote PEP 302. And of course, in practical terms filesystem and zip loaders are the only significant ones that exist anyway...
There was a recent discussion on python-dev about how other languages are cooler because they allow creating self-contained executables. Python has all parts of the equation already - e.g. the standard way to put Python source inside an executable is to use frozen modules. So, that's another use case for accessing package resources - was support for frozen modules implemented at all? Granted, frozen modules may not be the easiest way for users to achieve self-contained executables, but the whole point is that Python already has it.

I worked on another small, under-popular scripting language before, and felt that I had every opportunity to make the most perfect language ever - except that it still needed to be done. As soon as the opportunity allowed, I switched to MicroPython, to reuse the wealth of APIs and schemas devised by smart people, 90% of which are close to perfect. Well, there are gaps and warts still. So, per the above logic, I don't try to invent something new, but use (at least for starters) frozen modules in MicroPython. But then there's an interesting situation - frozen modules are a core Python feature, while accessing arbitrary files in them as a stream is not. So, even if the core interface doesn't require me to provide stream access to files in frozen modules, I still need to provide it, otherwise I simply won't be able to freeze real-world Python packages.
Paul
-- Best regards, Paul mailto:pmiscml@gmail.com
On 29 June 2015 at 10:26, Paul Sokolovsky
and yet we're stuck with the old base PEPs, which overlooked providing a stream access protocol for package resources.
The PEP did not "overlook" stream access. Rather, the compatibility constraints and the need to support existing code meant that we needed to ensure that we required the minimal possible interface from loaders. Even get_data was an optional interface. In practice, many of the constraints around at the time no longer apply, and zip and filesystem loaders remain the most common examples, so the conservative approach of PEP 302 can be revisited (as I said). But someone needs to step up and manage such a change before it will happen.
Let that be a rhetorical question then, and let everyone assume that so far trading eggs for wheels was more important than closing a visible accidental gap in the stdlib/loader API.
The egg->wheel transition was about *distribution* formats. The loader API is a runtime facility. The two are unrelated. One of the problems with eggs was the fact that they were a combined installation and runtime format, so confusing the two aspects is understandable (but still incorrect).
There was a recent discussion on python-dev about how other languages are cooler because they allow creating self-contained executables. Python has all parts of the equation already - e.g. the standard way to put Python source inside an executable is to use frozen modules. So, that's another use case for accessing package resources - was support for frozen modules implemented at all?
See https://docs.python.org/3.5/library/importlib.html#importlib.machinery.Froze...

From that, it appears that the frozen module importer does not implement the ResourceLoader API. So no, get_data is not supported for frozen modules. Of course, you can write your own extension of FrozenImporter for your application, so it's entirely possible to get this to work. But the standard Python bootstrap process (which is FrozenImporter's main use case, AFAIK) doesn't need that feature, which is probably why it's not present.

Anyway, as you can see, all the various mechanisms are available, and extending importlib is certainly possible, so as we've said it's really only about someone with the motivation doing the work. It could probably even be done as a 3rd party project in the first place (much like pkg_resources was) and then proposed for inclusion in the stdlib once it has been found to be useful.

Paul
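One way to check the claim about frozen modules on your own interpreter is to inspect the loader classes for a `get_data` method. This is a small sketch: the helper `supports_get_data` is hypothetical, and the result for `FrozenImporter` is printed without being assumed, since it depends on the Python version in use.

```python
import importlib.machinery
import zipimport


def supports_get_data(loader):
    """Return True if the loader (class or instance) exposes get_data."""
    return callable(getattr(loader, "get_data", None))


# Loaders discussed in the thread: frozen, filesystem, and zip.
frozen_has_get_data = supports_get_data(importlib.machinery.FrozenImporter)
source_has_get_data = supports_get_data(importlib.machinery.SourceFileLoader)
zip_has_get_data = supports_get_data(zipimport.zipimporter)

print("FrozenImporter:  ", frozen_has_get_data)
print("SourceFileLoader:", source_has_get_data)
print("zipimporter:     ", zip_has_get_data)
```

A check like this is also how portable code can decide whether resource access is available for a given package before attempting it.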
I'm traveling so I can't do a thorough reply, but a goal of mine for Python 3.6 is to finally solve the data access problem for packages, based on Donald's importlib.resources proposal as well as pkg_resources, to try and learn from previous mistakes.
On 29 June 2015 at 20:52, Paul Moore
In practice, many of the constraints around at the time no longer apply, and zip and filesystem loaders remain the most common examples, so the conservative approach of PEP 302 can be revisited (as I said). But someone needs to step up and manage such a change before it will happen.
And active import system experts are even thinner on the ground than packaging experts :) It's good to hear Brett's planning to dive into this for 3.6, though.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
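For context, a module named importlib.resources did later land in the standard library (in Python 3.7, with the `files()` API following in 3.9), so it was not available when this thread was written. The sketch below assumes Python 3.9 or newer and a throwaway package with the hypothetical name `demo_resources_pkg`. It is an illustration of that later API, not of anything proposed in the messages above.

```python
import importlib
import importlib.resources
import os
import sys
import tempfile

# Build a throwaway package on disk (hypothetical names).
PKG = "demo_resources_pkg"
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, PKG)
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "settings.cfg"), "w", encoding="utf-8") as f:
    f.write("[demo]\nenabled = yes\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

# files() returns a traversable object rooted at the package.
resource = importlib.resources.files(PKG).joinpath("settings.cfg")

# Read the resource as text in one call...
text = resource.read_text(encoding="utf-8")

# ...or open it as a binary stream, the feature asked about in the thread.
with resource.open("rb") as stream:
    raw = stream.read()

print(text)
```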
participants (7)
- Ben Finney
- Brett Cannon
- Justin Uang
- Nick Coghlan
- Paul Moore
- Paul Sokolovsky
- Vinay Sajip