[issue38594] importlib.metadata documentation deficiencies
New submission from Gregory Szorc <gregory.szorc@gmail.com>: As I was attempting to implement the find_distributions() interface for PyOxidizer, I got confused by importlib.metadata's documentation. The documentation for this module states: ``` What this means in practice is that to support finding distribution package metadata in locations other than the file system, you should derive from ``Distribution`` and implement the ``load_metadata()`` method. Then from your finder, return instances of this derived ``Distribution`` in the ``find_distributions()`` method. ``` The reference to `load_metadata()` is the only occurrence of the string `load_metadata` in the CPython and importlib_metadata code bases. I therefore believe the documentation in both CPython and the importlib_metadata standalone package are wrong because they are referring to a method that is never implemented nor called. Looking at the documentation and source code for importlib.metadata, I'm also a bit confused about how exactly I'm supposed to implement a custom Distribution which isn't based on filesystems. For example, I see that certain APIs return Path-like objects (which I will need to implement). But it isn't clear exactly which attributes are mandated to exist! Am I expected to implement the full pathlib.Path interface or just a subset? Regarding how find_distributions() is called, I also don't understand why the Context is optional and how Context could be used in some situations. For example, the implementation of discover() can construct Context instances with no arguments, which is then fed into find_distributions(). So I guess context=None or context.name=None implies "return Distribution's for every known package?" If so, this behavior is undocumented. I'm also not sure what Context.path is for. I /think/ it is only used for the path-based finder/distribution. But the way it is documented implies it should always exist, which doesn't seem appropriate for cases like PyOxidizer which will retrieve metadata from in-memory without filesystem I/O. I think what I'm trying to say is that the existing documentation for importlib.metadata is not sufficient to robustly implement a custom find_distributions() + Distribution type. I would kindly request that a domain expert revise the documentation such that a 3rd party can implement a custom solution. My preferred solution would be for there to be formal interfaces in importlib.abc like there are for everything else in the importlib realm. (The interfaces for finders and loaders are super useful when implementing a finder/loader from scratch.) FWIW I think I like the new metadata API and I think it is flexible enough to allow tools like PyOxidizer to do crazy things like divorce resources from the filesystem! But it is hard to say for sure since the interfaces aren't clearly defined at present. ---------- assignee: docs@python components: Documentation messages: 355402 nosy: barry, docs@python, indygreg, jaraco priority: normal severity: normal status: open title: importlib.metadata documentation deficiencies type: enhancement versions: Python 3.8, Python 3.9 _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Change by Jason R. Coombs <jaraco@jaraco.com>: ---------- assignee: docs@python -> jaraco _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Jason R. Coombs <jaraco@jaraco.com> added the comment: Good suggestions. Thanks for taking the time to articulate in such a friendly way the shortcomings you encountered. I'm happy to help. In [this ticket](https://gitlab.com/python-devs/importlib_metadata/issues/105), I've mirrored this ticket in the backport project, where I can iterate much faster. I'll provide brief answers to some of your questions/concerns here and then work out the wording for the documentation (and code changes) necessary to communicate that effectively.
The reference to `load_metadata()` is the only occurrence of the string `load_metadata` in the CPython and importlib_metadata code bases. I therefore believe the documentation in both CPython and the importlib_metadata standalone package are wrong because they are referring to a method that is never implemented nor called.
That's right. The documentation is wrong. It should say to implement a `Distribution` subclass (especially its abstract methods). Nothing more should be necessary.
I see that certain APIs return Path-like objects (which I will need to implement)
Are you sure about that? The only code I see in the `Distribution` class that references a `Path` object is `.at()`, a static method that we decided to add to the `Distribution` class even though it would be more appropriate in the `PathDistribution` class in order to make that method readily available to projects that wished to construct the Path-based distribution objects from a file system (or zipfile) path. I'm pretty sure everything else in the Distribution class relies on the two abstract methods. If you disregard `at` (and I recommend you do) and focus on implementing the abstract methods, I think things will work. Let me know if you find otherwise.
why the Context is optional and how Context could be used?
The interface is intentionally vague in order not to be too prescriptive, because as you point out, name or path may not be relevant in some contexts. It's meant to narrow the scope of any search. So if a path is present, that means the query is looking in a specific 'sys.path' entry. And if the name is present, that means it's looking for a distribution having a specific name. But basically, you can solicit any properties you like. You could expect a `size='max(100)'` parameter and only return distributions smaller than 100 (for whatever interpretation of `size` you wish to implement. Your DistributionFinder should do its best to honor whatever context might be relevant to the Distributions you provide. Does PyOxidizer interact with `sys.path` at all? If not, it can disregard `Context.path`. ---------- _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Jason R. Coombs <jaraco@jaraco.com> added the comment: Please have a look at https://gitlab.com/python-devs/importlib_metadata/merge_requests/104/diffs, which attempts to clarify the documentation to indicate how one would implement a custom finder. If you have a prototype implementation, I'd be happy to have a look. The use-case you present is exactly the type of use-case this project wishes to enable, so I'm grateful that you're working on it and I'd like to do what I can to support the effort. ---------- _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Jason R. Coombs <jaraco@jaraco.com> added the comment: I've merged the recommended changes into importlib_metadata 1.3 and I'm including those changes in issue39022. ---------- _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Change by Jason R. Coombs <jaraco@jaraco.com>: ---------- keywords: +patch pull_requests: +17041 stage: -> patch review pull_request: https://github.com/python/cpython/pull/17568 _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Jason R. Coombs <jaraco@jaraco.com> added the comment: New changeset b7a0109cd2bafaa21a4d50aad307e901c68f9156 by Jason R. Coombs in branch 'master': bpo-39022, bpo-38594: Sync with importlib_metadata 1.3 (GH-17568) https://github.com/python/cpython/commit/b7a0109cd2bafaa21a4d50aad307e901c68... ---------- _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>: ---------- pull_requests: +17043 pull_request: https://github.com/python/cpython/pull/17569 _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Jason R. Coombs <jaraco@jaraco.com> added the comment: New changeset b738237d6792acba85b1f6e6c8993a812c7fd815 by Jason R. Coombs (Miss Islington (bot)) in branch '3.8': bpo-39022, bpo-38594: Sync with importlib_metadata 1.3 (GH-17568) (GH-17569) https://github.com/python/cpython/commit/b738237d6792acba85b1f6e6c8993a812c7... ---------- _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
Jason R. Coombs <jaraco@jaraco.com> added the comment: I'm hoping those documentation edits address the deficiencies, but if not, we can take another stab at it. Feel free to re-open as needed. ---------- resolution: -> fixed stage: patch review -> resolved status: open -> closed _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue38594> _______________________________________
participants (3)
-
Gregory Szorc
-
Jason R. Coombs
-
miss-islington