pkg_resources API use according to Nullege search engine
I thought it would be interesting to get an idea of the relative popularity of the various pkg_resources calls in the interest of emulating a subset. So I looked it up on Nullege. The absolute popularity is certainly different from the numbers below.

I think of require() and load_entry_point() used in console scripts as implementing a dynamic linker, so, for example, if some code crashes when the KDEWallet bindings are installed, you would still be able to run it by omitting those from sys.path. Gem certainly works this way. I'm not 100% sure pkg_resources works that way. Those functions traverse the dependency graph at runtime, which is fast because it doesn't include the "search PyPI" stuff.

resource_filename is very popular. I would have thought resource_stream would be more popular. Unless your package is zipped resource_filename is trivial to implement.

If you are writing an installer, you can unpack a single distribution to a folder, run find_distributions() on that folder, and get a Distribution() object with the dependencies as a dict.
-- Daniel H

pkg_resources APIs by number of Nullege Call() counts:

'require', 2284
'run_script', 0
'get_provider', 32
'get_distribution', 103
'load_entry_point', 31
'get_entry_map', 11
'get_entry_info', 4
'iter_entry_points', 291
'resource_string', 175
'resource_stream', 155
'resource_filename', 713
'resource_listdir', 71
'resource_exists', 67
'resource_isdir', 18
'declare_namespace', 643; obsoleted by Python 3.3 or pkgutil
'working_set', 55 (all samples); not a function
'add_activation_listener', 3
'find_distributions', 25; needed by installers
'set_extraction_path', 2
'cleanup_resources', 4
'get_default_cache', 2
'Environment', 51
'WorkingSet', 17
'ResourceManager', 27
'Distribution', 22; needed by installers
'Requirement.parse', 524
'EntryPoint.parse', 45
'ResolutionError', 6 samples
'VersionConflict', 13 samples
'DistributionNotFound', 41 samples
'UnknownExtra', 8 samples
'ExtractionError', 0
'parse_requirements', 32
'parse_version', 100
'safe_name', trivial regexp replace
'safe_version', ""
'get_platform', 1
'compatible_platforms', 0
'yield_lines', 15; trivial
'split_sections', 9; .ini-style [section] parser
'safe_extra', 0; another regexp replacement
'to_filename', 9; another text replacement
'ensure_directory', 40
'normalize_path', 42
# Constants that control which kinds of dists are preferred
'EGG_DIST', 'BINARY_DIST', 'SOURCE_DIST', 'CHECKOUT_DIST', 'DEVELOP_DIST',
'IMetadataProvider', 'IResourceProvider', 'FileMetadata', 'PathMetadata', 23
'EggMetadata', 2
'EmptyProvider', 'empty_provider', 'NullProvider', subclassed a few times for unit test mocks
'EggProvider', 0
'DefaultProvider', 21 mentions
'ZipProvider', 1 subclass
'register_finder', 'register_namespace_handler', 'register_loader_type', 6; one user is a code signing library
'fixup_namespace_packages', 0
'get_importer', 0
'AvailableDistributions'
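Several entries in the list are annotated as trivial text helpers ("regexp replace", ".ini-style [section] parser"). As a rough illustration of what emulating them could involve, here is a minimal sketch. The function bodies are assumptions based on those annotations and are not copied from pkg_resources; edge-case behaviour may differ from the real functions.

```python
import re


def safe_name(name):
    """Replace each run of characters other than letters, digits and '.' with '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def yield_lines(text):
    """Yield stripped lines, skipping blank lines and '#' comments."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def split_sections(text):
    """Yield (section, lines) pairs from .ini-style text.

    Lines that appear before the first [section] header are reported
    under the section name None.
    """
    section, content = None, []
    for line in yield_lines(text):
        if line.startswith("[") and line.endswith("]"):
            # A new header closes the previous section, if there was one.
            if section is not None or content:
                yield section, content
            section, content = line[1:-1].strip(), []
        else:
            content.append(line)
    # The last section is always reported, even when it is empty.
    yield section, content
```

For example, list(split_sections(text)) turns the text of an entry_points.txt-style file into a list of (section name, lines) pairs.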
Daniel Holth <dholth <at> gmail.com> writes:
resource_filename is very popular. I would have thought resource_stream would be more popular. Unless your package is zipped resource_filename is trivial to implement.
Yes, I find that odd, too. pkg_resources seems to extract files from zip into a cache folder, then returns filenames from the location in the cache; it seems a lot of trouble to go to just to be able to deliver a filename.

In the Resource that I implemented in distlib, I have the "path" attribute of a resource which is analogous, though it's not directly usable as a file path for resources in a zip. However, since the resource is available as bytes or a stream, those applications which really need a filename (perhaps to pass to a third-party API which expects a filename) can handle that themselves, e.g. by saving the bytes to a temporary location and passing that to whatever needs a filename.

Regards,
Vinay Sajip
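The do-it-yourself approach described above, saving the bytes to a temporary location and handing over the name, might look like this in client code. This is a minimal sketch; bytes_as_filename is a hypothetical helper and is not part of distlib or pkg_resources.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def bytes_as_filename(data, suffix=""):
    """Write data to a temporary file, yield its path, and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close the file before yielding so other APIs can open it by name.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield path
    finally:
        os.remove(path)
```

A caller would wrap the filename-only API in a with block, for example "with bytes_as_filename(resource_bytes, '.cfg') as path: third_party_call(path)", where resource_bytes and third_party_call stand for whatever the application already has.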
On Oct 01, 2012, at 07:35 AM, Vinay Sajip wrote:
Yes, I find that odd, too. pkg_resources seems to extract files from zip into a cache folder, then returns filenames from the location in the cache; it seems a lot of trouble to go to just to be able to deliver a filename.
Darn handy though when interacting with some APIs which require filenames.
In the Resource that I implemented in distlib, I have the "path" attribute of a resource which is analogous, though it's not directly usable as a file path for resources in a zip. However, since the resource is available as bytes or a stream, those applications which really need a filename (perhaps to pass to a third-party API which expects a filename) can handle that themselves, e.g. by saving the bytes to a temporary location and passing that to whatever needs a filename.
Why not provide this by distlib? -Barry
Barry Warsaw <barry <at> python.org> writes:
a third-party API which expects a filename) can handle that themselves, e.g. by saving the bytes to a temporary location and passing that to whatever needs a filename.
Why not provide this by distlib?
Only because it doesn't add much value, and (if you do all that pkg_resources does) might mean that you have to maintain a cache. As a matter of interest, what are the APIs you're using which need filenames but can't use bytes or streams? Regards, Vinay Sajip
On Oct 01, 2012, at 03:43 PM, Vinay Sajip wrote:
Barry Warsaw <barry <at> python.org> writes:
Why not provide this by distlib?
Only because it doesn't add much value, and (if you do all that pkg_resources does) might mean that you have to maintain a cache.
Well, it's just something that client code would have to do anyway, which leads to the wheel being reinvented many times over, and to bugs.
As a matter of interest, what are the APIs you're using which need filenames but can't use bytes or streams?
Two that come to mind.

* Testing command line arguments that take a file name (e.g. test that --config <filename> works).
* Generating file: urls to generalized downloaders for local resources.

Cheers,
-Barry
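Both cases need nothing more than a real path once one is available. A small sketch, assuming a hypothetical --config option parsed with argparse, and using pathlib to build the file: URL; the function names are made up for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv):
    """Parse a command line that accepts a hypothetical --config <filename> option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config")
    return parser.parse_args(argv)


def file_url(path):
    """Return a file: URL for a local path (the path is made absolute first)."""
    return Path(path).resolve().as_uri()
```

A test would obtain a filename for the packaged config resource and then call parse_args(["--config", filename]); a downloader would be handed file_url(filename).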
On Mon, Oct 1, 2012 at 12:18 PM, Barry Warsaw <barry@python.org> wrote:
On Oct 01, 2012, at 03:43 PM, Vinay Sajip wrote:
As a matter of interest, what are the APIs you're using which need filenames but can't use bytes or streams?
dlopen() is the canonical example of an API that can only use a filename.
Daniel Holth <dholth <at> gmail.com> writes:
dlopen() is the canonical example of an API that can only use a filename.
Okay. Naturally there is already support for absolute paths in the file system for resources which are in the file system, so the question was really for resources in zips. Is it expected that the scenario will be quite common to get a .so or similar out of a package in a .zip into a cache, so that the name in the cache can be passed to dlopen()? Do you know of any specific PyPI distributions which do that, so I can look at them for testing etc? Regards, Vinay Sajip
On Tue, Oct 2, 2012 at 2:44 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Daniel Holth <dholth <at> gmail.com> writes:
dlopen() is the canonical example of an API that can only use a filename.
Okay. Naturally there is already support for absolute paths in the file system for resources which are in the file system, so the question was really for resources in zips. Is it expected that the scenario will be quite common to get a .so or similar out of a package in a .zip into a cache, so that the name in the cache can be passed to dlopen()? Do you know of any specific PyPI distributions which do that, so I can look at them for testing etc?
Just install any .egg distribution containing a C extension, without unzipping it. (bdist_egg builds eggs with the support for this inside them, using pkg_resources' API.)

Btw, I think you might be missing part of the point of resource_filename() - it's not just that there are APIs (and external programs) that need real filenames, but that there are also C APIs that operate on OS-level file handles, so just having the bytes or a Python stream isn't sufficient.

pkg_resources mainly included the support for this to deal with zipped C extensions, but also to support migrating code that still uses filenames internally for whatever reason, or needs them to pass to command line tools or C APIs. There's little point in making every application develop its own configuration and implementation for file extraction.
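The extract-to-a-cache idea described here can be sketched with the standard zipfile module. This is an illustration under simplifying assumptions and not pkg_resources' actual code: extract_to_cache and the cache layout are hypothetical, the member name is assumed to be trusted, and the size check stands in for a proper staleness test.

```python
import os
import zipfile


def extract_to_cache(archive_path, member, cache_dir):
    """Extract one member of a zip archive into cache_dir and return its path.

    The member is only written when the cached copy is missing or has a
    different size, so repeated calls reuse the same file.
    """
    with zipfile.ZipFile(archive_path) as zf:
        info = zf.getinfo(member)
        # One cache subdirectory per archive, mirroring the member's path.
        target = os.path.join(
            cache_dir, os.path.basename(archive_path), *member.split("/")
        )
        cached = os.path.isfile(target) and os.path.getsize(target) == info.file_size
        if not cached:
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
    return target
```

The returned name is an ordinary file on disk, so it can be passed to dlopen(), a command line tool, or any C API that needs an OS-level file.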
PJ Eby <pje <at> telecommunity.com> writes:
Btw, I think you might be missing part of the point of resource_filename() - it's not just that there are APIs (and external programs) that need real filenames, but that there are also C APIs that operate on OS-level file handles, so just having the bytes or a Python stream isn't sufficient.
I think I was missing it - thanks to all of you who chipped in to clarify it for me. Regards, Vinay Sajip
On Oct 02, 2012, at 06:44 PM, Vinay Sajip wrote:
Okay. Naturally there is already support for absolute paths in the file system for resources which are in the file system, so the question was really for resources in zips. Is it expected that the scenario will be quite common to get a .so or similar out of a package in a .zip into a cache, so that the name in the cache can be passed to dlopen()? Do you know of any specific PyPI distributions which do that, so I can look at them for testing etc?
This is kind of missing the point. You want to be able to write code against a single API and not care whether the file you need is already unpacked or in a zip. Cheers, -Barry
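For comparison, later versions of the standard library (Python 3.9 and newer) offer this kind of single API through importlib.resources, which did not exist in this form when the thread was written. A minimal sketch; with_real_path is a hypothetical wrapper around the real files() and as_file() calls.

```python
from importlib.resources import as_file, files


def with_real_path(package, resource, func):
    """Call func(path) with a real filesystem path for a package resource.

    The calling code is the same whether the package lives in a directory
    or inside a zip. For zipped packages as_file() extracts the resource to
    a temporary file and removes it when the with block exits.
    """
    with as_file(files(package).joinpath(resource)) as path:
        return func(path)
```

Note that the path is only guaranteed to exist while func is running, which is the price of not maintaining a persistent cache.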
On Sep 30, 2012, at 10:00 PM, Daniel Holth wrote:
resource_filename is very popular. I would have thought resource_stream would be more popular. Unless your package is zipped resource_filename is trivial to implement.
I've used declare_namespace() once or twice. I use resource_filename(), resource_string(), resource_stream(), and resource_listdir() quite a bit. Cheers, -Barry
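For a package that is installed unzipped, the "trivial" resource_filename mentioned in the quote amounts to joining the resource name onto the package's directory. A minimal sketch, assuming the package was imported from a directory so that its __file__ is a real path; simple_resource_filename is a hypothetical name, not a pkg_resources function.

```python
import importlib
import os


def simple_resource_filename(package, resource_name):
    """Return the on-disk path of a resource inside an unzipped package.

    Resource names use '/' separators regardless of platform. No check is
    made that the resource exists.
    """
    module = importlib.import_module(package)
    base = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.join(base, *resource_name.split("/"))
```

For instance, simple_resource_filename("email", "mime/text.py") points at a file inside the standard library's email package on a normal (unzipped) Python installation.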
On Mon, Oct 1, 2012 at 10:46 AM, Barry Warsaw <barry@python.org> wrote:
On Sep 30, 2012, at 10:00 PM, Daniel Holth wrote:
resource_filename is very popular. I would have thought resource_stream would be more popular. Unless your package is zipped resource_filename is trivial to implement.
I've used declare_namespace() once or twice.
I use resource_filename(), resource_string(), resource_stream(), and resource_listdir() quite a bit.
declare_namespace() is special because we could probably rewrite it at install time (in Python 3.3, by deleting __init__.py) or by calling pkgutil.
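Both replacements mentioned here can be tried out with throwaway directories. The sketch below is illustrative only; the helper names and the package names passed to them are hypothetical. With init_text=None the portions have no __init__.py (the Python 3.3 approach); with PKGUTIL_INIT each portion's __init__.py extends __path__ via pkgutil.

```python
import importlib
import os
import sys
import tempfile

# The one-line __init__.py commonly used for pkgutil-style namespace packages.
PKGUTIL_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_portion(root, namespace, module, init_text=None):
    """Create root/namespace/module.py, optionally with an __init__.py."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    if init_text is not None:
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write(init_text)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("NAME = %r\n" % module)
    return root


def import_from_portions(namespace, modules, init_text=None):
    """Build one directory per module, put them on sys.path, and import each."""
    roots = [
        make_portion(tempfile.mkdtemp(), namespace, m, init_text) for m in modules
    ]
    sys.path[:0] = roots
    importlib.invalidate_caches()
    return [importlib.import_module(namespace + "." + m).NAME for m in modules]
```

In both variants the modules are found even though they live in separate directories, which is the behaviour declare_namespace() was providing.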
participants (4)
- Barry Warsaw
- Daniel Holth
- PJ Eby
- Vinay Sajip