PEP 426: proposed metadata caching convention
(This probably belongs in a successor to PEP 376, but I'll leave it under the PEP 426 umbrella for now.)
One of the points raised regarding PEP 426's integrated metadata format is the potential for runtime issues with pkg_resources as it reads and processes the metadata during startup, particularly if it needs to process any environment markers. While I acknowledge the suggestions I have received that we should really be moving away from the current filesystem based distributed installation information to a real database that properly handles import hooks, I'm looking for something simpler that will make it easier for setuptools and distribute to consume the new metadata format (and thus hopefully make them more amenable to generating it as well).
Assuming we add an Entry-Points field as I have proposed in another message, I'd like to propose that installers generate three additional cache files as part of the installation process:
<dist-info-dir>/__cache__/version.txt
<dist-info-dir>/__cache__/requires-dist.txt
<dist-info-dir>/__cache__/entry-points.txt
version.txt would just be the version of the installed distribution (no need to parse the main metadata file just to read the version field).
requires-dist.txt would be similar to the pkg_resources requires.txt format, but use PEP 426 version specifiers. It would:
- only contain runtime requirements where the environment markers match the current system
- be split into sections based on the "extras" definition needed to get the environment marker to pass
entry-points.txt would be the same format as the pkg_resources entry_points.txt.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
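The requires-dist.txt idea above can be sketched roughly as follows. This is only an illustration, not part of the proposal: the function name `build_requires_cache` is hypothetical, and environment markers are simplified here to plain key/value equality checks instead of real PEP 426 marker expressions. The output layout follows the pkg_resources requires.txt convention of unconditional requirements first, then one `[extra]` section per extra.

```python
from collections import OrderedDict


def build_requires_cache(requirements, environment):
    """Render install-time requirements in a requires.txt-like layout.

    requirements: iterable of (spec, extra, marker) tuples, where extra is
    None for unconditional requirements and marker is None or a dict of
    environment values that must all match (a simplification, see above).
    environment: dict describing the system the installer is running on.
    """
    sections = OrderedDict()
    sections[None] = []
    for spec, extra, marker in requirements:
        # Drop requirements whose (simplified) marker does not match.
        if marker and any(environment.get(k) != v for k, v in marker.items()):
            continue
        sections.setdefault(extra, []).append(spec)

    lines = list(sections[None])
    for extra, specs in sections.items():
        if extra is None:
            continue
        if lines:
            lines.append("")  # blank line between sections
        lines.append("[%s]" % extra)
        lines.extend(specs)
    return "\n".join(lines) + "\n"


example = build_requires_cache(
    [
        ("requests (>=1.0)", None, None),
        ("pywin32", None, {"sys_platform": "win32"}),
        ("pytest", "test", None),
    ],
    {"sys_platform": "linux"},
)
```

Because the marker check happens once in the installer, a runtime consumer of the cached file would not need a marker evaluator at all, which is the startup-cost saving the proposal is after.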
On 25 February 2013 14:39, Nick Coghlan wrote:
[snip]
Why a __cache__ subdirectory? Is this purely an easier-to-process copy of what's in the METADATA file? If so, I'd prefer to simply take the information out of the METADATA file and have it in a single separate file in the first place. IIUC, that's what Daniel is suggesting as well. We don't really need everything to be in a single file, surely? Paul.
On Tue, Feb 26, 2013 at 12:45 AM, Paul Moore
We don't really need everything to be in a single file, surely?
Yes, I want the metadata to map cleanly to a single data structure so it can be more easily managed through things that *aren't* file systems (such as finally getting the installation database to support import hooks and also for potential metadata publication through TUF). However, decomposing it for efficient runtime access and backwards compatibility reasons makes sense. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 February 2013 14:57, Nick Coghlan
On Tue, Feb 26, 2013 at 12:45 AM, Paul Moore wrote:
We don't really need everything to be in a single file, surely?
Yes, I want the metadata to map cleanly to a single data structure so it can be more easily managed through things that *aren't* file systems (such as finally getting the installation database to support import hooks and also for potential metadata publication through TUF).
Fair point. OK, I can accept that the metadata stays in one file.
However, decomposing it for efficient runtime access and backwards compatibility reasons makes sense.
I'm not entirely sure what code will be responsible for that decomposition, though. In theory, it's obvious ("the installer") but real life is more complex. Consider the following toolchain (which is a real-life example of something I'm fiddling with at the moment):
1. setup.py -> use this to build a project and install it into a temporary location (this may be pure distutils, or it may be setuptools/distribute based)
2. distlib -> collect the data from the temporary location and put it into a wheel
3. pip -> unpack the wheel and install it into a virtualenv
In this case, (1) is an "installer" and so should write the decomposed files. But (2) doesn't want them and must then delete or otherwise skip them. Then (3) recreates them. That's both wasteful, and potentially complex/error-prone (distlib doesn't do the skipping of decomposed metadata files, that's in user code as things stand at the moment).
Maybe the simplest solution is to say that setup.py install is not, technically an installer - it's used as a component of a lot of builder-type toolchains (another example is bdist_wininst). So maybe setup.py install needs to grow an option to *not* create the decomposed files (at the same time as it gains the ability to write Metadata 2.0). But this may impact setuptools, not just the stdlib (I don't know if setuptools overrides the install command, but I suspect so).
Handling of metadata format conversion (egg-info to dist-info, ignoring or copying extra files, when to decompose and ignore cached metadata) is fast becoming the most complicated bit of the whole process (I say this from experience of writing installers and converters using distlib). I'm getting very sick of writing variations on convert_egg_info functions with a variety of different subtle issues. But I suppose that's just saying that we need a transition plan, which you know.
Paul.
On Mon, Feb 25, 2013 at 9:39 AM, Nick Coghlan wrote:
[snip]
I like the idea of making the installer a little smarter for backwards compat. reasons. Wouldn't the specific cached files generated be the sole domain of a post-install hook provided by pkg_resources?
The version is not parsed from METADATA when it can be read from the .dist-info directory's filename (true when it is not in development). When it is read, METADATA is only parsed until the line that starts with Version:
It would be a win to evaluate the environment markers at install time. For my web applications that evaluate many .dist-info directories runtime parsing is good enough for me, but startup time pressure is higher for console scripts.
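The two ways of finding the version described above might look something like this. It is a sketch only, not the pkg_resources implementation: both function names are made up for illustration, and it assumes the PEP 376 directory naming of `name-version.dist-info` with RFC 822 style headers in METADATA.

```python
import os


def version_from_dirname(dist_info_dir):
    """Return the version embedded in a name-version.dist-info path, or None."""
    base = os.path.basename(dist_info_dir.rstrip("/\\"))
    suffix = ".dist-info"
    if not base.endswith(suffix):
        return None
    stem = base[: -len(suffix)]
    _name, sep, version = stem.rpartition("-")
    return version if sep else None


def version_from_metadata(lines):
    """Scan header lines and stop as soon as the Version field is found."""
    for line in lines:
        if line.startswith("Version:"):
            return line[len("Version:"):].strip()
        if not line.strip():
            break  # a blank line ends the header block
    return None


dir_version = version_from_dirname("/site-packages/distname-1.0.dist-info")
file_version = version_from_metadata(
    ["Metadata-Version: 2.0\n", "Name: distname\n", "Version: 1.0\n", "Summary: demo\n"]
)
```

Since `version_from_metadata` accepts any iterable of lines, an open file object can be passed directly and only the leading lines are ever read.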
On Tue, Feb 26, 2013 at 12:46 AM, Daniel Holth
I like the idea of making the installer a little smarter for backwards compat. reasons. Wouldn't the specific cached files generated be the sole domain of a post-install hook provided by pkg_resources?
I'm not a fan of post-install hooks - that way lies setup.py. If people want to run arbitrary code at install time, they can publish a platform specific installer. *Maybe* we can go down that path in the Python 3.5 timeframe, but for now, no.
The version is not parsed from METADATA when it can be read from the .dist-info directory's filename (true when it is not in development). When it is read, METADATA is only parsed until the line that starts with Version:
I forgot the version was already in the directory name when I suggested version.txt, so ignore that part.
It would be a win to evaluate the environment markers at install time. For my web applications that evaluate many .dist-info directories runtime parsing is good enough for me, but startup time pressure is higher for console scripts.
Yeah, that's where I realised it was a useful idea for more than just backwards compatibility reasons, at least for requires-dist.txt - keep METADATA as the original cross-platform info, cache the installation specific version. entry-points.txt is pure backwards compatibility, though. The only reason I didn't suggest reusing the setuptools name for the file is because I want the __cache__ in the name to clearly identify the files the installer derives from METADATA rather than the ones defined in PEP 376 or installed as part of the distribution. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 February 2013 15:10, Nick Coghlan
entry-points.txt is pure backwards compatibility, though. The only reason I didn't suggest reusing the setuptools name for the file is because I want the __cache__ in the name to clearly identify the files the installer derives from METADATA rather than the ones defined in PEP 376 or installed as part of the distribution.
One thing I *would* like to suggest is that the cached versions of the data should be optional. My specific reason for this is that as things stand, many wheels are usable without installation, simply by putting them on sys.path. As wheels are a distribution format, they won't have the cached data, and I'd be unhappy if that fact broke the ability to use them as zips. Paul.
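A reader that treats the cache as optional could be sketched as below. This is an assumption-laden illustration: `runtime_requirements` and the `read_text` callable are hypothetical names, and the cache path is the one from the original proposal. `read_text` stands in for whatever access mechanism is in use (filesystem, zip file, import hook) and returns None when a file is missing.

```python
from email.parser import Parser


def runtime_requirements(read_text):
    """Prefer the installer-generated cache, fall back to METADATA.

    Returns a (source, entries) pair so callers know whether the entries
    were already filtered at install time ("cache") or still carry raw,
    unevaluated environment markers ("metadata").
    """
    cached = read_text("__cache__/requires-dist.txt")
    if cached is not None:
        return "cache", [ln for ln in cached.splitlines() if ln.strip()]

    # No cache, e.g. a wheel placed directly on sys.path: parse METADATA.
    headers = Parser().parsestr(read_text("METADATA"), headersonly=True)
    return "metadata", headers.get_all("Requires-Dist") or []


wheel_on_path = {
    "METADATA": "Metadata-Version: 2.0\nName: demo\nVersion: 1.0\nRequires-Dist: requests\n",
}
installed = dict(wheel_on_path)
installed["__cache__/requires-dist.txt"] = "requests\n"

from_wheel = runtime_requirements(wheel_on_path.get)
from_install = runtime_requirements(installed.get)
```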
Post install hooks are different than setup.py because they are installed first and then run for all packages, and are not requested by the installed dist. They are more like rewriting script #!python shebang.
May I humbly suggest deleting things from this PEP until it is acceptable and not the other way around?
On Feb 25, 2013 11:54 AM, "Paul Moore" wrote:
[snip]
Nick Coghlan
I'm not a fan of post-install hooks - that way lies setup.py. If people want to run arbitrary code at install time, they can publish a platform specific installer.
*Maybe* we can go down that path in the Python 3.5 timeframe, but for now, no.
I'm concerned that this might affect adoption: there are a lot of projects that have non-trivial custom code in setup.py - often doing mundane stuff like copying files around before the actual setup() call. Having hooks will enable easier migration for such projects (which include, for example, Twisted, Cython, NumPy). I don't believe it's realistic to expect them all to create platform-specific installers; they'll just carry on using setuptools/distribute.
If we want to move things forward in packaging, surely we have to make migration easier? IMO this was one of the things that distutils2/packaging also did not address sufficiently.
Just to clarify: when I say "hooks", what I mean is "setuptools-style entry points that the installer looks for, which are used to customise the installation process". I believe it is possible to provide limited extensibility using hooks without it leading to the complete ad-hocery that setup.py entails.
Regards, Vinay Sajip
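To make the "hooks as entry points" idea concrete, here is one way an installer might discover them. Everything specific in it is an assumption for illustration: the group name `installer.post_install` is invented, `load_hooks` is a hypothetical helper, and the text being parsed is in the INI-like layout used by the pkg_resources entry_points.txt (`[group]` sections with `name = module:attr` lines).

```python
import configparser
import importlib

HOOK_GROUP = "installer.post_install"  # hypothetical group name


def load_hooks(entry_points_text, group=HOOK_GROUP):
    """Return {hook name: callable} for one entry point group."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep hook names case-sensitive
    parser.read_string(entry_points_text)

    hooks = {}
    if not parser.has_section(group):
        return hooks
    for name, target in parser.items(group):
        module_name, _, attr = target.partition(":")
        obj = importlib.import_module(module_name.strip())
        if attr.strip():
            for part in attr.strip().split("."):
                obj = getattr(obj, part)
        hooks[name] = obj
    return hooks


# os.path.basename stands in for a real hook so the example is runnable.
demo_text = "[installer.post_install]\nstrip_dir = os.path:basename\n"
demo_hooks = load_hooks(demo_text)
```

The installer stays in control of when, and whether, the discovered callables run, which is the main difference from code executing at the bottom of a setup.py.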
On Wed, Feb 27, 2013 at 8:52 PM, Vinay Sajip
Just to clarify: when I say "hooks", what I mean is "setuptools-style entry points that the installer looks for, which are used to customise the installation process".
The command to create a wheel from a source archive is currently still "./setup.py bdist_wheel". This may be executed on an appropriate build system rather than the target system, but aside from that everything in setup.py should still execute normally. This is the major difference between the current attempt and distutils2: du2 made moving from setup.py to setup.cfg a requirement to generate the new metadata format. By contrast, I want at least distribute, as well as the Python 3.4 distutils, to be able to generate wheels (including the new metadata) from current setup.py files.
I believe it is possible to provide limited extensibility using hooks without it leading to the complete ad-hocery that setup.py entails.
For version 1.0, the only install-time modification that all wheel installers must implement is fanning files out to their target locations based on sysconfig directories and rewriting script shebang lines (they may also want to generate parallel Windows executables, but with the Windows launcher, that's less necessary).
If a project needs more than that, they cannot ship wheels at this time, and will need to continue shipping source distributions that can execute arbitrary code at install time. Alternatively (and preferably), such a project could split out a support library that is wheel compatible, and have a separate component that must be installed from source and is able to make arbitrary changes to the target system.
*Incremental* change, and explicitly leaving some use cases to source distribution and ./setup.py for the moment is the key to creating a distribution format that is as simple as we can make it while still supporting a wide variety of use cases. Will we eventually get pre-install and post-install hooks ala RPM and other platform specific systems? Quite possibly. But let's see how far we can get without them first - in particular, I want to focus people's initial efforts on putting the smarts into the wheel *creation* process rather than delaying decisions until install time.
The initial problem I believe we need to solve is the one of arcane build systems for key dependencies, and the simple fact that most Windows users aren't equipped to build software written in C in the first place. Eggs tried to tackle that problem years ago, but ignored things like the Filesystem Hierarchy Standard and the interests of OS distributions and system administrators, limiting its adoption to those developers that were happy with the idea of storing *everything* inside a single directory (the various legitimate concerns with the default behaviour of easy_install also didn't help). Wheel is designed to integrate more cleanly with platform specific conventions, hopefully overcoming some of those past objections to the egg format.
This preliminary approach also integrates well with centralised system management tools like Puppet, Chef and Salt - for those, the states and configurations of services and other components are handled through the management infrastructure, and the language specific package management tools are just a way to get the application code onto the target systems in a controlled fashion.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
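The two install-time steps named in this message, fanning files out by sysconfig directory and rewriting shebang lines, can be illustrated with a short sketch. The helper names `target_path` and `rewrite_shebang` are hypothetical, and the handling is deliberately simplified: it assumes archive entries under `<name>-<version>.data/<key>/` map onto the matching `sysconfig.get_paths()` key, that everything else goes to `purelib`, and that scripts carry a placeholder first line starting with `#!python`.

```python
import os
import sys
import sysconfig


def target_path(archive_name, data_dir, paths=None, root_key="purelib"):
    """Map a '/'-separated archive member name to its install location."""
    if paths is None:
        paths = sysconfig.get_paths()
    prefix = data_dir + "/"
    if archive_name.startswith(prefix):
        # e.g. distname-1.0.data/scripts/tool -> paths["scripts"]/tool
        key, _, rel = archive_name[len(prefix):].partition("/")
        return os.path.join(paths[key], *rel.split("/"))
    return os.path.join(paths[root_key], *archive_name.split("/"))


def rewrite_shebang(script, interpreter=None):
    """Replace a leading '#!python' placeholder line with a real interpreter.

    Simplification: anything else on the placeholder line is dropped.
    """
    if not script.startswith(b"#!python"):
        return script
    interpreter = interpreter or sys.executable
    _first, sep, rest = script.partition(b"\n")
    return b"#!" + os.fsencode(interpreter) + sep + rest


demo_paths = {"purelib": "/sp", "scripts": "/bin"}
script_dest = target_path("distname-1.0.data/scripts/tool", "distname-1.0.data", demo_paths)
module_dest = target_path("pkg/mod.py", "distname-1.0.data", demo_paths)
fixed = rewrite_shebang(b"#!python\nprint(1)\n", "/usr/bin/python3")
```

Neither step runs any code shipped by the project being installed, which is what keeps the installer simple.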
On Wed, Feb 27, 2013 at 6:45 AM, Nick Coghlan
On Wed, Feb 27, 2013 at 8:52 PM, Vinay Sajip wrote:
Just to clarify: when I say "hooks", what I mean is "setuptools-style entry points that the installer looks for, which are used to customise the installation process".
The command to create a wheel from a source archive is currently still "./setup.py bdist_wheel". This may be executed on an appropriate build system rather than the target system, but aside from that everything in setup.py should still execute normally. This is the major difference between the current attempt and distutils2: du2 made moving from setup.py to setup.cfg a requirement to generate the new metadata format. By contrast, I want at least distribute, as well as the Python 3.4 distutils, to be able to generate wheels (including the new metadata) from current setup.py files.
Vinay's distlib has taken the wheel spec at its word, runs an unmodified "install" command with all the various paths set to wheel-compatible distname-1.0.data/scripts etc., and converts the .egg-info directory to .dist-info the same as bdist_wheel's final step. All wheel does is it takes a basic assumption of distutils2 (avoid running setup.py), rearranges it slightly (avoid running setup.py at install time) and magically people seem to like it. I wanted lxml to compile faster and wound up with a distutils escape hatch. Now I think that avoiding running *distutils* at install time is much more important than avoiding setup.py.
I believe it is possible to provide limited extensibility using hooks without it leading to the complete ad-hocery that setup.py entails.
For version 1.0, the only install-time modification that all wheel installers must implement is fanning files out to their target locations based on sysconfig directories and rewriting script shebang lines (they may also want to generate parallel Windows executables, but with the Windows launcher, that's less necessary).
If a project needs more than that, they cannot ship wheels at this time, and will need to continue shipping source distributions that can execute arbitrary code at install time. Alternatively (and preferably), such a project could split out a support library that is wheel compatible, and have a separate component that must be installed from source and is able to make arbitrary changes to the target system.
*Incremental* change, and explicitly leaving some use cases to source distribution and ./setup.py for the moment is the key to creating a distribution format that is as simple as we can make it while still supporting a wide variety of use cases. Will we eventually get pre-install and post-install hooks ala RPM and other platform specific systems? Quite possibly. But let's see how far we can get without them first - in particular, I want to focus people's initial efforts on putting the smarts into the wheel *creation* process rather than delaying decisions until install time.
It's just the 1.0 release. There's no hurry to write the document entitled "PEP 376 is now the/a standard *interchange* format for distribution metadata; here's how you can experiment with caching runtime introspection." Other tasks such as "create the simplest possible useful packaging system for the stdlib [by only including the install feature]" and "create an ecosystem of interoperable third-party products to do everything else" are higher up on the Grand Python Packaging Plan or GP3 (tm) to-do list.
The initial problem I believe we need to solve is the one of arcane build systems for key dependencies, and the simple fact that most Windows users aren't equipped to build software written in C in the first place. Eggs tried to tackle that problem years ago, but ignored things like the Filesystem Hierarchy Standard and the interests of OS distributions and system administrators, limiting its adoption to those developers that were happy with the idea of storing *everything* inside a single directory (the various legitimate concerns with the default behaviour of easy_install also didn't help). Wheel is designed to integrate more cleanly with platform specific conventions, hopefully overcoming some of those past objections to the egg format.
It's designed to make binary packaging generally interesting, even if you don't have C extensions, or even if you do have a C compiler. This will hopefully be a benefit to our Windows community as well.
This preliminary approach also integrates well with centralised system management tools like Puppet, Chef and Salt - for those, the states and configurations of services and other components are handled through the management infrastructure, and the language specific package management tools are just a way to get the application code onto the target systems in a controlled fashion.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
Daniel Holth
Vinay's distlib has taken the wheel spec at its word, runs an unmodified "install" command with all the various paths set to wheel-compatible distname-1.0.data/scripts etc., and converts the .egg-info directory to .dist-info the same as bdist_wheel's final step.
Right, except there's no conversion of .egg-info to .dist-info in distlib itself. That's done by the separate wheeler.py demonstration script, which uses vanilla pip to install to a holding location, converts the .egg-info to .dist-info and then builds the wheel from that. At installation time, the wheel's .dist-info contents are moved to the installation site's site-packages, except for WHEEL, which is omitted, and RECORD which is recreated.
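Recreating RECORD at install time while leaving out WHEEL could look roughly like this. It is not distlib's code: `build_record` is a hypothetical name, files are passed as an in-memory mapping to keep the example self-contained, and the row layout (path, `sha256=` plus unpadded urlsafe base64 digest, size, with empty hash and size for RECORD itself) is the assumption being illustrated.

```python
import base64
import csv
import hashlib
import io
import posixpath


def build_record(files, record_path, omit=("WHEEL", "RECORD")):
    """Return RECORD text for {relative path: bytes}, skipping omitted names."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in sorted(files):
        if posixpath.basename(path) in omit:
            continue  # WHEEL is not installed; RECORD is rebuilt below
        data = files[path]
        digest = hashlib.sha256(data).digest()
        encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        writer.writerow((path, "sha256=" + encoded, len(data)))
    # RECORD cannot contain its own hash, so its row has empty fields.
    writer.writerow((record_path, "", ""))
    return out.getvalue()


record_text = build_record(
    {
        "pkg/__init__.py": b"",
        "pkg-1.0.dist-info/METADATA": b"Name: pkg\nVersion: 1.0\n",
        "pkg-1.0.dist-info/WHEEL": b"Wheel-Version: 1.0\n",
    },
    "pkg-1.0.dist-info/RECORD",
)
```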
All wheel does is it takes a basic assumption of distutils2 (avoid running setup.py), rearranges it slightly (avoid running setup.py at install time) and magically people seem to like it. I wanted lxml to compile faster and wound up with a distutils escape hatch. Now I think
A happy accident, then!
that avoiding running *distutils* at install time is much more important than avoiding setup.py.
It's just the 1.0 release. There's no hurry to write the document entitled "PEP 376 is now the/a standard *interchange* format for [snip] third-party products to do everything else" are higher up on the Grand Python Packaging Plan or GP3 (tm) to-do list.
I suppose you're right, but I want to make as much progress as I can while I still have the time I can spend on this, and while the grey cells haven't succumbed to packaging fatigue ... :-) Regards, Vinay Sajip
On Wed, Feb 27, 2013 at 10:08 AM, Vinay Sajip wrote:
[snip]
Luckily parts of your brain are red and black. I'm amazed at the effort you've put forth so far.
The idea isn't to limit the amount of progress but simply to have a good separation between a smaller number of things we need to agree on and probably put in the stdlib (for example dependency declarations and a basic binary format) and the things we don't have to or are very unlikely to agree on that will probably be outside the stdlib (for example a not-likely-forthcoming universal build system, and perhaps "the best" way to cache .dist-info assuming the feature is even beneficial at all).
Anyway, Nick has been describing a different thing (a "numpy or package specific post-install hook") than the proposal (a "some way to run code that is intended to cache .dist-info directories at install time without patching every installer").
On Feb 27, 2013, at 2:52 AM, Vinay Sajip wrote:
Nick Coghlan writes:
I'm not a fan of post-install hooks - that way lies setup.py. If people want to run arbitrary code at install time, they can publish a platform specific installer.
*Maybe* we can go down that path in the Python 3.5 timeframe, but for now, no.
I'm concerned that this might affect adoption: there are a lot of projects that have non-trivial custom code in setup.py - often doing mundane stuff like copying files around before the actual setup() call. Having hooks will enable easier migration for such projects (which include, for example, Twisted, Cython, NumPy). I don't believe it's realistic to expect them all to create platform-specific installers; they'll just carry on using setuptools/distribute.
Quite so.
Post-install hooks are a requirement for Twisted and for many projects which depend on Twisted. The hook is always the same on every platform, so it's not a platform-specific installer issue.
Frankly, a big appeal of some next-generation package distribution system is the introduction of a proper set of events we can hook into, instead of assuming that by some accident of timing we can work out when the software is being "installed" and call some random function from the bottom of setup.py with a bunch of state scooped out of distutils' internals. The current situation is a total mess.
-glyph
On Wednesday, February 27, 2013 at 1:47 PM, Glyph wrote:
[snip]
I'm generally +1 on hooks, the failure of setup.py isn't particularly that it's executable, it's that you can't access the metadata without executing it. In general hooks also allow people to easily disable them during install if they don't wish for that (of course packages have no reason to support that if they don't want to).
On Feb 27, 2013, at 10:49 AM, Donald Stufft
I'm generally +1 on hooks, the failure of setup.py isn't particularly that it's executable, it's that you can't access the metadata without executing it. In general hooks also allow people to easily disable them during install if they don't wish for that (of course packages have no reason to support that if they don't want to).
I pretty much agree. I'd be happy – enthusiastic, even – for Twisted to update to some static metadata expression system. -glyph
On Mon, Feb 25, 2013 at 9:39 AM, Nick Coghlan wrote:
[snip]
Since this isn't going to be backwards-compatible anyway, may I suggest that:
1. The caching algorithm be fixed and defined as part of the extension machinery
2. The caching consists of simply copying the data to a file, whose name is programmatically based on the extension/field name.
3. Environment markers are not processed - that's up to the tool consuming the cached data
This way, if e.g. entry points are defined as an extension, then the Builder making a wheel doesn't need to "understand" entry points, it just has to copy fields to a file. It allows other resource types (like i18n/l10n resources) to be defined in the metadata and cached for runtime use, without needing a metadata version upgrade or any tool rewrites. And not processing environment markers means that pure-Python wheels can still be used by just placing them on sys.path.
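The "just copy fields to a file" approach in the three points above is simple enough to sketch in a few lines. The naming rule used here (lower-cased field name plus `.txt`) and the function name `build_field_caches` are assumptions made up for the example; the suggestion itself only says the name is derived programmatically from the extension or field name.

```python
def build_field_caches(metadata, fields):
    """Return {cache file name: text} for the requested metadata fields.

    Values are copied verbatim, one per line. Environment markers are
    deliberately left unevaluated for the consuming tool to handle.
    """
    caches = {}
    for field in fields:
        values = metadata.get(field, [])
        if values:
            caches[field.lower() + ".txt"] = "\n".join(values) + "\n"
    return caches


demo_caches = build_field_caches(
    {
        "Requires-Dist": ["requests", "pywin32; sys.platform == 'win32'"],
        "Entry-Points": [],
    },
    ["Requires-Dist", "Entry-Points"],
)
```

Because nothing in the function knows what a field means, a new field or extension would only need to be added to the list of names, with no change to the copying logic.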
On Wed, Feb 27, 2013 at 4:48 PM, PJ Eby
Since this isn't going to be backwards-compatible anyway, may I suggest that:
1. The caching algorithm be fixed and defined as part of the extension machinery
2. The caching consists of simply copying the data to a file, whose name is programmatically based on the extension/field name.
3. Environment markers are not processed - that's up to the tool consuming the cached data
This way, if e.g. entry points are defined as an extension, then the Builder making a wheel doesn't need to "understand" entry points, it just has to copy fields to a file. It allows other resource types (like i18n/l10n resources) to be defined in the metadata and cached for runtime use, without needing a metadata version upgrade or any tool rewrites. And not processing environment markers means that pure-Python wheels can still be used by just placing them on sys.path.
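A minimal sketch of the "just copy the field to a file" idea follows. The naming rule (lower-case the field name and append ".txt"), the __cache__ directory, and both function names are assumptions for illustration only; the suggestion above does not fix any of them.

```python
import os
import re


def cache_filename(field_name):
    """Derive a cache file name from a field or extension name.

    The lower-casing rule and ".txt" suffix are assumptions of this sketch.
    """
    safe = re.sub(r"[^a-z0-9]+", "-", field_name.lower()).strip("-")
    return safe + ".txt"


def cache_field(dist_info_dir, field_name, values):
    """Copy raw field values to a cache file, one value per line.

    Environment markers are deliberately left untouched, so the
    consuming tool decides what applies at runtime.
    """
    cache_dir = os.path.join(dist_info_dir, "__cache__")
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_filename(field_name))
    with open(path, "w") as f:
        for value in values:
            f.write(value + "\n")
    return path
```

Because the builder never interprets the values, the same code would serve entry points, i18n resources, or any later extension without changes.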
My aim is to provide a hook mechanism that specifically does not say anything about the way the cache is stored or even whether the hook produces a cache at all. It will just run when pip is done.
On Thu, Feb 28, 2013 at 12:54 AM, Nick Coghlan
On Thu, Feb 28, 2013 at 7:59 AM, Daniel Holth wrote:
My aim is to provide a hook mechanism that specifically does not say anything about the way the cache is stored or even whether the hook produces a cache at all. It will just run when pip is done.
How does the following idea sound?
New metadata field: "Post-Install"
Format: a *single* callable reference in entry-points format (i.e. "module.name:callable.name")
Call signature:
def post_install_hook(metadata, extras, previous_version=None): ...
"extras" would be a tuple indicating which extras were installed.
For an upgrade, "previous_version" would be set to the version that was previously installed. For a clean installation, it would either be None or omitted entirely.
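As a rough illustration of how an installer might honour such a field, the sketch below resolves a "module.name:callable.name" reference and calls it with the proposed arguments. resolve_reference and run_post_install are hypothetical helper names, not part of any existing installer, and the example hook body is invented purely to show the two call shapes.

```python
import importlib


def resolve_reference(reference):
    """Resolve "module.name:callable.name" to the object it names."""
    module_name, _, attr_path = reference.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


def run_post_install(reference, metadata, extras, previous_version=None):
    """Call the hook the way a hypothetical installer might."""
    hook = resolve_reference(reference)
    if previous_version is None:
        # Clean installation: previous_version is omitted entirely.
        return hook(metadata, tuple(extras))
    return hook(metadata, tuple(extras), previous_version=previous_version)


def post_install_hook(metadata, extras, previous_version=None):
    """Example hook body: report whether this is an upgrade."""
    name, version = metadata["name"], metadata["version"]
    if previous_version is None:
        return "installed %s %s" % (name, version)
    return "upgraded %s %s -> %s" % (name, previous_version, version)
```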
The "metadata" argument would be the PEP 426 metadata, reformatted as JSON-compatible structured metadata. I had planned to postpone defining the algorithm for that conversion until after PEP 426 acceptance, but if we're going to add a post-install hook mechanism to PEP 426, I think it makes more sense to define it up front:
1. The top level is a mapping, with lowercase versions of all PEP 426 fields as keys.
2. All multiple-use fields other than "requires-python" are pluralised (that one is only multiple use so you can depend on a different version of Python given different environment markers - for example, supporting Python 2.6 everywhere, but requiring Python 2.7 on Windows. Aside from those cases, you can collapse an arbitrarily complex version specifier down to a single line)
3. Every mandatory field is present, with a string value
4. If present, the "keywords" field references a list of keywords (created via str.split)
5. If present, the description is always stored under the "description" key, even if provided in the PEP 426 metadata payload
6. If any other optional field is present, it references a string value
7. If present, the "project-urls" key references a mapping of labels to URLs.
8. If present, the "extensions" key references a mapping of extension names to the extension's embedded JSON metadata. (Note: this is the key reason for my planned change to the extension format from arbitrary subfields to allowing only a single "json" subfield - it greatly simplifies this aspect of the translation to structured metadata, *and* makes it more flexible and powerful at the same time)
9. For any multi-use field that is present and supports environment markers, it is a reference to a mapping where each key is a whitespace-normalized (i.e. every sequence of whitespace converted to a single space) environment marker string that references a list of string values. The unqualified fields are referenced by the string "always". This breakdown allows each unique environment marker to be evaluated only once to determine whether or not it is applicable, regardless of how many times it was originally used.
10. If any other multi-use field is present, it references a list of string values.
For example:
    Metadata-Version: 2.0
    Name: BeagleVote
    Version: 1.0a2
    Summary: A module for collecting votes from beagles.
    Keywords: dog puppy voting election
    Project-URL: Bug, Issue Tracker, http://bitbucket.org/tarek/distribute/issues/
    Requires-Dist: pkginfo
    Requires-Dist: PasteDeploy
    Requires-Dist: zope.interface (>3.5.0)
    Extension: Chili
    Chili/json: { "Type": "Poblano", "Heat": "Mild" }
Apparently, these beagles like their chili. (This is not a helpful description)
Would become:
    {
      "metadata-version": "2.0",
      "name": "BeagleVote",
      "version": "1.0a2",
      "summary": "A module for collecting votes from beagles.",
      "description": "Apparently, these beagles like their chili. (This is not a helpful description)",
      "keywords": ["dog", "puppy", "voting", "election"],
      "project-urls": {
        "Bug, Issue Tracker": "http://bitbucket.org/tarek/distribute/issues/"
      },
      "requires-dists": {
        "always": ["pkginfo", "PasteDeploy", "zope.interface (>3.5.0)"]
      },
      "extensions": {
        "Chili": { "Type": "Poblano", "Heat": "Mild" }
      }
    }
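The conversion could be sketched roughly as below. This is only an illustration under several assumptions: the input is taken to be already-parsed (name, value) pairs plus a separate description string, MARKER_FIELDS and LIST_FIELDS are hand-picked subsets rather than the full PEP 426 field list, pluralisation is a plain "s" suffix, and the special handling of "requires-python" and the mandatory-field check are not covered. to_structured is a hypothetical name.

```python
import json

# Illustrative subsets only; the real field lists would come from the PEP.
MARKER_FIELDS = {"requires-dist", "setup-requires-dist", "requires-external"}
LIST_FIELDS = {"classifier", "provides-extra"}


def to_structured(fields, description=None):
    """Convert (name, value) pairs into a JSON-compatible mapping."""
    result = {}
    for name, value in fields:
        key = name.lower()
        if "/" in key:
            # "Chili/json" subfield: the extension's embedded JSON.
            ext_name = name.split("/", 1)[0]
            result.setdefault("extensions", {})[ext_name] = json.loads(value)
        elif key == "extension":
            result.setdefault("extensions", {}).setdefault(value, {})
        elif key == "keywords":
            result["keywords"] = value.split()
        elif key == "project-url":
            # The label may itself contain commas, so split from the right.
            label, url = value.rsplit(",", 1)
            result.setdefault("project-urls", {})[label.strip()] = url.strip()
        elif key in MARKER_FIELDS:
            entry, _, marker = value.partition(";")
            # Whitespace-normalise the marker; unqualified means "always".
            marker = " ".join(marker.split()) or "always"
            group = result.setdefault(key + "s", {})
            group.setdefault(marker, []).append(entry.strip())
        elif key in LIST_FIELDS:
            result.setdefault(key + "s", []).append(value)
        else:
            result[key] = value
    if description is not None:
        result["description"] = description
    return result
```

Since the result contains only dicts, lists and strings, json.dumps(to_structured(...)) works without a custom encoder, which is the property the hook interface relies on.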
An apparently simpler alternative would be to rely on PEP 376 to retrieve the full metadata and only provide the distribution name and version to the hook:
def post_install_hook(distname, current_version, previous_version=None): ...
The key disadvantage of that seemingly simpler approach is it *only* works for post install and pre uninstall hooks, *and* requires that the post-install hook have the tools needed to read the PEP 376 metadata. If we later want to add pre-install, build or archiving hooks, they would need the structured metadata format anyway, as relying on PEP 376 isn't an option for software that hasn't been installed yet. This "simpler" alternative also won't work for eventually decoupling the installation database from a particular filesystem layout (e.g. adding metadata support to import hooks or tunnelling the metadata through TUF).
A third alternative would be to defer the task of defining the build hook signatures and the metadata conversion to a separate metadata extension (e.g. as is going to happen for entry points). I don't think that's appropriate - the metabuild system will be the way that the distribution ecosystem evolves in the future, so it makes more sense to me to use the core metadata standard to define it. If a particular installer doesn't understand a given extension, that's not supposed to matter, whereas ignoring the post-install hook would be a *big* problem. I did consider proposing the concept of "required extensions" instead, but that really runs counter to the idea of allowing end users to use whichever standards compliant installer they prefer.
However, extensions *would* be a perfect way for installers like pip to experiment with additional build hooks (e.g. bypassing setup.py for wheel creation), based on the general style of interface I am proposing for the post-install hook.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
We will probably wind up with some JSON very much like that. I like just exposing it as an ordered multidict with the same key names as mentioned in the PEP.
IMO the environment marker for "always" is just "" (empty string).
My hook would be a literal Entry-Point. You would install a package "twisted.plugins" that would register its interest in installation changes by declaring the entry point "[packaging.hooks] post_install=twisted.plugins:hook". Afterwards, every time you install or uninstall another package, twisted.plugins.hook() would be called. It would iterate over all installed distributions using some API like pkg_resources.working_set or distlib's database and do whatever it needed to do. It could be called once per pip invocation instead of once per individual package.
The hook is not guaranteed to run. If you do not run the hook, you should expect Twisted's plugin discovery process to take longer just like it does today. In fact the packages available on sys.path are not guaranteed to "have been installed" at all.
For comparison, in the wheel patch we call pkg_resources.find_distributions(location) against the per-dist temporary location pip uses for builds. The call yields the one dist we are considering as a Distribution() object and then it's easy to get the requirements. https://github.com/pypa/pip/blob/wheel/pip/req.py#L1078
It could turn into a very long discussion but I think import hooks have to grow a public listdir() someday... http://hg.python.org/cpython/file/2.7/Lib/pkgutil.py#l331 shows that the current method is to use the less-than-ideal API of zipimport._zip_directory_cache.
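To picture the entry-point style of hook described in this message, here is a small sketch of how an installer could discover "[packaging.hooks]" declarations and gather each distinct hook once per invocation. It parses the entry_points.txt text with the standard configparser module rather than pkg_resources, and find_hooks and collect_hooks are hypothetical names invented for the illustration.

```python
import configparser

ENTRY_POINTS_TEXT = """\
[packaging.hooks]
post_install = twisted.plugins:hook
"""


def find_hooks(entry_points_text, group="packaging.hooks"):
    """Return {name: reference} for one distribution's entry points."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(entry_points_text)
    if not parser.has_section(group):
        return {}
    return dict(parser.items(group))


def collect_hooks(texts, name="post_install"):
    """Gather each distinct hook reference once across all distributions."""
    seen = []
    for text in texts:
        reference = find_hooks(text).get(name)
        if reference and reference not in seen:
            seen.append(reference)
    return seen
```

An installer following this approach would call each collected reference once after finishing its work, and nothing breaks if it never does, which matches the "not guaranteed to run" caveat.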
On Fri, Mar 1, 2013 at 12:00 AM, Daniel Holth
We will probably wind up with some JSON very much like that. I like just exposing it as an ordered multidict with the same key names as mentioned in the PEP.
A multidict is not really JSON-compatible - making sure there's an unambiguous mapping to an ordinary dictionary is highly desirable. Also, it's handy to pre-split and group the entries conditioned on the environment markers.
IMO the environment marker for "always" is just "" (empty string).
I initially had that, but it looked weird in the case where there weren't any conditional entries, and it also looks weird when accessing the data structure. By contrast, "always" is a self-describing key.
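A short sketch of the consuming side may help show why the pre-grouped form and the "always" key are convenient. evaluate_marker here is a caller-supplied callable standing in for whatever marker evaluator a tool uses; it and applicable_entries are assumptions for illustration, not an existing API.

```python
def applicable_entries(grouped, evaluate_marker):
    """Flatten a {marker: [values]} mapping to the values that apply here.

    evaluate_marker is a caller-supplied callable (hypothetical here) that
    returns True when a marker string matches the current environment.
    """
    # Unconditional entries need no evaluation at all.
    selected = list(grouped.get("always", []))
    for marker, values in grouped.items():
        # Each unique marker is evaluated exactly once, however many
        # entries were originally qualified with it.
        if marker != "always" and evaluate_marker(marker):
            selected.extend(values)
    return selected
```

With the structured metadata from the example, applicable_entries(metadata["requires-dists"], evaluate_marker) never calls the evaluator, because only the "always" group is present.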
My hook would be a literal Entry-Point. You would install a package "twisted.plugins" that would register its interest in installation changes by declaring the entry point "[packaging.hooks] post_install=twisted.plugins:hook". Afterwards, every time you install or uninstall another package, twisted.plugins.hook() would be called. It would iterate over all installed distributions using some API like pkg_resources.working_set or distlib's database and do whatever it needed to do. It could be called once per pip invocation instead of once per individual package.
The hook is not guaranteed to run. If you do not run the hook, you should expect Twisted's plugin discovery process to take longer just like it does today. In fact the packages available on sys.path are not guaranteed to "have been installed" at all.
This is *not* the same kind of hook at all. The proposed hook is only run when *Twisted* is installed to replace some current legitimate customisation of "./setup.py install" behaviour, not when an arbitrary package is installed to let Twisted know about it. Your suggestion would indeed be more appropriately part of an installer-specific entry point (but one made much easier by the standard including an algorithm for conversion to structured metadata).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Feb 28, 2013 at 10:04 AM, Nick Coghlan
On Fri, Mar 1, 2013 at 12:00 AM, Daniel Holth wrote:
We will probably wind up with some JSON very much like that. I like just exposing it as an ordered multidict with the same key names as mentioned in the PEP.
A multidict is not really JSON-compatible - making sure there's an unambiguous mapping to an ordinary dictionary is highly desirable. Also, it's handy to pre-split and group the entries conditioned on the environment markers.
Sure, nothing wrong with it. Just don't bother pluralizing the names. "Goose: gander" becomes "geese": {}? No thanks.
IMO the environment marker for "always" is just "" (empty string).
I initially had that, but it looked weird in the case where there weren't any conditional entries, and it also looks weird when accessing the data structure. By contrast, "always" is a self-describing key.
Or True, or an environment-marker tautology...
My hook would be a literal Entry-Point. You would install a package "twisted.plugins" that would register its interest in installation changes by declaring the entry point "[packaging.hooks] post_install=twisted.plugins:hook". Afterwards, every time you install or uninstall another package, twisted.plugins.hook() would be called. It would iterate over all installed distributions using some API like pkg_resources.working_set or distlib's database and do whatever it needed to do. It could be called once per pip invocation instead of once per individual package.
The hook is not guaranteed to run. If you do not run the hook, you should expect Twisted's plugin discovery process to take longer just like it does today. In fact the packages available on sys.path are not guaranteed to "have been installed" at all.
This is *not* the same kind of hook at all.
That is why this conversation has been so confusing :-)
The proposed hook is only run when *Twisted* is installed to replace some current legitimate customisation of "./setup.py install" behaviour, not when an arbitrary package is installed to let Twisted know about it. Your suggestion would indeed be more appropriately part of an installer-specific entry point (but one made much easier by the standard including an algorithm for conversion to structured metadata).
participants (7)
- Daniel Holth
- Donald Stufft
- Glyph
- Nick Coghlan
- Paul Moore
- PJ Eby
- Vinay Sajip