PEP 376 - Open questions
As promised, here are some open questions on PEP 376.

- Will the public API names be changed from *egginfo* to *metadata*?
- What precisely are the use cases for absolute path names? Concrete examples are needed. With the current spec, some things can go wrong (e.g., see below), so we need real use cases to know how to address this.
- How will bdist_wininst/bdist_msi/bdist_rpm be updated?
- How will the RECORD file be managed? (Particularly for the case of bdist_xxx) [1]
- Can distutils be made to install files in places the current RECORD file spec can't handle? (I think the answer is "yes"). What happens then?
- Should distribution names be case insensitive on case insensitive filesystems? For comparison, module/package names are always case sensitive even on case insensitive systems.
- What will happen with the md5 hash? Are more types of hash going to be supported? What's the default? (Actually, the PEP doesn't need to care about the default, as the PEP says nothing about how RECORD files are written).

[1] Note - the idea of using $EXEC_PREFIX / $PREFIX implies that the RECORD file is intended to be relocatable. Which is worrying, and also implies that an individual Distribution class must be able to handle filesystem files as well as whatever else it handles (consider mylib.zip on sys.path, containing a distribution which installed some files in $PREFIX). If this isn't possible, it should be clearly stated that it isn't possible. If it is, the ramifications are complex...

I'm still unsure how the local vs relative, slash-separated filename formats should be handled. I don't actually think there's any real benefit in having 2 formats. I propose:

- get_installed_files(), uses(), get_file_users() - always use local format absolute pathnames (for zipfiles and the like, these may not be "real" filenames, but they will be in "real filename" format, so other code will be able to manipulate them as filenames).
- get_egginfo_files, get_egginfo_file - always use slash-separated forms, relative to the egginfo directory (so the name of the RECORD file is just 'RECORD')

But there could be uses I haven't thought about, so this still counts as an open question at the moment (i.e., I'm reluctant to implement things this way until I've had some feedback).

Paul.
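To make the two filename formats in the proposal concrete, here is a minimal sketch of converting between a slash-separated relative name and a local-format absolute pathname. The helper names record_to_local and local_to_record are hypothetical (they are not part of PEP 376 or distutils), and the pathmod parameter is only there so that both conventions can be tried on any platform.

```python
import ntpath
import posixpath


def record_to_local(base_dir, record_path, pathmod=posixpath):
    """Turn a slash-separated relative name into a local-format absolute path.

    Hypothetical helper: `pathmod` selects the target path convention
    (posixpath or ntpath) so the sketch is testable on any platform.
    """
    parts = record_path.split("/")
    return pathmod.normpath(pathmod.join(base_dir, *parts))


def local_to_record(base_dir, local_path, pathmod=posixpath):
    """Inverse: express a local path relative to base_dir, using '/'."""
    rel = pathmod.relpath(local_path, base_dir)
    return rel.replace(pathmod.sep, "/")


if __name__ == "__main__":
    print(record_to_local("/usr/lib/python2.6/site-packages", "foo/bar.py"))
    print(record_to_local(r"C:\Python26\Lib\site-packages", "foo/bar.py", ntpath))
```

In real code the path module would simply be os.path; passing it explicitly is a choice made for this illustration only.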
(I cancelled sending this the first time, so apologies if a half-written version turns up) Paul Moore wrote:
As promised, here are some open questions on PEP 376.
I'd add one more question to the list: is allowing backslash separated names in the RECORD file actually a good idea, or would it be better to always use forward slashes?

My reason for this question is what happens if (for example) a bdist_* installer is generated on Linux and then used on Windows or vice-versa? If the expected RECORD format is different on the two platforms, won't it get things wrong?

For the other questions, I don't have anything much to add to PJE's comments, except that the "all relative" paths idea won't work due to the Windows drive letter issue (i.e. if an installer puts files in C:\Program Files, there is no guarantee that a relative path between site-packages and Program Files even exists if Python is installed on a different drive).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
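The drive letter issue can be shown with the standard library's Windows path module, which refuses to compute a relative path across drives. This is only an illustration: the two example paths and the helper name relative_or_none are made up for the sketch.

```python
import ntpath  # Windows path rules, importable on every platform


def relative_or_none(path, start):
    """Return `path` relative to `start`, or None if no relative path exists.

    ntpath.relpath raises ValueError when the two paths are on different
    drives, which is exactly the case described above.
    """
    try:
        return ntpath.relpath(path, start)
    except ValueError:
        return None


if __name__ == "__main__":
    # Illustrative locations only.
    site_packages = r"D:\Python26\Lib\site-packages"
    print(relative_or_none(r"D:\Python26\Scripts\tool.py", site_packages))
    print(relative_or_none(r"C:\Program Files\MyApp\data.cfg", site_packages))
```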
2009/7/6 Nick Coghlan
I'd add one more question to the list: is allowing backslash separated names in the RECORD file actually a good idea, or would it be better to always use forward slashes?
They do always use forward slashes.
For the other questions, I don't have anything much to add to PJE's comments, except that the "all relative" paths idea won't work due to the Windows drive letter issue (i.e. if an installer puts files in C:\Program Files, there is no guarantee that a relative path between site-packages and Program Files even exists if Python is installed on a different drive).
The big question, though, is can an installer actually *do* that in practical terms?

- There are *no* guaranteed absolute locations on Windows, so any such oddly-located file would require user interaction to work. Certainly bdist_wininst and bdist_msi don't do that.
- My experiments indicate that bdist_{wininst,msi} are broken with respect to absolute paths anyway: they do a --root install to a temporary directory (and the absolute paths don't end up in there) and then package up that temporary directory.

I still want to see a real life example that demonstrates that there is a genuine issue here. We're spending a lot of energy and complexity trying to design a solution to a problem that actually doesn't appear to exist in practice...

(To be honest, I'd be fairly confident in saying that absolute paths can be ignored on Windows, subject to some corner cases that I haven't thought through yet. My worry is that I don't know what Unix and Mac users might do, so I can't just wish away the issue because it can't arise on Windows. Can a Unix/Mac user offer a real-world example on their own system?)

Paul.
2009/7/7 Paul Moore
2009/7/6 Nick Coghlan
: I'd add one more question to the list: is allowing backslash separated names in the RECORD file actually a good idea, or would it be better to always use forward slashes?
They do always use forward slashes.
For the other questions, I don't have anything much to add to PJE's comments, except that the "all relative" paths idea won't work due to the Windows drive letter issue (i.e. if an installer puts files in C:\Program Files, there is no guarantee that a relative path between site-packages and Program Files even exists if Python is installed on a different drive).
The big question, though, is can an installer actually *do* that in practical terms?
- There are *no* guaranteed absolute locations on Windows, so any such oddly-located file would require user interaction to work. Certainly bdist_wininst and bdist_msi don't do that. - My experiments indicate that bdist_{wininst,msi} are broken with respect to absolute paths anyway: they do a --root install to a temporary directory (and the absolute paths don't end up in there) and then package up that temporary directory.
Yes, that's unfortunately the case for all Windows-based installations, whether it's a bdist command calling the install command to create a binary package, or a plain installation: c:\something or d:\something will be installed in sys.prefix\something.

I will add an issue for distutils for this, probably ending up raising an exception when we hit this case, because I don't see how these paths can work. Unless we define a "drive that contains the python installation" maybe, or the "Program Files" directory.

Would that make sense from a win32 point of view?
I still want to see a real life example that demonstrates that there is a genuine issue here. We're spending a lot of energy and complexity trying to design a solution to a problem that actually doesn't appear to exist in practice...
(To be honest, I'd be fairly confident in saying that absolute paths can be ignored on Windows, subject to some corner cases that I haven't thought through yet. My worry is that I don't know what Unix and Mac users might do, so I can't just wish away the issue because it can't arise on Windows. Can a Unix/Mac user offer a real-world example on their own system?)
I know some people are writing to /etc to add their configuration file on the system.

So a real-world example under Linux would be:

    setup(..., data_files=[('/etc', ['myconf.cfg'])], ...)

That is basically how the examples are shown at: http://docs.python.org/distutils/setupscript.html#installing-additional-file...

But this is already OS-specific, and exists because distutils doesn't have a way (yet) to express system locations independently from their physical location, like what the RPM system does with %VARIABLES.

So another way to handle this, like I have added with $PREFIX and $EXEC_PREFIX, might be to nominate a list of variables that every Python environment has (querying modules like sys) and let the developers use them as root locations for some files:

- sys.prefix
- sys.exec_prefix
- some elements returned by distutils.sysconfig.get_config_vars() (distutils.sysconfig queries the python Makefile amongst others)
- ...

Semi-related: distutils.sysconfig should be removed from distutils, and be a standalone module in the stdlib for example.

Regards
Tarek
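As a rough sketch of the variable idea above, a RECORD-style path containing $PREFIX or $EXEC_PREFIX could be expanded against the running interpreter as follows. The function name expand_record_path is hypothetical, and the use of string.Template is an assumption made for this illustration, not something the PEP specifies.

```python
import string
import sys


def expand_record_path(path, variables=None):
    """Expand $PREFIX / $EXEC_PREFIX style variables in a path.

    Hypothetical helper: by default the values come from the running
    interpreter. Unknown variables are left untouched (safe_substitute).
    """
    if variables is None:
        variables = {"PREFIX": sys.prefix, "EXEC_PREFIX": sys.exec_prefix}
    return string.Template(path).safe_substitute(variables)


if __name__ == "__main__":
    print(expand_record_path("$PREFIX/share/myapp/README"))
    print(expand_record_path("$EXEC_PREFIX/bin/mytool"))
```

An absolute entry such as /etc/myconf.cfg contains no variable and passes through unchanged, which is where the relocation question raised earlier in the thread comes back in.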
2009/7/7 Tarek Ziadé:
> Unless we define a "drive that contains the python installation" maybe, or the "Program Files" directory
>
> would that make sense from a win32 point of view ?

I can't imagine that it would be useful in practice.

> I know some people are writing to /etc to add their configuration file on the system,
>
> So a real-world example under linux would be:
>
> setup(..., data_files=[('/etc', ['myconf.cfg'])], ...)
>
> That is basically how the examples are shown at:
>
> http://docs.python.org/distutils/setupscript.html#installing-additional-files

Thanks. Yes, that makes for a good example.

> But this is already os-specific, and exists because distutils doesn't have a way (yet) to express systems locations independently from their physical location, like what the RPM system does with %VARIABLES.
>
> So another way to handle this maybe, like I have added with $PREFIX and $EXEC_PREFIX would be to nominate a list of variables that every python environment has (querying modules like sys) and let the developers use them as root locations for some files.
>
> - sys.prefix
> - sys.exec_prefix
> - some elements returned by distutils.sysconfig.get_config_vars() (distutils.sysconfig queries the python Makefile amongst others)
> - ...

This adds extra complexity to the RECORD format, for little practical benefit that I can see.

From the POV of core distutils:

- Windows users use bdist_wininst and/or bdist_msi, and absolute locations are a lost cause.
- Presumably, some people use bdist_rpm (I don't know if there are other ways of creating RPMs).
- Everyone else uses setup.py install or a 3rd party tool.
- PEP 302 style loaders aren't relevant as the core only uses the filesystem (not even zip files).
- Only 3rd party tools will consume this data.

So, we need input from developers of 3rd party tools here. Phillip has stated the case for setuptools: from his POV, having everything relative to the "install location" which is stored elsewhere (in the installer file) is fine. I'd like to know whether he needs "upwards-pointing" relative paths like ../../../../xx.py, but that's a small detail.

So - do any other potential users of the PEP 376 metadata want to speak up? At the moment, it feels like we're designing things more or less in a vacuum.

Paul.
On Tue, Jul 7, 2009 at 10:33 AM, Paul Moore
2009/7/7 Tarek Ziadé
: Unless we define a "drive that contains the python installation" maybe, or the "Program Files" directory
would that make sense from a win32 point of view ?
I can't imagine that it would be useful in practice.
I know some people are writing to /etc to add their configuration file on the system,
So a real-world example under linux would be:
setup(..., data_files=[('/etc', ['myconf.cfg'])], ...)
That is basically how the examples are shown at:
http://docs.python.org/distutils/setupscript.html#installing-additional-file...
Thanks. Yes, that makes for a good example.
But this is already os-specific, and exists because distutils doesn't have a way (yet) to express systems locations independently from their physical location, like what the RPM system does with %VARIABLES.
So another way to handle this maybe, like I have added with $PREFIX and $EXEC_PREFIX would be to nominate a list of variables that every python environment has (querying modules like sys) and let the developers use them as root locations for some files.
- sys.prefix - sys.exec_prefix - some elements returned by distutils.sysconfig.get_config_vars() (distutils.sysconfig queries the python Makefile amongst others) - ...
This adds extra complexity to the RECORD format, for little practical benefit that I can see.
From the POV of core distutils: - Windows users use bdist_wininst and/or bdist_msi, and absolute locations are a lost cause. - Presumably, some people use bdist_rpm (I don't know if there are other ways of creating RPMs).
There are other ways to generate RPMs. There are also Debian packaging scripts.
- Everyone else uses setup.py install or a 3rd party tool. - PEP 302 style loaders aren't relevant as the core only uses the filesystem (not even zip files). - Only 3rd party tools will consume this data.
So, we need input from developers of 3rd party tools here. Phillip has stated the case for setuptools, from his POV having everything relative to the "install location" which is stored elsewhere (in the installer file) is fine. I'd like to know whether he needs "upwards-pointing" relative paths like ../../../../xx.py, but that's a small detail.
So - do any other potential users of the PEP 376 metadata want to speak up?
At the moment, it feels like we're designing things more or less in a vacuum.
I am CC'ing the people that worked with us on the versioning matters; they can speak from a Fedora and Ubuntu POV. I am also adding Jim for zc.buildout, as he can provide valuable input. (I am sorry if some people get the mail twice)
Paul Moore wrote:
I know some people are writing to /etc to add their configuration file on the system,
So a real-world example under linux would be:
setup(..., data_files=[('/etc', ['myconf.cfg'])], ...)
That is basically how the examples are shown at:
http://docs.python.org/distutils/setupscript.html#installing-additional-file...
Thanks. Yes, that makes for a good example.
But this is already os-specific, and exists because distutils doesn't have a way (yet) to express systems locations independently from their physical location, like what the RPM system does with %VARIABLES.
So another way to handle this maybe, like I have added with $PREFIX and $EXEC_PREFIX would be to nominate a list of variables that every python environment has (querying modules like sys) and let the developers use them as root locations for some files.
I think you have to differentiate between packages and applications.

Packages will usually have their own way of getting configured, either by passing parameters via some API or pointing the package to a configuration file. They may come with some example config files, but should normally not interfere with the system configuration. I.e. putting a Python package configuration into /etc does not really sound like a good idea.

Applications tend to ship everything needed to run the application together with the installer and typically use a system-dependent installer rather than a distutils based approach. These then place config files in the usual default dirs of the system. This is out-of-scope for PEP 376.

Then you have tools like zc.buildout which basically build an application in some directory at "install" time. Since only these tools know what they are doing, the whole "uninstall" mechanism also lies in their hands. Again, PEP 376 would only apply to the dynamic package installation part, but not to the complete application build.

Another aspect to consider is that config files should normally not be uninstalled automatically. The user should either be asked whether she wants to keep the files or the uninstaller should leave them untouched and issue a warning that certain files were not uninstalled.

Summarizing, I think it's better not to record config files and other user-edited files in the RECORD file.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jul 07 2009)
2009/7/7 M.-A. Lemburg
I think you have to differentiate between packages and applications.
Agreed. I believe that only packages should be considered here. Applications are the focus of tools like py2exe on Windows, and (AIUI) things like workingenv. These tools should (will) have their own approaches.

The only cases I know where there is reason for a package to store paths outside the package directory are:

- executable scripts, which go in sys.prefix/Scripts on Windows, and ??? on Unix/Mac OS
- supporting files (MoinMoin puts its HTML documents etc in sys.prefix/share on Windows, cx_Oracle puts its documentation in sys.prefix/cx_Oracle-doc)

Executable files could probably be superseded by using "python -m", but compatibility and users' preference for having a "real script" probably means they aren't going away in the near future.

Support files are getting put in the package directory more often these days, but stuff that needs to be found by non-Python tools is arguably still better in a more discoverable location.

For Windows, having a few distinguished locations under the installation root (scripts, share, doc) would probably do. For Unix and Mac OS, I have no opinion (but I suspect that absolute paths like /usr/local/bin might be the norm there).

[...]
Summarizing, I think it's better not to record config files and other user-edited files in the RECORD file.
The RECORD file should contain precisely those files that are created as part of the install. That's ultimately the point of the file (for ownership queries and uninstallation). Hmm, on the other hand, if foo.py is in the RECORD file, the uninstaller should uninstall foo.pyc and foo.pyo as well. And a query as to whether the distribution owns foo.pyc should return True. How will this be handled? Paul.
Paul Moore
The only cases I know where there is reason for a package to store paths outside the package directory are:
I think the *only* files that actually belong in the package directory are the Python modules inside that package. Other files need to be easily, and automatically, separable for purposes of installation.
- executable scripts, which go in sys.prefix/Scripts on Windows
os.path.join(sys.prefix, 'Scripts')
and ??? on Unix/MAC OS
Depending on whether the developer designates the programs for sysadmin-only use or not: os.path.join(sys.prefix, 'bin') os.path.join(sys.prefix, 'sbin')
- supporting files (MoinMoin puts its HTML documents etc in sys.prefix/share on Windows, cx_Oracle puts its documentation in sys.prefix/cx_Oracle-doc)
This category, it seems to me, needs to be expanded with metadata that allows “purpose”-based tagging, so that platform-specific standards can be applied using those purposes to determine the correct location for these non-Python-module files.
Support files are getting put in the package directory more often these days
Only, IMO, because there's no way of flagging it as anything but arbitrary “data”. Examples of purpose-based classifications that need to be distinctly declarable: executable program, importable module source, platform-independent compiled module, platform-dependent compiled module, documentation, run-time variable data, static data, configuration, and so on. All of these (and others I've forgotten) should be possible for the developer to declare in distribution metadata, and the installer can then use those declarations to make the files go to platform-specific locations, not the Python package directories.
For Windows, having a few distinguished locations under the installation root (scripts, share, doc) would probably do. For Unix and Mac OS, I have no opinion (but I suspect that absolute paths like /usr/local/bin might be the norm there).
The lines are drawn in different places on each platform; we don't want to have the union of all these different platform-specific locations in the standard, and likewise we don't want to leave any of them with second-class support.

I think it's not the developer's burden to decide *where* such files go; rather, they should be declaring only the *purpose* of these files in the distribution metadata, and it's up to the site-specific installer (possibly as configured by the installing user) to decide the location of each file by its declared purpose.

--
“I can picture in my mind a world without war, a world without hate. And I can picture us attacking that world, because they'd never expect it.” - Jack Handey
Ben Finney
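One way to picture the purpose-based idea is a table that maps a declared purpose to a directory, with one table per installation scheme. Everything in this sketch is hypothetical: the purpose names, the scheme names, the directory layouts and the resolve function are invented for illustration and do not correspond to any existing distutils feature.

```python
import posixpath

# Hypothetical purpose -> directory tables. A real installer would take
# these from platform policy or from the installing user's configuration.
SCHEMES = {
    "posix": {
        "script": "bin",
        "doc": "share/doc",
        "config": "etc",
        "data": "share",
    },
    "windows": {
        "script": "Scripts",
        "doc": "doc",
        "config": "etc",
        "data": "share",
    },
}


def resolve(purpose, filename, scheme, root):
    """Return the install location for a file from its declared purpose.

    Paths are joined with '/' purely to keep the output platform neutral.
    """
    subdir = SCHEMES[scheme][purpose]
    return posixpath.join(root, subdir, filename)


if __name__ == "__main__":
    print(resolve("script", "mytool", "posix", "/usr/local"))
    print(resolve("doc", "manual.html", "windows", "C:/Python26"))
```

The developer would only ever write the purpose ("script", "doc", ...); the scheme and root belong to whoever performs the installation.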
2009/7/7 Ben Finney
I think it's not the developer's burden to decide *where* such files go; rather, they should be declaring only the *purpose* of these files in the distribution metadata, and it's up to the site-specific installer (possibly as configured by the installing user) to decide the location of each file by its declared purpose.
That's a whole different PEP, though. Paul.
On Tue, 7 Jul 2009 at 13:05, Paul Moore wrote:
2009/7/7 Ben Finney
: [... lots of interesting stuff deleted ...] I think it's not the developer's burden to decide *where* such files go; rather, they should be declaring only the *purpose* of these files in the distribution metadata, and it's up to the site-specific installer (possibly as configured by the installing user) to decide the location of each file by its declared purpose.
That's a whole different PEP, though.
Which one? It seems to me that supporting this is implicit in the language summit goals of (1) having distutils be better support infrastructure for system packaging utilities and (2) needing a way to deal with resource files "that might be installed in a specific place on the target system by the system packager". I'll grant that I'm reading between the lines, it isn't an explicitly stated goal. But it was the direction my mind went when I read Tarek's notes, given that the first stated goal is "standardize more metadata". But I'm not one of the people involved in system packaging tools, so I'll leave it to them to say how useful/important this is. --David
On Tue, Jul 7, 2009 at 2:52 PM, R. David Murray
On Tue, 7 Jul 2009 at 13:05, Paul Moore wrote:
2009/7/7 Ben Finney
: [... lots of interesting stuff deleted ...] I think it's not the developer's burden to decide *where* such files go; rather, they should be declaring only the *purpose* of these files in the distribution metadata, and it's up to the site-specific installer (possibly as configured by the installing user) to decide the location of each file by its declared purpose.
That's a whole different PEP, though.
Which one? It seems to me that supporting this is implicit in the language summit goals of (1) having distutils be better support infrastructure for system packaging utilities and (2) needing a way to deal with resource files "that might be installed in a specific place on the target system by the system packager". I'll grant that I'm reading between the lines, it isn't an explicitly stated goal. But it was the direction my mind went when I read Tarek's notes, given that the first stated goal is "standardize more metadata".
Yes, but the topic is so wide that it has to be split into several PEPs, and things have to be done gradually.

So far:

- PEP 376: standard for the metadata format and location + query APIs
- PEP 345: standard for the metadata *content* - work in progress too (there's a branch with new fields waiting)
- PEP 386: standard for version comparisons

Topics that are not yet in a PEP are grouped on the wiki page (under "current work") with notes: http://wiki.python.org/moin/Distutils

When I started to work on this I didn't realize the gigantic amount of work and coordination it requires, and I do understand the current state now. At first I was trying to coordinate interested people to work on each topic mentioned there in parallel (like we did a bit after the summit). But in the end, it seems that having everyone interested in packaging matters focus on the smallest possible number of topics pays off more.

PEP 376 is just a piece of the puzzle, but I am confident it will speed up other tasks since it raises our common ground knowledge.
2009/7/7 Paul Moore
The RECORD file should contain precisely those files that are created as part of the install. That's ultimately the point of the file (for ownership queries and uninstallation).
Hmm, on the other hand, if foo.py is in the RECORD file, the uninstaller should uninstall foo.pyc and foo.pyo as well. And a query as to whether the distribution owns foo.pyc should return True. How will this be handled?
It's planned to list them in RECORD as well, since install calls a sub-command that builds them (install_lib), so the same rules apply as for the .py files.

But there's a special case: if the --no-compile or the --no-optimize option is used, then the pyc|pyo files are not added, which means they will be created afterwards when the module is used on the target system. So the pyc|pyo files could be removed when they are present beside the py file that is being removed.

However, the install command will still be required to write them in the RECORD file when the --no-compile or the --no-optimize options are *not* used, so that they are properly detected and removed when a binary distribution ships only the pyc files and not the py files.
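The ownership rule discussed here could be sketched as below: a file is owned if it is listed, or if it is a compiled companion sitting beside a listed .py file. The function name owns is hypothetical, and the sketch assumes the layout of the time (foo.pyc and foo.pyo next to foo.py) and slash-separated names.

```python
import posixpath

# Compiled companions that can appear beside a .py file after import
# (assumption: the same-directory layout discussed in this thread).
COMPILED_SUFFIXES = (".pyc", ".pyo")


def owns(record_paths, path):
    """Return True if `path` belongs to the distribution.

    A path is owned if it is listed in RECORD, or if it is a .pyc/.pyo
    whose matching .py source is listed.
    """
    recorded = set(record_paths)
    if path in recorded:
        return True
    root, ext = posixpath.splitext(path)
    return ext in COMPILED_SUFFIXES and (root + ".py") in recorded


if __name__ == "__main__":
    record = ["foo.py", "pkg/__init__.py"]
    print(owns(record, "foo.pyc"))
    print(owns(record, "bar.pyc"))
```

An uninstaller could apply the same rule in reverse, removing any existing .pyc/.pyo next to each listed .py file.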
Paul Moore wrote:
2009/7/6 Nick Coghlan
: - There are *no* guaranteed absolute locations on Windows, so any such oddly-located file would require user interaction to work. Certainly bdist_wininst and bdist_msi don't do that. - My experiments indicate that bdist_{wininst,msi} are broken with respect to absolute paths anyway: they do a --root install to a temporary directory (and the absolute paths don't end up in there) and then package up that temporary directory. I still want to see a real life example that demonstrates that there is a genuine issue here. We're spending a lot of energy and complexity trying to design a solution to a problem that actually doesn't appear to exist in practice...
(To be honest, I'd be fairly confident in saying that absolute paths can be ignored on Windows, subject to some corner cases that I haven't thought through yet. My worry is that I don't know what Unix and Mac users might do, so I can't just wish away the issue because it can't arise on Windows. Can a Unix/Mac user offer a real-world example on their own system?)
I thought installing pywin32 based COM objects still involved messing with the Windows directory, but MS may have improved that in more recent OS versions. It's been years since I played with win32com objects, and even then it was just idle experimentation that didn't get very far so I could easily be misremembering.

For *nix, the obvious use case is installing scripts somewhere like /usr/bin or /usr/local/bin.

One option is to punt on this whole issue and say if people want to install stuff outside the Python module hierarchy they should create their own OS-specific package to manage it (i.e. leave the non-relative paths to be managed by APT or a Windows installer or whatever).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2009/7/7 Nick Coghlan
For *nix, the obvious use case is installing scripts somewhere like /usr/bin or /usr/local/bin.
Using distutils' scripts option, they will end up in: sys.exec_prefix/bin

Another use case I've found in a distro I've installed this afternoon:

    setup(..., data_files=[('/usr/share/man/man1/', ['SOMEFILE'])], ...)

That's not the most elegant way to add a man page, but for someone who doesn't bother with APT or whatever, it works to build a binary distribution.
One option is to punt on this whole issue and say if people want to install stuff outside the Python module hierarchy they should create their own OS-specific package to manage it (i.e. leave the non-relative paths to be managed by APT or a Windows installer or whatever).
If so, what do we do with the "data_files" option in distutils? If it's used with absolute paths, files can be installed anywhere on the system, and we want to track them. Even if we don't uninstall them automatically, they should be tracked so a third-party uninstaller can deal with them properly.

Or do we change this distutils feature and state that the directories used in "data_files" will always be relative to sys.prefix?

That would bring us back to three cases in the RECORD:

- files located under sys.prefix, but not located under site-packages
- files located under sys.exec_prefix, but not located under site-packages
- files located under site-packages

Where "site-packages" is the directory that contains the .egg-info directory of the distribution (that's basically the current PEP state, besides the absolute paths case, which we would need to remove).
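The three cases can be sketched as a small classifier. The function name classify and the returned labels are hypothetical; paths are assumed to be normalized, absolute and slash-separated, and site-packages is tested first because it normally lives under sys.prefix.

```python
def _is_under(path, directory):
    """True if `path` lies inside `directory` (slash-separated paths)."""
    return path.startswith(directory.rstrip("/") + "/")


def classify(path, site_packages, prefix, exec_prefix):
    """Say which of the three RECORD cases an installed file falls into.

    Anything outside all three roots is reported as "absolute", i.e. the
    case that would have to be removed or handled separately.
    """
    if _is_under(path, site_packages):
        return "site-packages"
    if _is_under(path, prefix):
        return "prefix"
    if _is_under(path, exec_prefix):
        return "exec_prefix"
    return "absolute"


if __name__ == "__main__":
    # Illustrative roots only.
    roots = ("/usr/lib/python2.6/site-packages", "/usr", "/opt/exec")
    for name in ("/usr/lib/python2.6/site-packages/foo.py",
                 "/usr/bin/mytool",
                 "/etc/myconf.cfg"):
        print(name, "->", classify(name, *roots))
```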
On 6 Jul, 2009, at 20:38, Paul Moore wrote:
- Should distribution names be case insensitive on case insensitive filesystems? For comparison, module/package names are always case sensitive even on case insensitive systems.
I'd then go for case sensitive names for distributions as well, just to be consistent. But isn't the actual name in the PKG-INFO file anyway?

Do you mean really case insensitive filesystems like DOS' FAT, or case preserving ones (HFS+ on OSX, NTFS on Windows)? For the latter you can at least get the original name back using os.listdir.

BTW, actually determining if you are working with a case-sensitive filesystem requires filesystem access; os.path.normcase is hopelessly naive in that respect. I'm regularly dealing with NFS mounted filesystems on OSX systems, which means that parts of a path are case preserving (the HFS+ filesystem up to the mount point) and other parts are truly case sensitive (paths on the NFS mounted filesystem originating on a Linux server).

Ronald
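A minimal sketch of what "requires filesystem access" means in practice: create a temporary file in the directory of interest and check whether the same name with its case swapped also resolves. The function name is_case_sensitive is hypothetical, and the probe only answers for the one directory it is given, which matters for the mixed HFS+/NFS situation described above.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Probe whether `directory` sits on a case-sensitive filesystem.

    Creates a temporary file with a mixed-case prefix, then checks whether
    the case-swapped name refers to an existing file. Needs write access
    to `directory`; the result applies to that directory only.
    """
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        head, name = os.path.split(probe.name)
        swapped = os.path.join(head, name.swapcase())
        return not os.path.exists(swapped)


if __name__ == "__main__":
    print(is_case_sensitive(tempfile.gettempdir()))
```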