
I think this PEP is a significant improvement over its predecessor. It represents features such as extras (Provides-Extra) and build requirements (Setup-Requires-Dist) that are in use in the Python community but cannot be represented in older versions of the format. It finally specifies a UTF-8 encoding, drops the RFC 822 requirement, provides an extension mechanism, and allows the description to be placed in the document payload.

PEP 426 doesn't have anything to do with the Wheel PEPs 425 and 427, other than that its features are necessary to usefully represent a large number of existing Python packages. How about moving this one along so we can focus on the other two?

I'm not sure what the Post-History should be. We have been talking about it for a while.

Thanks,

Daniel Holth

PEP: 426
Title: Metadata for Python Software Packages 1.3
Version: $Revision$
Last-Modified: $Date$
Author: Daniel Holth <dholth@fastmail.fm>
Discussions-To: Distutils SIG
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30 Aug 2012


Abstract
========

This PEP describes a mechanism for adding metadata to Python distributions. It includes specifics of the field names, and their semantics and usage.

This document specifies version 1.3 of the metadata format. Version 1.0 is specified in PEP 241. Version 1.1 is specified in PEP 314. Version 1.2 is specified in PEP 345.

Version 1.3 of the metadata format adds fields designed to make third-party packaging of Python software easier and defines a formal extension mechanism. The fields are "Setup-Requires-Dist", "Provides-Extra", and "Extension". This version also adds the `extra` variable to the `environment markers` specification and allows the description to be placed into a payload section.


Metadata Files
==============

The syntax defined in this PEP is for use with Python distribution metadata files. The file format is a simple UTF-8 encoded Key: value format with case-insensitive keys and no maximum line length, followed by a blank line and an arbitrary payload. It is parseable by the ``email`` module with an appropriate ``email.policy.Policy()``. When ``metadata`` is a Unicode string, ``email.parser.Parser().parsestr(metadata)`` is a serviceable parser.

There are two standard locations for these metadata files:

* the ``PKG-INFO`` file included in the base directory of Python source distribution archives (as created by the distutils ``sdist`` command)
* the ``.dist-info/METADATA`` files in a Python installation database, as described in PEP 376.

Other tools involved in Python distribution may also use this format.


Encoding
========

Metadata 1.3 files are UTF-8 with the restriction that keys must be ASCII. Parser implementations should be aware that older versions of the Metadata specification do not specify an encoding.


Fields
======

This section specifies the names and semantics of each of the supported metadata fields.

In a single Metadata 1.3 file, fields marked with "(optional)" may occur 0 or 1 times. Fields marked with "(multiple use)" may be specified 0, 1 or more times. Only "Metadata-Version", "Name", "Version", and "Summary" must appear exactly once. The fields may appear in any order within the file.

Metadata-Version
::::::::::::::::

Version of the file format; "1.3" is the only legal value.

Example::

    Metadata-Version: 1.3

Name
::::

The name of the distribution.

Example::

    Name: BeagleVote

Version
:::::::

A string containing the distribution's version number. This field must be in the format specified in PEP 386.

Example::

    Version: 1.0a2

Summary
:::::::

A one-line summary of what the distribution does.

Example::

    Summary: A module for collecting votes from beagles.
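As a minimal sketch of the parsing approach mentioned under Metadata Files, the standard library's ``email.parser.Parser`` with its default policy can read such a file: header lookup is case-insensitive, multiple-use fields come back through ``get_all()``, and the description payload through ``get_payload()``. The ``read_metadata`` helper and the sample text are illustrative only, and the input is assumed to be already decoded from UTF-8 (e.g. with ``data.decode('utf-8')``).

```python
from email.parser import Parser

# Illustrative Metadata 1.3 document: headers, a blank line, then the payload.
SAMPLE = """\
Metadata-Version: 1.3
Name: BeagleVote
Version: 1.0a2
Summary: A module for collecting votes from beagles.
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console (Text Based)

BeagleVote
==========

A longer description, placed in the payload.
"""


def read_metadata(text):
    """Parse Metadata 1.3 text (already decoded from UTF-8) into a dict."""
    msg = Parser().parsestr(text)
    return {
        # Header lookup on email.message.Message is case-insensitive.
        'name': msg['Name'],
        'version': msg['version'],
        # Multiple-use fields: get_all() returns every occurrence, in order.
        'classifiers': msg.get_all('Classifier', []),
        # Everything after the first blank line is the payload (the description).
        'description': msg.get_payload(),
    }


if __name__ == '__main__':
    print(read_metadata(SAMPLE))
```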

Platform (multiple use)
:::::::::::::::::::::::

A Platform specification describing an operating system supported by the distribution which is not listed in the "Operating System" Trove classifiers. See "Classifier" below.

Examples::

    Platform: ObscureUnix
    Platform: RareDOS

Supported-Platform (multiple use)
:::::::::::::::::::::::::::::::::

Binary distributions containing a metadata file will use the Supported-Platform field in their metadata to specify the OS and CPU for which the binary distribution was compiled. The semantics of the Supported-Platform field are not specified in this PEP.

Example::

    Supported-Platform: RedHat 7.2
    Supported-Platform: i386-win32-2791

Description (optional, deprecated)
::::::::::::::::::::::::::::::::::

A longer description of the distribution that can run to several paragraphs. Software that deals with metadata should not assume any maximum size for this field.

The contents of this field can be written using reStructuredText markup [1]_. For programs that work with the metadata, supporting markup is optional; programs can also display the contents of the field as-is. This means that authors should be conservative in the markup they use.

Since a line separator immediately followed by another line separator indicates the end of the headers section, any line separators in the description must be followed by whitespace to indicate continuation.

Since Metadata 1.3, the recommended place for the description is in the payload section of the document, after the last header. The description does not need to be reformatted when it is included in the payload.

Keywords (optional)
:::::::::::::::::::

A list of additional keywords to be used to assist searching for the distribution in a larger catalog.

Example::

    Keywords: dog puppy voting election

Home-page (optional)
::::::::::::::::::::

A string containing the URL for the distribution's home page.

Example::

    Home-page: http://www.example.com/~cschultz/bvote/

Download-URL (optional)
:::::::::::::::::::::::

A string containing the URL from which this version of the distribution can be downloaded. (This means that the URL can't be something like ".../BeagleVote-latest.tgz", but instead must be ".../BeagleVote-0.45.tgz".)

Author (optional)
:::::::::::::::::

A string containing the author's name at a minimum; additional contact information may be provided.

Example::

    Author: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>

Author-email (optional)
:::::::::::::::::::::::

A string containing the author's e-mail address. It contains a name and e-mail address in the RFC 5322 recommended ``Address Specification`` format.

Example::

    Author-email: "C. Schultz" <cschultz@example.com>

Maintainer (optional)
:::::::::::::::::::::

A string containing the maintainer's name at a minimum; additional contact information may be provided.

Note that this field is intended for use when a project is being maintained by someone other than the original author: it should be omitted if it is identical to ``Author``.

Example::

    Maintainer: C. Schultz, Universal Features Syndicate, Los Angeles, CA <cschultz@peanuts.example.com>

Maintainer-email (optional)
:::::::::::::::::::::::::::

A string containing the maintainer's e-mail address. It has the same format as ``Author-email``.

Note that this field is intended for use when a project is being maintained by someone other than the original author: it should be omitted if it is identical to ``Author-email``.

Example::

    Maintainer-email: "C. Schultz" <cschultz@example.com>

License (optional)
::::::::::::::::::

Text indicating the license covering the distribution where the license is not a selection from the "License" Trove classifiers. See "Classifier" below. This field may also be used to specify a particular version of a license which is named via the ``Classifier`` field, or to indicate a variation or exception to such a license.

Examples::

    License: This software may only be obtained by sending the author a postcard, and then the user promises not to redistribute it.
    License: GPL version 3, excluding DRM provisions

The full text of the license would normally be included in a separate file.

Classifier (multiple use)
:::::::::::::::::::::::::

Each entry is a string giving a single classification value for the distribution. Classifiers are described in PEP 301 [2]_.

Examples::

    Classifier: Development Status :: 4 - Beta
    Classifier: Environment :: Console (Text Based)

Requires-Dist (multiple use)
::::::::::::::::::::::::::::

Each entry contains a string naming some other distutils project required by this distribution.

The format of a requirement string is identical to that of a distutils project name (e.g., as found in the ``Name:`` field), optionally followed by a version declaration within parentheses.

The distutils project names should correspond to names as found on the `Python Package Index`_.

Version declarations must follow the rules described in `Version Specifiers`_.

Examples::

    Requires-Dist: pkginfo
    Requires-Dist: PasteDeploy
    Requires-Dist: zope.interface (>3.5.0)

Setup-Requires-Dist (multiple use)
::::::::::::::::::::::::::::::::::

Like Requires-Dist, but names dependencies needed while the distribution's distutils / packaging `setup.py` / `setup.cfg` is run. Commonly used to generate a manifest from version control.

Examples::

    Setup-Requires-Dist: custom_setup_command

Dependencies mentioned in `Setup-Requires-Dist` may be installed exclusively for setup and are not guaranteed to be available at run time.

Provides-Dist (multiple use)
::::::::::::::::::::::::::::

Each entry contains a string naming a Distutils project which is contained within this distribution. This field *must* include the project identified in the ``Name`` field, followed by the version: Name (Version).

A distribution may provide additional names, e.g. to indicate that multiple projects have been bundled together. For instance, source distributions of the ``ZODB`` project have historically included the ``transaction`` project, which is now available as a separate distribution. Installing such a source distribution satisfies requirements for both ``ZODB`` and ``transaction``.

A distribution may also provide a "virtual" project name, which does not correspond to any separately-distributed project: such a name might be used to indicate an abstract capability which could be supplied by one of multiple projects. E.g., multiple projects might supply RDBMS bindings for use by a given ORM: each project might declare that it provides ``ORM-bindings``, allowing other projects to depend only on having at most one of them installed.

A version declaration may be supplied and must follow the rules described in `Version Specifiers`_. The distribution's version number will be implied if none is specified.

Examples::

    Provides-Dist: OtherProject
    Provides-Dist: AnotherProject (3.4)
    Provides-Dist: virtual_package

Obsoletes-Dist (multiple use)
:::::::::::::::::::::::::::::

Each entry contains a string describing a distutils project's distribution which this distribution renders obsolete, meaning that the two projects should not be installed at the same time.

Version declarations can be supplied. Version numbers must be in the format specified in `Version Specifiers`_.

The most common use of this field will be in case a project name changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0. When you install Torqued Python, the Gorgon distribution should be removed.

Examples::

    Obsoletes-Dist: Gorgon
    Obsoletes-Dist: OtherProject (<3.0)

Requires-Python (optional)
::::::::::::::::::::::::::

This field specifies the Python version(s) that the distribution is guaranteed to be compatible with.

Version numbers must be in the format specified in `Version Specifiers`_.

Examples::

    Requires-Python: 2.5
    Requires-Python: >2.1
    Requires-Python: >=2.3.4
    Requires-Python: >=2.5,<2.7

Requires-External (multiple use)
::::::::::::::::::::::::::::::::

Each entry contains a string describing some dependency in the system that the distribution needs in order to be used. This field is intended to serve as a hint to downstream project maintainers, and has no semantics which are meaningful to the ``distutils`` distribution.

The format of a requirement string is a name of an external dependency, optionally followed by a version declaration within parentheses.

Because they refer to non-Python software releases, version numbers for this field are **not** required to conform to the format specified in PEP 386: they should correspond to the version scheme used by the external dependency.

Notice that there is no particular rule on the strings to be used.

Examples::

    Requires-External: C
    Requires-External: libpng (>=1.5)

Project-URL (multiple use)
::::::::::::::::::::::::::

A string containing a label and a browsable URL for the project, separated by the last occurrence of comma and space ", ".

Example::

    Project-URL: Bug, Issue Tracker, http://bitbucket.org/tarek/distribute/issues/

The label is free text.

Provides-Extra (multiple use)
:::::::::::::::::::::::::::::

A string containing the name of an optional feature. Must be printable ASCII, not containing whitespace, comma (,), or square brackets []. May be used to make a dependency conditional on whether the optional feature has been requested.

Example::

    Name: beaglevote
    Provides-Extra: pdf
    Requires-Dist: reportlab; extra == 'pdf'
    Requires-Dist: nose; extra == 'test'
    Requires-Dist: sphinx; extra == 'doc'

A second distribution requests optional features by placing their names inside square brackets after the distribution name, separating multiple features with a comma (,). The full set of requirements is the union of the `Requires-Dist` sets evaluated with `extra` set to `None` and then to the name of each requested feature.

Example::

    Requires-Dist: beaglevote[pdf] -> requires beaglevote, reportlab
    Requires-Dist: beaglevote[test, doc] -> requires beaglevote, sphinx, nose

Two feature names, `test` and `doc`, are reserved to mark dependencies that are needed for running automated tests and generating documentation, respectively.

It is legal to specify `Provides-Extra` without referencing it in any `Requires-Dist`. It is an error to request a feature name that has not been declared with `Provides-Extra`.

Extension (multiple use)
::::::::::::::::::::::::

An ASCII string, not containing whitespace or the / character, that indicates the presence of extended metadata. Additional tags defined by an `Extension: Chili` must be of the form `Chili/Name`::

    Extension: Chili
    Chili/Type: Poblano
    Chili/Heat: Mild

An implementation might iterate over all the declared `Extension:` fields to invoke the processors for those extensions. Because the order of the fields is not significant, the `Extension: Chili` field may appear before or after its declared tags `Chili/Type:` etc.


Version Specifiers
==================

Version specifiers are a series of conditional operators and version numbers, separated by commas. Conditional operators must be one of "<", ">", "<=", ">=", "==" and "!=".

Any number of conditional operators can be specified, e.g. the string ">1.0, !=1.3.4, <2.0" is a legal version declaration. The comma (",") is equivalent to the **and** operator.

Each version number must be in the format specified in PEP 386.
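A minimal Python sketch of these rules follows, assuming the prefix semantics described in the next paragraphs: a bare version or ``==`` matches every final release that starts with it, and ``!=`` excludes every version that starts with it. It is not PEP 386's ``NormalizedVersion``. Versions are reduced to a tuple of integers plus a flag saying whether any pre/post/dev suffix follows, and ``<``, ``<=``, ``>``, ``>=`` compare only the zero-padded integer part, so this sketch does not order ``2.5a1`` before ``2.5``. The names ``parse_version`` and ``matches`` are hypothetical.

```python
import re

# A dotted run of integers, then an optional suffix such as a1, c2, .post1, .dev3.
_VERSION = re.compile(r'^(\d+(?:\.\d+)*)(.*)$')
# An optional operator (two-character operators tried first) and a version.
_CLAUSE = re.compile(r'^(<=|>=|==|!=|<|>)?\s*(\S+)$')


def parse_version(text):
    """Return (release tuple, is_final) for strings like '2.5.1' or '2.5a1'."""
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError('not a version: %r' % text)
    release = tuple(int(part) for part in match.group(1).split('.'))
    return release, match.group(2) == ''


def _starts_with(release, prefix):
    return release[:len(prefix)] == prefix


def _padded(a, b):
    # Pad with zeros so that 3 compares as (3, 0, 0) against (2, 7, 3).
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def matches(version, specifier):
    """True if version satisfies every comma-separated clause of specifier."""
    release, is_final = parse_version(version)
    for clause in specifier.split(','):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError('bad version clause: %r' % clause)
        op = match.group(1) or '=='  # a bare version behaves like ==
        target, _ = parse_version(match.group(2))
        if op == '==':
            # Prefix match on final releases only: "2.5" matches 2.5.3, not 2.5a1.
            ok = is_final and _starts_with(release, target)
        elif op == '!=':
            # Excludes anything starting with the target, e.g. 3.1.3 and 3.1.3.post1.
            ok = not _starts_with(release, target)
        else:
            a, b = _padded(release, target)
            ok = {'<': a < b, '<=': a <= b, '>': a > b, '>=': a >= b}[op]
        if not ok:
            return False
    return True
```

Under these assumptions, ``matches('2.5.2', '2.5')`` holds while ``matches('2.5a1', '2.5')`` does not, and ``matches('3.0a1', '>=2.6,<3')`` is false, in line with the examples that follow.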

When a version is provided, it always includes all versions that start with the same value. For example, the "2.5" version of Python will include versions like "2.5.2" or "2.5.3". Pre and post releases in that case are excluded. So in our example, versions like "2.5a1" are not included when "2.5" is used. If the first version of the range is required, it has to be explicitly given. In our example, it will be "2.5.0".

Notice that some projects might omit the ".0" suffix for the first release of the "2.5.x" series:

- 2.5
- 2.5.1
- 2.5.2
- etc.

In that case, "2.5.0" will have to be explicitly used to avoid any confusion with the "2.5" notation, which represents the full range. It is a recommended practice to use schemes of the same length for a series to completely avoid this problem.

Some examples:

- ``Requires-Dist: zope.interface (3.1)``: any version that starts with 3.1, excluding post or pre-releases.
- ``Requires-Dist: zope.interface (==3.1)``: equivalent to ``Requires-Dist: zope.interface (3.1)``.
- ``Requires-Dist: zope.interface (3.1.0)``: any version that starts with 3.1.0, excluding post or pre-releases. Since that particular project doesn't use more than 3 digits, it also means "only the 3.1.0 release".
- ``Requires-Python: 3``: any Python 3 version, no matter which one, excluding post or pre-releases.
- ``Requires-Python: >=2.6,<3``: any version of Python 2.6 or 2.7, including post releases of 2.6, pre and post releases of 2.7. It excludes pre releases of Python 3.
- ``Requires-Python: 2.6.2``: equivalent to ">=2.6.2,<2.6.3". So this includes only Python 2.6.2. Of course, if Python were numbered with 4 digits, it would have included all versions of the 2.6.2 series.
- ``Requires-Python: 2.5.0``: equivalent to ">=2.5.0,<2.5.1".
- ``Requires-Dist: zope.interface (3.1,!=3.1.3)``: any version that starts with 3.1, excluding post or pre-releases of 3.1 *and* excluding any version that starts with "3.1.3". For this particular project, this means: "any version of the 3.1 series but not 3.1.3". This is equivalent to: ">=3.1,!=3.1.3,<3.2".


Environment markers
===================

An **environment marker** is a marker that can be added at the end of a field after a semicolon (";"), to add a condition about the execution environment.

Here are some examples of fields using such markers::

    Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'
    Obsoletes-Dist: pywin31; sys.platform == 'win32'
    Requires-Dist: foo (1,!=1.3); platform.machine == 'i386'
    Requires-Dist: bar; python_version == '2.4' or python_version == '2.5'
    Requires-External: libxslt; 'linux' in sys.platform

The micro-language behind this is a simple subset of Python: it compares only strings, with the ``==`` and ``in`` operators (and their opposites), and with the ability to combine expressions. Parentheses are supported for grouping.

The pseudo-grammar is ::

    EXPR [in|==|!=|not in] EXPR [or|and] ...

where ``EXPR`` belongs to any of those:

- python_version = '%s.%s' % (sys.version_info[0], sys.version_info[1])
- python_full_version = sys.version.split()[0]
- os.name = os.name
- sys.platform = sys.platform
- platform.version = platform.version()
- platform.machine = platform.machine()
- platform.python_implementation = platform.python_implementation()
- a free string, like ``'2.4'``, or ``'win32'``
- extra = (name of requested feature) or None

Notice that ``in`` is restricted to strings, meaning that it is not possible to use other sequences like tuples or lists on the right side.

The fields that benefit from this marker are:

- Requires-Python
- Requires-External
- Requires-Dist
- Setup-Requires-Dist
- Provides-Dist
- Obsoletes-Dist
- Classifier

(The `extra` variable is only meaningful for Requires-Dist.)


Summary of Differences From PEP 345
===================================

* Metadata-Version is now 1.3.
* Values are now expected to be UTF-8.
* A payload (containing the description) may appear after the headers.
* Added `extra` to environment markers.
* Most fields are now optional.
* Changed fields:

  - Description
  - Project-URL
  - Requires-Dist

* Added fields:

  - Extension
  - Provides-Extra
  - Setup-Requires-Dist


References
==========

This document specifies version 1.3 of the metadata format. Version 1.0 is specified in PEP 241. Version 1.1 is specified in PEP 314. Version 1.2 is specified in PEP 345.

.. [1] reStructuredText markup:
   http://docutils.sourceforge.net/

.. _`Python Package Index`: http://pypi.python.org/pypi/

.. [2] PEP 301:
   http://www.python.org/dev/peps/pep-0301/


Appendix
========

Parsing and generating the Metadata 1.3 serialization format using Python 3.3::

    # Metadata 1.3 demo
    from email.generator import Generator
    from email.parser import Parser
    from email.policy import Compat32
    from email.utils import _has_surrogates  # private helper in Python 3.3+


    class MetadataPolicy(Compat32):
        """Compat32 variant: no header wrapping, tab-indented continuations."""
        max_line_length = 0
        continuation_whitespace = '\t'

        def _sanitize_header(self, name, value):
            # Return str values unchanged; refuse undecodable (surrogate) input.
            if not isinstance(value, str):
                return value
            if _has_surrogates(value):
                raise NotImplementedError()
            return value

        def _fold(self, name, value, sanitize):
            # One output line per logical line; each continuation line is
            # prefixed with continuation_whitespace so it stays in the header.
            body = ((self.linesep + self.continuation_whitespace)
                    .join(value.splitlines()))
            return ''.join((name, ': ', body, self.linesep))


    if __name__ == "__main__":
        import codecs
        import sys
        import textwrap

        pkg_info = """\
    Metadata-Version: 1.3
    Name: package
    Version: 0.1.0
    Summary: A package.
    Description: Description
            ===========
            A description of the package.
    """

        m = Parser(policy=MetadataPolicy()).parsestr(pkg_info)
        m['License'] = 'GPL'

        # Move the Description header into the payload, removing the
        # continuation indentation from every line after the first.
        description_lines = m['Description'].splitlines()
        m.set_payload(description_lines[0] + '\n' +
                      textwrap.dedent('\n'.join(description_lines[1:])) +
                      '\n')
        del m['Description']

        # Metadata 1.3 is UTF-8, so only write it to a UTF-8 stream
        # (codecs.lookup normalizes spellings such as 'UTF-8' and 'utf8').
        if codecs.lookup(sys.stdout.encoding).name == 'utf-8':
            Generator(sys.stdout, maxheaderlen=0).flatten(m)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:
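The environment-marker micro-language and the `extra` rule from Provides-Extra can be sketched together in Python. Because markers are described as a subset of Python expressions, this sketch reuses Python's own parser (``ast.parse`` in ``'eval'`` mode, Python 3.8+ for ``ast.Constant``) and then walks only the permitted node types: string literals, the listed variables, ``==``, ``!=``, ``in``, ``not in``, ``and``, ``or`` and parentheses. The names ``evaluate_marker``, ``requirements_for`` and ``default_environment``, and the dict-based environment, are hypothetical; the sketch also does not check requested extras against ``Provides-Extra``.

```python
import ast
import os
import platform
import sys


def default_environment(extra=None):
    """Marker variables listed in the PEP, computed for the running interpreter."""
    return {
        'python_version': '%s.%s' % sys.version_info[:2],
        'python_full_version': sys.version.split()[0],
        'os.name': os.name,
        'sys.platform': sys.platform,
        'platform.version': platform.version(),
        'platform.machine': platform.machine(),
        'platform.python_implementation': platform.python_implementation(),
        'extra': extra,
    }


_COMPARISONS = {
    ast.Eq: lambda a, b: a == b,
    ast.NotEq: lambda a, b: a != b,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def _operand(node, env):
    # A free string such as '2.4' or 'win32'.
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    # A variable: bare (python_version, extra) or dotted (sys.platform).
    key = None
    if isinstance(node, ast.Name):
        key = node.id
    elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        key = node.value.id + '.' + node.attr
    if key not in env:
        raise ValueError('unsupported marker operand: %s' % ast.dump(node))
    return env[key]


def _evaluate(node, env):
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, env) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARISONS.get(type(node.ops[0]))
        if compare is not None:
            return compare(_operand(node.left, env),
                           _operand(node.comparators[0], env))
    raise ValueError('unsupported marker syntax: %s' % ast.dump(node))


def evaluate_marker(marker, env):
    """Evaluate e.g. "sys.platform == 'win32'"; grouping comes from Python's parser."""
    return _evaluate(ast.parse(marker.strip(), mode='eval').body, env)


def requirements_for(requires_dist, requested_extras, env):
    """Union of Requires-Dist values whose markers hold with extra=None and
    then with extra set to each requested feature, in first-seen order."""
    selected = []
    for extra in [None] + list(requested_extras):
        scoped = dict(env, extra=extra)
        for value in requires_dist:
            requirement, _, marker = value.partition(';')
            requirement = requirement.strip()
            if marker.strip() and not evaluate_marker(marker, scoped):
                continue
            if requirement not in selected:
                selected.append(requirement)
    return selected
```

For instance, on a machine where ``sys.platform`` is not ``'win32'``, ``requirements_for(["reportlab; extra == 'pdf'", "pywin32 (>1.0); sys.platform == 'win32'"], ['pdf'], default_environment())`` returns ``['reportlab']``.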

On Monday, November 19, 2012 at 7:37 PM, PJ Eby wrote:
Can we maybe kill Provides-Dist and its associated baggage first, though?
I would love to kill Provides-Dist. The biggest question there is how you handle its functionality. If someone needs setuptools but has distribute installed, they shouldn't end up with both installed. I'm (personally) less concerned about the "2 packages being distributed together" use case, since with proper dependency data we should be able to just depend on things instead of bundling them.

The "I bundled a renamed copy of six" case is totally different and would not invoke Provides-Dist. "I merged sqlalchemy with a previously separate but wildly popular declarative / database support / whatever extension" would invoke Provides-Dist.

Daniel Holth

On Nov 19, 2012, at 7:41 PM, Donald Stufft <donald.stufft@gmail.com> wrote:

We are getting along fine too. No tool parses metadata 1.x for package management reasons, and Provides has existed forever with no implementation, so it is not inconveniencing anyone. I would prefer to leave it alone.

Daniel Holth

On Nov 19, 2012, at 7:49 PM, Donald Stufft <donald.stufft@gmail.com> wrote:

So you want to leave in metadata that you think people shouldn't implement? Or do you think people should implement it, in which case what is the point about it existing forever without an implementation? At the very least there need to be some guidelines as to what to do with the field in the various states it could be in.

On Monday, November 19, 2012 at 8:31 PM, Daniel Holth wrote:

On Monday, November 19, 2012 at 9:24 PM, Daniel Holth wrote:
Mostly it seems a bit silly to have so much conversation about parts of the PEP that remain unchanged from previously accepted versions...
Well, I think the PEP should describe what we expect to be implemented *shrug*. Either we should expect it to be implemented and it should be part of the spec, or we shouldn't expect people to implement it and it should be removed.

The section could definitely be much clearer. How about:

Provides-Dist (multiple use)

Each entry contains a string naming a requirement that is satisfied by installing this distribution. This field *must* include the project identified in the ``Name`` field, optionally followed by the version: Name (Version).

A distribution may provide additional names, e.g. to indicate that multiple projects have been merged into a single distribution or to indicate that this project is a substitute for another. For instance, distribute (a fork of setuptools) could ``Provides-Dist`` setuptools to prevent the conflicting package from being downloaded and installed when distribute is already installed. A distribution that has been merged with another might ``Provides-Dist`` the obsolete name(s) to satisfy any projects that require the obsolete distribution's name.

Daniel Holth <dholth <at> gmail.com> writes:
Mostly it seems a bit silly to have so much conversation about parts of the PEP that remain unchanged from previously accepted versions...
I don't agree with the suggestion that we shouldn't discuss it because it was accepted in a previous version. Perhaps it didn't receive the right scrutiny at that time, but since it hasn't been implemented, it's reasonable to discuss it.

ISTM that implementing it as suggested in the PEP can lead to certain problems, since it is a multi-valued field. If it is left in, then something should be said in the PEP about the potential difficulties and if/how they can be resolved.

The difficulties I am talking about relate to dependency resolution. Given the current definition of Provides-Dist, it is possible for a package A on PyPI to "Provide" all of e.g. "A (1.0)", "B (1.2)" and "C (1.5)", and it is also possible for packages B and C on PyPI to provide the same (or slightly different) versions of logical packages A, B, and C. This will likely lead to the need for a sophisticated dependency resolver, because the dependency graph can get quite convoluted. (Remember, we might need to do this resolution when removing packages as well as when installing them.) I know there are SAT solvers and such, but I'm not sure we need that level of sophistication, or whether its complexity cost is outweighed by any benefit.

Remember, we are managing fine without multi-valued Provides-Dist, and while a case has been made for virtual packages and forks (which just require a single-valued field), no compelling case has been made for bundling packages in general (I understand that such requirements might sometimes arise in certain corporate environments, but they don't seem to be a mainstream use case). Hence, no strong case has been made for a multi-valued "Provides" field.

If we have a good index and packaging infrastructure, there is no general need for packages to bundle other packages, unless those bundled packages are changed in some way to suit the bundler's needs. In that case, I don't know how you could be sure that a bundled "A (1.0)" hasn't diverged from the equivalent package on PyPI.

"Provides" seems essentially useless in a metadata index: when asked to install D, which has a dependency on A, you would download and install A to resolve it rather than B or C, and I can't see when you would want to query the index to say "who provides A?" and then use some heuristic to pick e.g. B or C rather than A.

distlib currently contains support for the multi-valued "Provides", but I'm not confident that it will work as expected given pathological cases like the example I suggested, without getting "complicated" in the Zen of Python sense. I'm not convinced that the maintenance burden of a complicated solution is worth the heretofore unnecessary ability to bundle stuff in arbitrary ways.

Regards,

Vinay Sajip

On Tue, Nov 20, 2012 at 9:35 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
If you don't have Provides-Dist, then distribute must continue to bundle an extra .egg-info directory to emulate the feature. This is more than enough justification for me.

Name: is essentially an alias for Provides-Dist: (or vice versa), so there is no such thing as a single-valued Provides-Dist. Having two names for a package is just as complicated as having twenty.

You should not implement Provides-Dist by searching for every Provides-Dist: name on PyPI. You should only use it when deciding whether to download setuptools when distribute is already installed and a package depends on setuptools.

The bundling term was bad wording on the part of the PEP. No one should ever include non-renamed copies of other dists in their dists: "import six" vs. "import django.utils.six". I've suggested a new wording in this thread.

Daniel Holth <dholth <at> gmail.com> writes:
I'm not so sure. In the case of two names, it could be assumed that one was a fork of the other (as in the specific cases of distribute/setuptools, or PIL/Pillow). You cannot reasonably make this assumption if you have twenty entries in your Provides-Dist.
You should not implement Provides-Dist by searching for every Provides-Dist: name on PyPI.
I wasn't seriously suggesting that this approach be taken - merely pointing out that Provides-Dist isn't of much use in a metadata index.
So apart from the setuptools/distribute and PIL/Pillow scenarios, what are the scenarios where you would have 3 or more values in Provides-Dist? If they are e.g. a bundled SQLAlchemy, why would that be preferable to an entry in Requires-Dist?

Regards,

Vinay Sajip

On Wed, Nov 21, 2012 at 1:16 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Provides/Requires/Obsoletes are *not* for bundling. Publishing bundled packages on the index is bad, and people shouldn't do it. What they're for is tracking name changes over time, so that you can fork and rename and merge projects without breaking the world for people that depend on your projects (one example used in the Fedora RPM docs is the apache package being renamed to httpd: https://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html-sin... ).

Distribute providing setuptools and Pillow providing PIL are examples of the simple fork/rename case: they're designed to be drop-in replacements for the projects they forked, so it's appropriate for them to advertise that fact in a way the deployment tools can understand.

The multi-value support is then needed if you have multiple name changes over time (e.g. if someone were to create a distribute2 that provided both distribute and setuptools), or if you merge two projects together (e.g. if a popular extension to a project was folded into the main distribution for that project).

It's likely fine if an installer doesn't use sophisticated graph analysis to find the "best" way to satisfy a set of requirements: you can just as easily use these fields in the simple way Daniel describes, only checking for existing locally installed packages with the necessary capabilities before going out to get whatever is missing from the package index based purely on the distribution names.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
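A minimal sketch of that simple approach, in which Provides-Dist is consulted only for distributions that are already installed and never used to search the index. The dict-shaped metadata records, the helper names ``provided_names`` and ``missing_requirements``, and the sample data are all hypothetical. Version declarations in Provides-Dist entries are ignored here, which is exactly the gap raised later in the thread for requirements such as setuptools (>= X.Y).

```python
import re

# Hypothetical parser for entries like "setuptools (0.6c11)" or "virtual_package".
_ENTRY = re.compile(r'^\s*(?P<name>[^\s(;]+)\s*(?:\((?P<version>[^)]*)\))?')


def provided_names(metadata):
    """Names an installed distribution satisfies: its Name plus any Provides-Dist names.

    ``metadata`` is assumed to be a dict with a 'Name' string and an optional
    'Provides-Dist' list; the version part of each entry is ignored, and
    names are compared exactly as written.
    """
    names = {metadata['Name']}
    for entry in metadata.get('Provides-Dist', []):
        match = _ENTRY.match(entry)
        if match:
            names.add(match.group('name'))
    return names


def missing_requirements(required, installed):
    """Return the required names that no locally installed distribution provides.

    Only these would then be fetched from the index, by distribution name.
    """
    satisfied = set()
    for metadata in installed:
        satisfied |= provided_names(metadata)
    return [name for name in required if name not in satisfied]


# Made-up records standing in for locally installed metadata.
installed = [
    {'Name': 'distribute', 'Provides-Dist': ['setuptools (0.6c11)']},
    {'Name': 'Pillow', 'Provides-Dist': ['PIL']},
]
print(missing_requirements(['setuptools', 'PIL', 'six'], installed))  # ['six']
```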

Edit the following text:

Provides-Dist (multiple use)

Each entry contains a string naming a requirement that is satisfied by installing this distribution. This field *must* include the project identified in the ``Name`` field, optionally followed by the version: Name (Version).

A distribution may provide additional names, e.g. to indicate that multiple projects have been merged into a single distribution or to indicate that this project is a substitute for another. For instance, distribute (a fork of setuptools) could ``Provides-Dist`` setuptools to prevent the conflicting package from being downloaded and installed when distribute is already installed. A distribution that has been merged with another might ``Provides-Dist`` the obsolete name(s) to satisfy any projects that require the obsolete distribution's name.

Daniel Holth <dholth <at> gmail.com> writes:
Edit the following text:
Okay, here is a possible version:

---------------------------------
Provides-Dist (multiple use)

Each entry contains a string naming a requirement that is satisfied by installing this distribution. The entry must consist of a name and version.

The name of the project identified in the ``Name`` field is implicitly considered as provided, with the version specified in the ``Version`` field.

Multiple names in this field *must not* be used for bundling distributions together. They are intended for use when projects are forked and merged over time, while providing essentially the same function. Multiple names reflect the evolution of the project over time, not the bringing together in a bundle of different packages already distributed elsewhere.

Thus, the 'distribute' distribution, a fork of setuptools, could say that it ``Provides-Dist`` a particular version of setuptools, to prevent setuptools from being downloaded and installed when distribute is already installed. If, over time, distribute evolved into a new package called 'distribute2' (for argument's sake), then that could say that it ``Provides-Dist`` a specific version of distribute and a specific version of setuptools.
-----------------------------------

Some comments on the above: I'm not entirely comfortable with a Provides-Dist entry which does not specify a version, since it does not allow you to test that a requirement is actually met. So, I've removed the "optional" qualification from the version.

Also: what happens when a requirement is for setuptools (>= X.Y), but the distribute fork hasn't kept pace, and so only supports setuptools at a lower version than X.Y? I take it we're entirely comfortable with installing setuptools X.Y in that case? How would you ensure the right setuptools is always loaded, since presumably both are on sys.path?

Regards,

Vinay Sajip

On Tue, Nov 20, 2012 at 11:49 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Egg-based tools don't have any problem with this, since they set sys.path to include the eggs needed for the running program. Other tools will have to tell the user and let them work it out, e.g. by using a different virtualenv. I personally don't think that forks claiming to "provide" something are really a good thing to encourage; ISTM that saying a package *conflicts* with another is more accurate, e.g. distribute Conflicts-Dist setuptools. I also think distributions should say they are obsoleted, rather than allowing other distributions to obsolete them. That is, centralized packaging systems rely on a central authority to resolve issues of who provides what and obsoletes what; there's an implicit "x obsoletes y [by decree of semi-independent third-party z]". However, in Python package metadata, it's "x obsoletes y [by decree of x]". IMO, this should be reversed to "y is obsoleted by x [by decree of y]" and "installing y will conflict with x [by decree of x]", so that in each case the scope of authority for the statement is clear. That is, in each case (conflict or obsolescence), the project's developers are declaring under what conditions they will not be supporting an installation. In the case of obsolescence, the developer is saying, "this is being phased out, you should use that other thing instead". In the case of forks, the developer is saying, "If you install both versions, something's gonna break." Note that installation conflict is a more conservative claim anyway: a conflict between forked "foobar" packages is permanent, in the sense that it doesn't matter what versions of both packages you're interested in: they both want to install a foobar/__init__.py. (Of course, installers can and should detect that condition automatically, but not until they download the package first.)
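The conflict check suggested here can be sketched in a few lines. It assumes a hypothetical ``Conflicts-Dist`` field, which is not part of Metadata 1.3 and whose values here are bare distribution names. It also assumes the installer already holds each installed distribution's conflict list in memory. Declarations from either side are respected, so each project speaks only for itself:

```python
from typing import Dict, List


def find_conflicts(candidate: str,
                   candidate_conflicts: List[str],
                   installed: Dict[str, List[str]]) -> List[str]:
    """Return installed distributions that conflict with `candidate`.

    `installed` maps each installed distribution name to the entries of its
    (hypothetical) Conflicts-Dist field. Names are compared case-insensitively.
    """
    wanted = candidate.lower()
    declared = {name.lower() for name in candidate_conflicts}
    hits = []
    for name, their_conflicts in installed.items():
        if name.lower() in declared:
            # The candidate declares a conflict with something installed.
            hits.append(name)
        elif wanted in {c.lower() for c in their_conflicts}:
            # An installed project declares a conflict with the candidate.
            hits.append(name)
    return sorted(hits)
```

An installer would refuse the installation, or ask first, when the list is non-empty. As noted above, it can only do so once the candidate's metadata has been downloaded.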

On Tue, Nov 20, 2012 at 4:07 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
(We've been over this before, the last time this discussion came up on the Distutils-SIG for a previous Metadata PEP a year or two back, but here goes....) Obsoleting a package is for handling renames and support transitions. For example, if it actually did anything to do so, I'd mark RuleDispatch as obsoleted-by PEAK, the Pylons folks might mark some version of that as obsoleted-by Pyramid, etc. To put it another way, marking a package obsolete is part of deprecation and replacement, not an unsubstantiated third-party claim about the maintenance status of an unrelated project. If a package is *actually* dead, there's no real point to declaring that something else obsoletes it, and certainly no reason to put it in metadata form. Otherwise, we could have Twisted claiming to obsolete GEvent and vice-versa at the same time. Which one should an installer believe? It makes no sense in a standard where the project's maintainers can say whatever they want about somebody else's project. The scope of authority for automatically-consumed metadata should *only* encompass the project that provided the metadata.

I think the Metadata 1.1 treatment of these concepts is in some ways better. (Metadata 1.2 added the -Dist suffix to the fields in an attempt to make it clear that dependency names are PyPI names and not "import x" names.) http://www.python.org/dev/peps/pep-0314/ says:

Provides (multiple use)

Each entry contains a string describing a package or module that will be provided by this package once it is installed. These strings should match the ones used in Requirements fields. A version declaration may be supplied (without a comparison operator); the package's version number will be implied if none is specified. Example:

    Provides: xml
    Provides: xml.utils
    Provides: xml.utils.iso8601
    Provides: xml.dom
    Provides: xmltools (1.3)

Obsoletes (multiple use)

Each entry contains a string describing a package or module that this package renders obsolete, meaning that the two packages should not be installed at the same time. Version declarations can be supplied. The most common use of this field will be in case a package name changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0. When you install Torqued Python, the Gorgon package should be removed. Example:

    Obsoletes: Gorgon

They mean pretty much what the same words mean in RPM and do not need further bikeshedding.
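The PEP 314 rule quoted above is easy to make concrete. A version is optional and has no comparison operator, and the package's own version is implied when none is given. The following is an illustrative parser only. The name pattern and the helper name ``parse_provides_v11`` are assumptions, not part of any spec or library:

```python
import re
from typing import List, Tuple

# Illustrative pattern: a dotted name, optionally followed by "(version)".
# The version must start with a digit, so comparison operators are rejected.
_PROVIDES = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(?:\(\s*([0-9][^\s()]*)\s*\))?\s*$")


def parse_provides_v11(package_version: str,
                       entries: List[str]) -> List[Tuple[str, str]]:
    """Return (name, version) pairs for Metadata 1.1 Provides entries."""
    result = []
    for entry in entries:
        match = _PROVIDES.match(entry)
        if match is None:
            raise ValueError(f"invalid Provides entry: {entry!r}")
        name, version = match.groups()
        # PEP 314: the package's own version is implied if none is given.
        result.append((name, version or package_version))
    return result
```

For example, with a package version of "2.0", the entries from the PEP 314 example yield ``("xml", "2.0")`` and so on, down to ``("xmltools", "1.3")``. An entry such as ``xmltools (>=1.3)`` is rejected.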

Daniel Holth <dholth <at> gmail.com> writes:
They mean pretty much what the same words mean in RPM and do not need further bikeshedding.
But isn't it the case that the scenarios are different because in the case of RPMs, we have a presumed authority which can determine e.g. what obsoletes what, whereas with Python distributions, there's no central authority that has this function? Regards, Vinay Sajip

On Tue, Nov 20, 2012 at 06:43:32PM -0500, Daniel Holth wrote:
Agreed. And this is closer to the way that distributions' tools have to operate than they'd want to :-( Within the distribution we like to pretend that we only need to care about the packages that we generate. But we also know that whether or not we support it, ordinary users will install packages from outside of our walls. That means that the packaging tools that we create will need to deal with things that we might not condone within our "presumed authority". We trust that people are going to do more or less the right thing with the tools we offer. Once in a while they don't but by and large they do. -Toshio

On Tue, Nov 20, 2012 at 6:43 PM, Daniel Holth <dholth@gmail.com> wrote:
That's sort of beside the point. The *only* use case which Obsoletes provides over Obsoleted-By is that it allows third parties to unilaterally advertise their forked project as a substitute for the original, and maybe block users from switching back to the un-forked project -- regardless of the status of the original project or the consent of the original project's maintainer. This use case, however, benefits nobody besides the forkers. There are many other legitimate channels by which the forkers can advertise themselves as a replacement for their parent project, and no reason for the installing end user to be bothered with the subject, except in case of a conflict. For somebody obsoleting their own package, on the other hand, it's likely well worth the effort to at least update their PyPI metadata to reflect the change in status -- especially if this can be done through the web interface. It's likely they would wish to update their description as well, to notify human beings of the change. But here's the thing that kills "Obsoletes" dead in the first place as a practical tool: unless installers use a PyPI search before installing *every single project*, there is no way for them to realize that the obsoleting package exists! By contrast, if a package is "Obsoleted-By", then installing that package (or declaring a dependency on it) provides an opportunity to inform the user of the need to make a transition. This can't be done with an "Obsoletes" field. Conversely, if you have already installed a package that says it "Obsoletes" another package, this does *not* tell you that the obsolete package shouldn't still be installed! A replacement project doesn't necessarily share the same API, and may exist in a different package namespace altogether. In short, "Obsoletes" is virtually *useless* as a machine-consumed metadata field, because there is nothing you can actually do with it in a practical installer. 
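The asymmetry described here shows up directly in code: an ``Obsoleted-By`` check needs only the metadata of the distribution being installed. Below is a minimal sketch. It assumes a hypothetical ``Obsoleted-By`` field, which was proposed in this thread and is not defined by Metadata 1.3. It uses the standard library ``email`` parser that the PEP suggests, and the project names are placeholders:

```python
from email.parser import Parser
from typing import List


def obsoletion_warnings(metadata: str) -> List[str]:
    """Build user-facing warnings from a (hypothetical) Obsoleted-By field.

    Only the metadata of the distribution being installed is consulted;
    no index-wide search is involved.
    """
    msg = Parser().parsestr(metadata)
    name = msg["Name"]
    return [
        f"{name} is obsoleted by {replacement}; consider migrating"
        for replacement in msg.get_all("Obsoleted-By", [])
    ]
```

An equivalent check for ``Obsoletes`` cannot be written this way. The field lives on the *other* project, so the installer would first have to find every project that might name this one.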
I'm against adding more fields to the metadata which do not have a specification for how they should be used in practice; the presence of such fields has been a problem with most of the preceding metadata specs, IMO.
I re-named a package once just because I did not like the name. I used "Obsoletes" for that. It is documentation.
Note that "Obsoleted-By" would also serve that use case, and have the additional benefit of being able to notify people who install new copies of the replaced project. (By the way Daniel, I'm sorry I didn't comment on this PEP sooner; I'd forgotten about the previous PEP 345 rehashing in 2010, or rather, I just sort of assumed that the results of that discussion had been incorporated into the newer PEP, and didn't notice the reappearance of the noise fields until your call for approval just now. Sorry!)

On Wed, Nov 21, 2012 at 1:10 PM, PJ Eby <pje@telecommunity.com> wrote:
Then that's a bug in the metadata of the project misusing "Obsoletes", and should be reported as such. If the new package is not a drop-in replacement, then it has no business claiming to obsolete the other package. I think one of the big reasons this kind of use is rare in the Python community is that project name changes are almost always accompanied by *package* name changes, and as soon as you change the package name, you're changing the public API, and thus it is no longer appropriate to use Provides or Obsoletes, as the renamed project is no longer a drop-in replacement for the original. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Nov 21, 2012 at 1:20 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I realised that my comments above are more about the appropriate use of "Provides", rather than "Obsoletes". For a practically useful "Obsoletes", I think I'm inclined to agree with you, as "Obsoleted-By" provides a way for a maintainer to explicitly declare that a project is no longer receiving updates, and users should migrate to the replacement project if they want to continue to receive fixes and improvements. The current version of "Obsoletes" is, as Daniel describes, really only useful as documentation. Cheers, Nick.

On Tue, Nov 20, 2012 at 11:01 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
A few more changes to try to address some of the confusion about Requires-Dist, without re-designing the entire requirements system. PEP 426 was written only to add extras support to the format. The other changes, re-writing much of the PEP, have been an unfortunate side-effect. The file format's keys are case-insensitive. The version number should be in PEP 386 form; there are too many non-PEP-386 versions now and in the future to make it a must. Distribution (requirement) names are noted as being distinct from ``import x`` module names. Parenthetical explanation has balanced parens. "bundled" has been struck from the PEP.

    diff -r 55c706023fa2 -r 026aebf2265d pep-0426.txt
    --- a/pep-0426.txt	Sun Nov 18 19:55:10 2012 +0200
    +++ b/pep-0426.txt	Mon Dec 03 20:36:13 2012 -0500
    @@ -34,9 +34,9 @@
     
     The syntax defined in this PEP is for use with Python distribution
     metadata files. The file format is a simple UTF-8 encoded Key: value
    -format with no maximum line length, followed by a blank line and an
    -arbitrary payload. The keys are case-insensitive. It is parseable by
    -the ``email`` module with an appropriate ``email.policy.Policy()``.
    +format with case-insensitive keys and no maximum line length, followed by
    +a blank line and an arbitrary payload. It is parseable by the ``email``
    +module with an appropriate ``email.policy.Policy()``.
     
     When ``metadata`` is a Unicode string,
     ```email.parser.Parser().parsestr(metadata)`` is a serviceable parser.
    @@ -94,7 +94,7 @@
     :::::::
     
     A string containing the distribution's version number. This
    -field must be in the format specified in PEP 386.
    +field should be in the format specified in PEP 386.
     
     Example::
     
    @@ -283,12 +283,13 @@
     Each entry contains a string naming some other distutils
     project required by this distribution.
     
    -The format of a requirement string is identical to that of a
    -distutils project name (e.g., as found in the ``Name:`` field.
    -optionally followed by a version declaration within parentheses.
    +The format of a requirement string is identical to that of a distribution
    +name (e.g., as found in the ``Name:`` field) optionally followed by a
    +version declaration within parentheses.
     
    -The distutils project names should correspond to names as found
    -on the `Python Package Index`_.
    +The distribution names should correspond to names as found on the `Python
    +Package Index`_; often the same as, but distinct from, the module names
    +as accessed with ``import x``.
     
     Version declarations must follow the rules described in
     `Version Specifiers`_
    @@ -305,7 +306,8 @@
     Like Requires-Dist, but names dependencies needed while the
     distributions's distutils / packaging `setup.py` / `setup.cfg` is run.
     
    -Commonly used to generate a manifest from version control.
    +Commonly used to bring in extra compiler support or a package needed
    +to generate a manifest from version control.
     
     Examples::
     
    @@ -318,17 +320,19 @@
     Provides-Dist (multiple use)
     ::::::::::::::::::::::::::::
     
    -Each entry contains a string naming a Distutils project which
    -is contained within this distribution. This field *must* include
    -the project identified in the ``Name`` field, followed by the
    -version : Name (Version).
    +Each entry contains a string naming a requirement that is satisfied by
    +installing this distribution. This field *must* include the project
    +identified in the ``Name`` field, optionally followed by the version:
    +Name (Version).
     
     A distribution may provide additional names, e.g. to indicate that
    -multiple projects have been bundled together. For instance, source
    -distributions of the ``ZODB`` project have historically included
    -the ``transaction`` project, which is now available as a separate
    -distribution. Installing such a source distribution satisfies
    -requirements for both ``ZODB`` and ``transaction``.
    +multiple projects have been merged into and replaced by a single
    +distribution or to indicate that this project is a substitute for another.
    +For instance distribute (a fork of setuptools) could ``Provides-Dist``
    +setuptools to prevent the conflicting package from being downloaded and
    +installed when distribute is already installed. A distribution that has
    +been merged with another might ``Provides-Dist`` the obsolete name(s)
    +to satisfy any projects that require the obsolete distribution's name.
     
     A distribution may also provide a "virtual" project name, which does
     not correspond to any separately-distributed project: such a name
    @@ -359,10 +363,9 @@
     Version declarations can be supplied. Version numbers must be in the
     format specified in `Version Specifiers`_.
     
    -The most common use of this field will be in case a project name
    -changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0.
    -When you install Torqued Python, the Gorgon distribution should be
    -removed.
    +The most common use of this field will be in case a project name changes,
    +e.g. Gorgon 2.3 gets renamed to Torqued Python 1.0. When you install
    +Torqued Python, the Gorgon distribution should be removed.
     
     Examples::
     
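Two points from the changes above can be seen with the stdlib parser: keys are case-insensitive, and a requirement string is a distribution name optionally followed by a parenthesized version declaration. The sketch below assumes a simple regular expression for splitting requirements. The helper ``split_requirement`` is hypothetical, and it does not validate the version declaration itself:

```python
import re
from email.parser import Parser
from typing import Optional, Tuple

# Example metadata; note the lowercase "requires-dist" key on one line.
METADATA = """\
Metadata-Version: 1.3
Name: BeagleVote
Version: 1.0a2
requires-dist: pkginfo
Requires-Dist: PasteDeploy
Requires-Dist: zope.interface (>3.5.0)

This long description lives in the payload.
"""

# Hypothetical pattern: a distribution name, optionally followed by "(...)".
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\(([^)]*)\))?\s*$")


def split_requirement(req: str) -> Tuple[str, Optional[str]]:
    """Split 'name (spec)' into (name, spec); spec is None when absent."""
    match = _REQ.match(req)
    if match is None:
        raise ValueError(f"invalid requirement: {req!r}")
    name, spec = match.groups()
    return name, (spec.strip() if spec else None)


msg = Parser().parsestr(METADATA)
# Header lookups on email.message.Message ignore case, so all three are found.
requirements = [split_requirement(r) for r in msg.get_all("Requires-Dist", [])]
description = msg.get_payload()
```

The distribution name returned here (e.g. ``zope.interface``) is a PyPI name. As the diff notes, it is not necessarily what you ``import``.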

On Tue, Nov 20, 2012 at 5:01 PM, Daniel Holth <dholth@gmail.com> wrote:
The problem is that the above *makes no sense*. "Torqued Python" and "Gorgon" are veiled pseudonyms for Twisted and Medusa.... and Twisted is not actually a plug-and-play substitute for Medusa, AFAIK. Can anybody suggest an *actual* use case for "Obsoletes", and explain how it is supposed to work in software? The last time this discussion came up, nobody had any use cases that stood up to the "how's that actually going to work and/or help?" test. Here's a post of mine summarizing this and related points in the previous thread: http://mail.python.org/pipermail/catalog-sig/2010-October/003364.html

On Tue, Nov 20, 2012 at 9:44 PM, PJ Eby <pje@telecommunity.com> wrote:
Again I didn't write any of this. Someone mentioned ZODB + transaction. The PEP should have used the word "merged" instead of "bundled". When two packages become one, and the redundant package is no longer being developed, Provides-Dist can be used. I re-named a package once just because I did not like the name. I used "Obsoletes" for that. It is documentation.

On Wed, Nov 21, 2012 at 12:44 PM, PJ Eby <pje@telecommunity.com> wrote:
Sure. This is an RPM example, but exactly the same thing applies at the Python level. One of the dependencies of PulpDist (a directory mirroring tool I wrote) is the Pulp project (originally just an RPM mirroring tool, but now with plugin-based support for mirroring other things). The upstream version of Pulp that I currently use is missing Kerberos login support, so I have patched that in via RPM's patching features. To avoid messing up others sharing the internal yum repo where this is published, I actually use the Provides/Conflicts/Obsoletes features of RPM to make sure my patched and renamed copy and the upstream version don't interfere with each other (and certainly can't be installed on the same system, as they would trample all over each other by attempting to install the same files). Mostly though, these labelling tools are especially useful for internal forks and mergers: the ones you *don't* share with the wider internet, except perhaps in the form of upstream patches (for example: https://bugzilla.redhat.com/show_bug.cgi?id=831937). On a public index, drop-in replacements are *always* going to be controversial from a social point of view, which is why there are only two current examples on PyPI I am aware of (i.e. distribute vs setuptools and Pillow vs PIL). The first was essentially a hostile fork, while the second started as an attempt to provide decent packaging support when the current maintainer didn't show any interest in doing so. In such cases, it is absolutely essential that the *forking* project is able to declare that it is a replacement for the original project. It is then up to the community to decide whether or not the claims of being a suitable replacement are valid, which will be shown most clearly in relative uptake numbers between the original project and the forked one. I do consider it unfortunate that Python has only copied 3 of the 4 RPM dependency management fields (i.e. only Provides, Requires, and Obsoletes, without copying the more value-neutral Conflicts) and I also prefer the "capability" terminology in the Fedora RPM guide that makes it clear that these are really arbitrary strings from a tooling point of view that only match the package name by convention. Cheers, Nick.

Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say:
(1) Then how *should* the "bundle-of-several-components" case be represented? (2) How is 'Provides-Dist' different from 'Obsoletes-Dist'? The only difference I can see is that it may be a bit more polite to people who do want to install multiple versions of a (possibly abstract) package. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

On Tue, Nov 20, 2012 at 3:58 PM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
The useful way to bundle a bunch of things would be to just include them all in an executable folder or zipfile with __main__.py. PEP 426 and the package database would not get involved. The bundle would be distributed as an application you can download and use, not as an sdist on PyPI. The intent of Provides and Obsoletes is different. Obsoletes would not satisfy a requirement during dependency resolution. The RPM guide explains a similar system:

This brings the total to four types of dependencies that the RPM system tracks:

- Requires, which tracks the capabilities a package requires
- Provides, which tracks the capabilities a package provides for other packages
- Conflicts, which describes the capabilities that if installed, conflict with capabilities in a package
- Obsoletes, which describes the capabilities that this package will make obsolete

Packages advertise this dependency information. Each dependency holds the type, such as requires, a capability, such as a shared library or a package name, and optionally a version number, such as requiring the python package at a version number greater than or equal to 2.2 (python >= 2.2).

http://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/RPM_...

On 11/20/12, Daniel Holth <dholth@gmail.com> wrote:
On Tue, Nov 20, 2012 at 3:58 PM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say:
(1) Then how *should* the "bundle-of-several-components" case be represented?
When I look at, for example, twisted, there are some fairly fine distinctions. I can imagine some people wanting to handle each little piece differently, since that is the level at which they would be replaced by a more efficient implementation. That doesn't mean that someone using the default should have to manage 47 separate little packages individually. Also note that ZODB is mentioned as a bundling example in the current (2012-11-14) PEP. What does the PEP recommend that they do? Stop including transaction? Keep including it but stop 'Provides-Dist'-ing it? The current PEP also specifies that "This field must include the project identified in the Name field, followed by the version : Name (Version)." but the examples do not always include version. Why is the MUST there? Is there some way to distinguish between concrete and abstract provisions? For example, if MyMail (2012.11.10) includes 'Provides-Dist: email', does that really get parsed as 'Provides-Dist: email (2012.11.10)'?
The intent of Provides and Obsoletes is different. Obsoletes would not satisfy a requirement during dependency resolution.
The RPM guide explains a similar system:
As best I can understand, Obsoletes means "Go ahead and uninstall that other package." Saying that *without* providing the same functionality seems like a sneaky spelling of "Please break whatever relies on that other package." I'm willing to believe that there is a more useful meaning. I'm also willing to believe that they are logically redundant but express different intentions. The current wording doesn't tell me which is true. (Admittedly, that is arguably an upstream bug with other package systems, but you should still either fix it or explicitly delegate the definitions.) And as long as I'm asking for clarification, can foopkg-3.4 obsolete foopkg-3.2? If not, is it a semantics problem, or just not idiomatic? If so, does it have a precise meaning, such as "no longer interoperates with"? And now that I've looked more carefully ... Can a "Key: Value" pair be continued onto another line? The syntax description under "Metadata Files" does not say so, but later text suggests that either leading whitespace or a leading tab specifically (from the example code) will work. (And is description a special case?) Is the payload assumed to be utf8 text? Can it be itself a mime message? Are there any restrictions on 'Name'? e.g., can the name include spaces? line breaks? Must it be a valid python identifier? A valid python qualname? 'Version' says that it must be in the format specified in PEP 386. Unfortunately, it doesn't say which part of 386. Do you mean that it must be acceptable to verlib.NormalizedVersion without first having to call suggest_normalized_version? 'Summary' specifies that it must be one line. Is there a character limit, or do you just mean "no line breaks"? Do you want to add a "Should be less than 80 characters" or some such, based on typical tool presentation? Would it be worth repeating the advice that longer descriptions should go in the payload, after all headers?
(Otherwise, they have to find 'Description' *and* notice that it is deprecated and figure out what to do instead.) Under 'Description', it isn't entirely clear what terminates the field. "Multiple paragraphs" suggests that there can be multiple lines, but I'm guessing that -- in practice -- they have to be a single logical line, with all but the first starting with whitespace. Under 'Classifier', is PEP 301 really the current authority for classifiers? I would prefer at least a reference to http://pypi.python.org/pypi?%3Aaction=list_classifiers demonstrating which classifiers are currently meaningful. Under 'Requires-Dist', there is an unclosed parenthesis. Does the 'Setup-Requires-Dist' set implicitly include the 'Requires-Dist' set, or should a package be listed both ways if it is required at both setup and runtime? The Summary of Differences from PEP 345 mentions changes to Requires-Dist, but I don't know what they were -- even the unclosed parentheses seemed the same. The appendix gives code for generating and parsing continuation lines that suggests the continuation whitespace is exactly one tab -- is other whitespace OK too? -jJ

On Tue, Nov 20, 2012 at 7:18 PM, Jim Jewett <jimjjewett@gmail.com> wrote:
ZODB is a bad example. The word "bundling" will be struck from the PEP entirely. Two sdists should not be combined into one sdist when both packages are still being developed. If A and B are merged into a single PyPI package C, and A and B will no longer be developed, then C may Provides-Dist A and B. http://www.python.org/dev/peps/pep-0426/#requires-dist-multiple-use No MUST on the Requires-Dist version. If no version is there, it should satisfy any version requirement.
Is there some way to distinguish between concrete and abstract provisions?
No.
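Daniel's answers here can be sketched as code. The ``Name`` is implicitly provided at ``Version``. An extra ``Provides-Dist`` entry may omit its version, and an unversioned entry satisfies any version requirement. Versions are compared as plain tuples of integers. That is a deliberate simplification for illustration, not PEP 386 ordering. The function names and example versions are hypothetical:

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for "Name" or "Name (Version)".
_ENTRY = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\(\s*([^)]*?)\s*\))?\s*$")


def _as_tuple(version: str) -> Tuple[int, ...]:
    # Simplification: only purely numeric dotted versions such as "0.6.28".
    return tuple(int(part) for part in version.split("."))


def provided_names(name: str, version: str,
                   provides_dist: List[str]) -> Dict[str, Optional[str]]:
    """Map each provided name (lowercased) to its version, or None if unversioned."""
    provided: Dict[str, Optional[str]] = {name.lower(): version}
    for entry in provides_dist:
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"invalid Provides-Dist entry: {entry!r}")
        pname, pversion = match.groups()
        provided[pname.lower()] = pversion or None
    return provided


def satisfies(provided: Dict[str, Optional[str]], req_name: str,
              min_version: Optional[str] = None) -> bool:
    """Does a distribution providing `provided` meet `req_name (>= min_version)`?"""
    key = req_name.lower()
    if key not in provided:
        return False
    pversion = provided[key]
    if pversion is None or min_version is None:
        # An unversioned Provides-Dist entry satisfies any version requirement.
        return True
    return _as_tuple(pversion) >= _as_tuple(min_version)
```

Under this rule, Vinay's earlier question gets a permissive answer. If distribute lists plain ``setuptools``, a ``setuptools (>= X.Y)`` requirement counts as met whatever the fork actually supports.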
When I used Obsoletes, it meant "I am no longer developing this other package that is identical to this re-named package". The system of requirements/conflicts (as in the RPM system) does not appear to be entirely orthogonal.
And now that I've looked more carefully ...
Description (now in the payload, please) is the only field that is commonly multi-line. Any field could continue onto the next line as far as the parser is concerned, though it probably would not make sense.
Is the payload assumed to be utf8 text? Can it be itself a mime message?
The entire file needs to be UTF-8. The payload is assumed to be UTF-8 text in this version. Wouldn't a MIME message also be UTF-8 text? (We wouldn't know what to do with it.)
setuptools constrains it to alphanumeric characters. Metadata 1.3 doesn't say.
'Version' says that it must be in the format specified in PEP 386.
It means it is expected to match:
http://www.python.org/dev/peps/pep-0386/#the-new-versioning-algorithm

    expr = r"""^
    (?P<version>\d+\.\d+)         # minimum 'N.N'
    (?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
    (?:
        (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
                                     # 'c' or 'rc' = release candidate
        (?P<prerelversion>\d+(?:\.\d+)*)
    )?
    (?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
    $"""

Please do not enforce this regexp in your Metadata parser.
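In keeping with that request, a parser could use the pattern for advice only: it warns about a non-conforming version but still accepts it. Below is a minimal sketch using Python's ``re.VERBOSE`` flag. The helper name ``check_version`` is hypothetical:

```python
import re
import warnings

# The PEP 386 pattern quoted above, compiled in verbose mode so the
# whitespace and comments are ignored.
PEP386_RE = re.compile(r"""^
    (?P<version>\d+\.\d+)                 # minimum 'N.N'
    (?P<extraversion>(?:\.\d+)*)          # any number of extra '.N' segments
    (?:
        (?P<prerel>[abc]|rc)              # 'a' = alpha, 'b' = beta
                                          # 'c' or 'rc' = release candidate
        (?P<prerelversion>\d+(?:\.\d+)*)
    )?
    (?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
    $""", re.VERBOSE)


def check_version(version: str) -> bool:
    """Return True for PEP 386-style versions; otherwise warn and return False.

    The metadata itself is never rejected: callers keep the version as-is.
    """
    if PEP386_RE.match(version):
        return True
    warnings.warn(f"Version {version!r} is not in PEP 386 form; accepting it anyway")
    return False
```

``"1.0a2"``, ``"1.0.post1.dev2"`` and ``"2012.11.10"`` all match. A single-component version such as ``"1"`` only triggers a warning.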
Non-blank lines that don't start with whitespace are keys. email.parser.Parser() takes care of this in an e-mail-inspired (but not any literal RFC) way. The distutils documentation has guidelines on the short description / summary.
The addition of the allowed "extra" variable in the environment-marker condition (the part after ";") is the most significant change.
Any whitespace would work.
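How the standard library treats continuation lines can be checked directly. With the default policy of ``email.parser.Parser``, a line starting with a space or a tab continues the previous header, and the stored value keeps the embedded newline. Other whitespace characters are not treated as continuation markers. The ``unfold`` helper below is a hypothetical convenience: it collapses each line break, together with the whitespace that starts the next line, into a single space:

```python
import re
from email.parser import Parser

METADATA = (
    "Metadata-Version: 1.3\n"
    "Name: BeagleVote\n"
    "Summary: A module for\n"
    "\tcollecting votes from beagles.\n"   # tab continuation
    "Keywords: dog puppy\n"
    "  voting election\n"                  # space continuation
    "\n"
    "Payload text.\n"
)


def unfold(value: str) -> str:
    # Replace each line break plus the leading whitespace of the next line.
    return re.sub(r"\r?\n[ \t]+", " ", value)


msg = Parser().parsestr(METADATA)
raw_summary = msg["Summary"]  # still contains the "\n\t" sequence
```

Whether a tool should unfold like this or keep the raw value (as Description-style text might need) is a design choice the sketch does not settle.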

Daniel Holth writes:
When I used Obsoletes, it meant "I am no longer developing this other package that is identical to this re-named package".
But as a user I couldn't care less! The authors may care, but I don't care if Torqued "obsoletes" Gorgon, because in using Torqued I'm DTRT'ing even though I don't know it. What I care about is when I'm using Gorgon, and there's something "better" (or worse, "correct") to use in my application. It might be a good idea to have a just-like-Amazon While-This-Package-Is-Great-You-Might-Also-Consider: field. Tongue-in-cheeki-ly y'rs,

On Wed, Nov 21, 2012 at 12:00 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Hence my suggestion for an Obsoleted-By field, in which Gorgon would be able to suggest alternatives.
Yeah, that's basically what Obsoleted-By is for.

PJ Eby writes:
On Wed, Nov 21, 2012 at 12:00 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
My bad, my precise intention was to follow up on your idea (which, credit where credit is due, I had *not* hit upon independently). I should have made that clear. (I really shouldn't be answering English email at a Japanese-speaking conference; my brain thinks it knows what it's doing, but without my noticing, Japanese culture is seeping in....)
Well, Obsoleted-By is pretty strong language for suggesting possible alternatives. But I suspect that few projects would really want to be suggesting competitors' products *or* their own oldie-but-still-goodie that they'd really like to obsolete ASAP (put an Obsoleted-By line in every Python 2 distribution, anyone? :-)

How to use Obsoletes:

1. The author of B decides A is obsolete.
2. A releases an empty version of itself that Requires: B
3. B Obsoletes: A
4. The package manager says "These packages are obsolete: A. Would you like to remove them?"
5. User says "OK".

On Wed, Nov 21, 2012 at 2:54 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:

On Mon, Dec 3, 2012 at 2:43 PM, Daniel Holth <dholth@gmail.com> wrote:
Um, no. Even if the authors of A and B are the same person, you can't remove A if there are other things on the user's system using it. The above scenario does not work *at all*, ever, except in the case where B is simply an updated version of A (i.e. identical API) -- in which case, why bother? To change the project name? (Then it should be "Formerly-named" or something like that, not "Obsoletes".) Please, *please* see the previous Catalog-SIG discussion I linked: this is only one of multiple metadata fields that were thoroughly debunked in that discussion as completely useless for automated dependency management.

On Wednesday, December 5, 2012 at 2:13 AM, PJ Eby wrote:
You can automatically uninstall A from B in an automatic dependency management system. I *think* RPM does this, at the very least I believe it refuses to install B if A is already there (and the reverse as well).* There's nothing preventing an installer, during its attempt to install B, from seeing that it Obsoletes A, looking at what depends on A, warning the user what is going to happen, and prompting them. I think Obsoletes as is an alright bit of information. I think the biggest flaw with Obsoletes isn't in Obsoletes itself, but is in the lack of a Conflicts tag that has the same functionality (at minimum refusing to install both, possibly uninstalling the previous one with a prompt to the user). Obsoletes has the semantics of a logical successor (typically renames) while Conflicts should have the semantics of a competitor: distribute would conflict with setuptools, and foo2 would obsolete foo. * I could be wrong about RPM's treatment of Obsoletes
I don't see this in this thread, could you link it again?
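The installer behaviour suggested above might look like the sketch below. On installing B, it notices that B Obsoletes A and finds what depends on A, so it can warn before removing anything. The data structures are assumptions: ``installed_requires`` maps each installed distribution to the set of names it requires, and names are matched exactly:

```python
from typing import Dict, List, Set


def obsoletion_report(obsoletes: List[str],
                      installed_requires: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each installed distribution the candidate Obsoletes, list its dependents.

    An empty dependent list means removal looks safe; a non-empty one is
    what the installer should show the user before prompting.
    """
    report: Dict[str, List[str]] = {}
    for old in obsoletes:
        if old not in installed_requires:
            continue  # not installed, so there is nothing to remove
        report[old] = sorted(
            dist for dist, reqs in installed_requires.items()
            if old in reqs and dist != old
        )
    return report
```

A non-empty list is the case the following reply objects to: the dependents were usually written by someone else, so the user may be unable to act on the warning.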

On Wed, Dec 05, 2012 at 02:46:11AM -0500, Donald Stufft wrote:
This is correct.
I believe it refuses to install B if A is already there (and the reverse as well).*
I'd have to test this but I believe you are correct about the first. Not sure about the reverse.
In rpm-land, if something depended on A and nothing besides the actual A package provided A, rpm will refuse to install B. But rpm is meant to be used unattended so different package managers could certainly choose to prompt. For package renames, package B would have both an Obsoletes: A <= $OLD_VERSION and a Provides: A = $NEW_VERSION -Toshio

On Wed, Dec 5, 2012 at 2:46 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
Unless the user wrote those things that depend on A, they aren't going to be in a position to do anything about it. (Contrast with a distro, where dependencies are indirect - the other package will depend on an abstraction provided by both A and B, rather than directly depending on A *or* B.) (Also note that all the user knows at this point is that the author of B *claims* to obsolete A, not that the authority managing the repository as a whole has decreed B to obsolete A.)
You can automatically uninstall A from B in an automatic dependency management system
My point is that this can only work if the "obsoleting" is effectively just a rename, in which case the field should be "renames", or better still, "renamed-to" on the originating package. As I've mentioned repeatedly, Obsoleted-By handles more use cases than Obsoletes, and has at least one practical automated use case (notifying a developer that their project is depending on something that's obsolete). Also, the example given as a use case in the PEP (Gorgon to Torqued) is not just wrong, it's *actively misleading*. Gorgon and Torqued are transparent renames of Medusa and Twisted, which do not share a common API and thus cannot be used as the subject of any automated processing (in the case of Obsoletes) without doing some kind of PyPI metadata search for every package installed every time a package is installed.
I think Obsoletes as is an alright bit of information.
1. It cannot be used to prevent the installation of an obsolete package without a PyPI metadata search, since you must examine every *other* package on PyPI to find out whether some package obsoletes the one you're trying to install.
2. Unlike RPM, where metadata is provided by a trusted third party, Obsoletes can be specified by any random forker (no pun intended), which makes this information a mere advertisement... and an advertisement to the wrong audience at that, because they must have *already* found B in order to discover that it replaces A!
3. Nobody has yet supplied a use case where Obsoletes would not be strictly improved upon by Obsoleted-By. (Note that "the author of package X no longer maintains it" does not equal "package Y is entitled to name itself the successor and enforce this upon all users" -- this can work in RPM only because it is a third party Z who declares Y the successor to X, and there is no such party Z in the Python world.)
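To make point 1 concrete, here is a small sketch contrasting the two lookups, assuming a hypothetical in-memory index standing in for PyPI's metadata. Because Obsoletes is declared on the successor, finding out whether A is obsolete means scanning every other project; Obsoleted-By is declared on A itself, so a single lookup suffices. `INDEX` and both function names are illustrative only.

```python
from typing import Dict, List

Metadata = Dict[str, List[str]]  # multi-use field name -> values

# Hypothetical index: project name -> parsed metadata.
INDEX: Dict[str, Metadata] = {
    "A": {"Obsoleted-By": ["B"]},
    "B": {"Obsoletes": ["A"]},
    "C": {},
}


def successors_via_obsoletes(index: Dict[str, Metadata], name: str) -> List[str]:
    """Obsoletes lives on the *successor*, so every other project must be scanned."""
    return [
        other for other, meta in index.items()
        if other != name and name in meta.get("Obsoletes", [])
    ]


def successors_via_obsoleted_by(index: Dict[str, Metadata], name: str) -> List[str]:
    """Obsoleted-By lives on the project itself, so one lookup suffices."""
    return list(index[name].get("Obsoleted-By", []))
```

Both functions return `["B"]` for `"A"`, but the first touches every entry in the index (all of PyPI, in the real case) while the second reads only A's own metadata.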
I don't see this in this thread, could you link it again?
http://mail.python.org/pipermail/catalog-sig/2010-October/003368.html
http://mail.python.org/pipermail/catalog-sig/2010-October/003364.html

These posts also address why a "Conflicts" field is *also* unlikely to be particularly useful in practice, in part for reasons that relate to differences between RPM-land and Python-land. (For example, RPMs can conflict over things besides files, due to runtime and configuration issues that are out-of-scope for a Python installer tool.) While it's certainly desirable to not invent wheels, it's important to understand that the Python community does not work the same way as a Linux distribution. We are not a single organization shipping a fully-functional and configured machine, we are hundreds of individual authors shipping our own stuff. Conflict resolution and package replacement (and even deciding what it is that things "provide" or "require") are primarily *human* processes, not technical ones. Relationship and support "contracts", IOW, rather than software contracts. That's why, in the distro world, a package manager can use simple fields to carry out the will of the human organization that made those support and compatibility decisions. For Python, the situation is a bit more complicated, which is why clear thinking is needed. Simply copying fields blindly from other packaging systems just isn't going to cut it. Now, if the will of the community is to turn PyPI into a distro-style repository, that's fine... but even if you completely ignore the human issues, there are still technical ones. Generally, distro-style repositories work by downloading the full metadata set (or at least an index) to a user's machine. And that's the sort of architecture you'd need in order for these types of fields to be technically feasible (e.g., doing an index search for Obsoletes), without grinding the PyPI servers into dust.

On Wed, Dec 5, 2012 at 4:10 PM, PJ Eby <pje@telecommunity.com> wrote:
My desire is to invent the useful "wheel" binary package format in a reasonable and limited amount of time by making changes to Metadata 1.2 and implementing the new metadata format and wheel in distribute and pip. Help me out by allowing useless but un-changed fields to remain in this version of the PEP. I am done with the PEP and submit that it is not worse than its predecessor. I can participate in a discussion about any of the following:

Summary of Differences From PEP 345 <http://www.python.org/dev/peps/pep-0345>

- Metadata-Version is now 1.3.
- Values are now expected to be UTF-8.
- A payload (containing the description) may appear after the headers.
- Added extra to environment markers.
- Most fields are now optional.
- Changed fields:
  - Description
  - Project-URL
  - Requires-Dist
- Added fields:
  - Extension
  - Provides-Extra
  - Setup-Requires-Dist
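Since the draft says a Metadata 1.3 file can be read with the standard `email` parser, here is a minimal sketch of what that looks like: keys are case-insensitive, multiple-use fields come back through `get_all`, and the description lives in the payload after the blank line. The sample field values are illustrative, borrowed from the PEP's own examples.

```python
from email.parser import Parser

# Illustrative Metadata 1.3 document: headers, a blank line, then the description payload.
METADATA = """\
Metadata-Version: 1.3
Name: BeagleVote
Version: 1.0a2
Summary: A module for collecting votes from beagles.
Provides-Extra: test
Setup-Requires-Dist: setuptools
Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'

BeagleVote
==========
Long description goes here.
"""

msg = Parser().parsestr(METADATA)

name = msg["name"]                               # header lookup is case-insensitive
requires = msg.get_all("Requires-Dist") or []    # multiple-use field -> list of values
description = msg.get_payload()                  # everything after the blank line
```

`get_all` returns `None` when a field is absent, hence the `or []` fallback for optional multiple-use fields.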

On Wed, Dec 5, 2012 at 5:30 PM, Daniel Holth <dholth@gmail.com> wrote:
You could just mark those fields as deprecated and that they should not be used to delete packages or block packages from installation. Justification: nobody has managed to make them work in an automated tool yet, and their use in same is controversial, so they are downgraded to human-informational only. Please, let's not have yet *another* metadata spec that advertises these attractive nuisance[1] fields. I do not want us to be having this same conversation AGAIN the next time any metadata changes are being considered. We've had it too many times already. PEPs are supposed to summarize these discussions for that very reason. --- [1] For non-native speakers, an attractive nuisance is a dangerous thing that entices unsuspecting persons to play with it; http://en.wikipedia.org/wiki/Attractive_nuisance_doctrine has more details.

On Thu, Dec 6, 2012 at 8:30 AM, Daniel Holth <dholth@gmail.com> wrote:
Agreed. PJE's arguments sound reasonable (especially since Obsoletes doesn't get used much in RPM-land either - Provides & Conflicts are both far more common), but they're orthogonal to the current aims of the metadata 1.3 update. If another author wanted to create a subsequent 1.4 update that was focused on replacing Obsoletes with Obsoleted-By, that would be fine (alternatively, a patch to the current PEP draft may be acceptable, but accepting such a change would be up to Daniel as the PEP author).
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wednesday, December 5, 2012 at 4:10 PM, PJ Eby wrote:
Arguing over Obsoletes vs Renames is a massive bikeshedding argument.
So it's a bad example. Hardly an argument against it.
Will require support from PyPI but this ultimately isn't a big deal.
If you're installing B you've already extended trust to that author. If you don't trust the author, then why are you installing (and then executing) code they wrote?
Very convenient to declare that one of the major use cases for Obsoletes over Obsoleted-By is not valid because of your own personal opinions. Like I said above, if you're installing a package that someone has uploaded you've implicitly granted them trust. There are far worse things that a bad Python citizen can do during and after an install than what is allowed by Obsoletes.
I don't think Conflicts is something that every single package is going to require. As you said the tools themselves are going to handle the obvious cases for the bulk of situations. Unless you think there are no cases where two packages can conflict in more than what files are going to be installed then there are cases where it would be helpful and merely having the ability to use it when it is the best tool for the job isn't going to cause any great issue.
End systems oftentimes do not have a singular organization controlling every package in their system. The best example is Ubuntu and their PPAs.
This is insane. A fairly simple database query is going to "grind the PyPI servers into dust"? You're going to need to back up this FUD or please refrain from spouting it.

On Dec 05, 2012, at 06:07 PM, Donald Stufft wrote:
What if you installed Z, but B got installed because it was a dependency three levels down?
Well, basically never installing anything from PyPI except into a virtualenv is probably a good recommendation (maybe even now).
End systems oftentimes do not have a singular organization controlling every package in their system. The best example is Ubuntu and their PPAs.
Well, PPAs are awesome, but have known and well-publicized trust issues. I wouldn't enable a PPA into my running system without really knowing who the owner is and why I'm using their PPA. Or doing a lot of testing in a chroot first, and probably pinning the package set to just the one(s) from the PPA I care about. Cheers, -Barry

On Wednesday, December 5, 2012 at 6:18 PM, Barry Warsaw wrote:
Sure, you granted trust to Z, Z granted trust to Y, and Y granted trust to B. Like in SSL certificates there was a chain of trust. If you don't trust Z then don't install their package.
A virtualenv only protects you from well behaved packages. There is no way to prevent a package author from doing very nasty things to you if they wish. Providing more power in the metadata doesn't make this situation better or worse, it just makes more standard paths in the cases where you do need to do it.
Basically the same thing can be said about packages on PyPI. All the same trust issues exist there. Simply installing a Python package is already granting far more trust than Obsoletes requires, since installing a package means executing someone else's Python code on your system. Even if you remove setup.py you're still going to be executing their code on your system. If you do not trust the author of the packages you are installing, you do not install their packages.

I understand the PEP author's frustration with continued discussion, but I think this subthread on Obsoletes vs. Obsoleted-By is not mere bikeshedding on names. It matters *which package* presents the information. Donald Stufft writes:
The author may be a genius when it comes to writing code, and an idiot when it comes to distributing it. Distribution is much harder than it looks, as you know. Trusting the author's *content* and trusting the author's *metadata* are not equivalent! As far as I can see, the semantics of putting "Obsoletes: A" into B without changing A are the same as the semantics of putting "Provides: A" into B (without changing A).[1] Only if A includes "Obsoleted-By: B" can a user be confident that B is a true successor to A. Furthermore, as has been pointed out, the presence of "Obsoleted-By" in A has the huge advantage of informing users and developers of dependent packages alike that A is obsolete when they try to update A. If A is not changed, then an attempted update will tell them exactly that, and they may never find out about B. But if A is modified in this trivial way, the package system can automatically inform them. This is also trivial, requiring no database queries. "Simple is better than complex." Footnotes: [1] A trustworthy author of B wouldn't use "Provides" unless he thought B was indeed a drop-in, and presumably superior, replacement for A. And that's all that "Obsoletes" can tell you!

Makes sense. How about calling it Replacement? 0 or 1?

Replacement (optional)
::::::::::::::::::::::

Indicates that this project is no longer being developed. The named project provides a drop-in replacement.

A version declaration may be supplied and must follow the rules described in `Version Specifiers`_.

The most common use of this field will be in case a project name changes.

Examples::

    Name: BadName
    Replacement: AcceptableName

    Replacement: AcceptableName (>=4.0.0)
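A sketch of how a tool might read the proposed Replacement value, assuming the value is a project name optionally followed by a parenthesised version specifier, as in the examples. The regular expression, `parse_replacement`, and the notice wording are hypothetical; the draft does not prescribe any parsing rules beyond pointing at `Version Specifiers`.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar: NAME, optionally followed by "(SPECIFIER)".
_REPLACEMENT = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\((?P<spec>[^)]*)\))?\s*$"
)


def parse_replacement(value: str) -> Tuple[str, Optional[str]]:
    """Split a Replacement value into (project name, version specifier or None)."""
    m = _REPLACEMENT.match(value)
    if m is None:
        raise ValueError(f"malformed Replacement value: {value!r}")
    spec = m.group("spec")
    return m.group("name"), (spec.strip() if spec else None)


def obsolescence_notice(project: str, replacement: str) -> str:
    """Human-readable advisory; the tool only informs, it does not install anything."""
    name, spec = parse_replacement(replacement)
    target = f"{name} {spec}" if spec else name
    return f"{project} is no longer developed; consider {target}"
```

For the second example above, `obsolescence_notice("BadName", "AcceptableName (>=4.0.0)")` produces `"BadName is no longer developed; consider AcceptableName >=4.0.0"`.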

On 12/5/2012 10:12 PM, Daniel Holth wrote:
I like it. 'Replacement' is broader in meaning, more neutral, and less awkward than 'Obsoleted-by'. And I agree that A users have much more need to know about B than vice versa. It is much the same situation with Py 2 and Py 3 (although the latter is *not* a drop-in replacement). -- Terry Jan Reedy

On Thu, Dec 6, 2012 at 1:12 PM, Daniel Holth <dholth@gmail.com> wrote:
Makes sense. How about calling it Replacement. 0 or 1?
Hah, you'd think I'd have learned by now to finish reading a thread before replying. It will be nice to get this addressed along with the other changes :) (FWIW, Conflicts and Obsoletes are messy in RPM as well, and especially troublesome as soon as you start enabling multiple upstream repos from different providers. The metadata problem is handled by prebuilding indices when the repo changes, but that's still more work for the server, and more work for clients)
Replacement (optional) ::::::::::::::::::::::
I like verb forms like Obsoleted-By or Replaced-By, as the noun form is ambiguous about the direction of the change. Since the field being replaced is Obsoletes, Obsoleted-By makes sense.
Indicates that this project is no longer being developed. The named project provides a drop-in replacement.
Typically, the new version *won't* be a drop-in replacement (e.g. you'll likely at least have to import from a different top level package). Instead, the field would more often be used as an explicit indicator that the project is no longer receiving updates, as the *development team* has moved on, so users may want to consider either migrating, taking over development (if the former developers are amenable) or forking. If the replacing project *is* a drop-in replacement for the old project, then it should also advertise a Provides-Dist for the original project. Automated tools can then easily detect the two cases:

- A Obsoleted-By-Dist B and B Provides-Dist A = A is defunct, and B should be a drop-in replacement for A
- A Obsoleted-By-Dist B (without a Provides-Dist on B) = A is defunct, B is a replacement for A, but some porting will be needed

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
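Nick's two cases can be told apart mechanically. A minimal sketch, assuming the tool already knows that A's Obsoleted-By-Dist names B and has B's Provides-Dist entries as strings such as `"A"` or `"A (1.0)"`; version constraints are ignored here, and `classify` and its return labels are hypothetical.

```python
from typing import Iterable


def classify(old: str, new: str, new_provides: Iterable[str]) -> str:
    """Classify defunct project `old`, whose Obsoleted-By-Dist names `new`.

    `new_provides` holds the successor's Provides-Dist entries; only the
    project name before any "(version)" part is compared in this sketch.
    """
    provided = {entry.split("(")[0].strip() for entry in new_provides}
    if old in provided:
        return "drop-in"         # A is defunct; B should be a drop-in replacement for A
    return "porting-needed"      # A is defunct; B replaces A, but some porting is needed
```

Keeping the two signals separate means Obsoleted-By-Dist alone never implies compatibility; only the successor's own Provides-Dist does.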

On Thu, Dec 6, 2012 at 2:54 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Although Replaced-By would be fine as well - it's certainly much easier to say than the mouthful that is Obsoleted-By. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Dec 5, 2012 at 6:07 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
Arguing over Obsoletes vs Renames is a massive bikeshedding argument.
And is entirely beside the point. The substantive question is whether it's Obsoletes or Obsoleted-By - i.e., which side is it declared on.
So it's a bad example. Hardly an argument against it.
Nobody has actually proposed a better one, outside of package renaming -- and that example featured an author who could just as easily have used an obsoleted-by field.
Will require support from PyPI but this ultimately isn't a big deal.
...and every PyPI clone. And of course the performance issues.
Trusting their code is one thing; trusting whether they understood a PEP (and its interactions with various installation tools) well enough to not accidentally delete *somebody else's code* out of my system is another thing altogether. OTOH, trusting an author to tell me (in an automated fashion), "hey, you should switch to this other thing as soon as you can" is a FAR smaller amount of required trust. Arguing that because I have to trust one thing, means I must trust another, is a "Fallacy of Gray" argument.
I didn't say it was invalid, I said: """Note that "the author of package X no longer maintains it" does not equal "package Y is entitled to name itself the successor and enforce this upon all users""" These things are not equal. AFAIK, well-managed Linux distros do not allow random forkers to declare themselves the official successor to a defunct package, so any analogy between this use case in the Python world and the distro world is strained at *best*.
The rationale for that is laid out in the posts I linked.
then there are cases where it would be helpful
Please, present a *real-life instance* where it would have been helpful to you.
and merely having the ability to use it when it is the best tool for the job isn't going to cause any great issue.
One of the posts I linked presents an instance where it would have actually *harmed* things to specify it, and it's quite easy to see how the same problem would arise if used for non-file-related conflicts... And the problem present is *directly* tied to the lack of a third-party Z who decides whether X and Y, as configured for release Q of distro P, "conflict". This is not a problem that is solvable even in *principle* for an automated tool in the absence of party Z, which means that any such field's actual function is limited to a heads-up to a human user.
I take it you're not familiar with PyPI's history of performance and scaling problems over the last several years, then. The statically cached "/simple" index was developed precisely to stop *today's* class of installation tools from killing the servers... and then mirroring PyPI was still required to scale. Any proposal that calls for encouraging tools to query a metadata field *every time* a package is installed (or even just downloaded) almost certainly needs to be vetted with the PyPI admin team.

On Wed, Dec 05, 2012 at 07:34:41PM -0500, PJ Eby wrote:
How about pexpect and pexpect-u as a better example?
Note that although well-managed Linux distros attempt to control random forking internally, the distro package managers don't prevent people from installing from third parties. So Ubuntu PPAs, upstreams that provide their own rpms/debs, and major third party repos (for instance, rpmfusion as an add-on repo to Fedora) all have and sometimes (mis)use the ability to Obsolete packages in the base repository. So Donald isn't stretching the relationship quite as far as you make it out. The ecosystem of packages for a distro carries uncontrolled packages just as much as pypi.
And the same for Provides. (ie: latest foo is 0.6c; bar Provides: foo-0.6d. an automated tool that finds both foo and bar in its dep tree can choose to install bar and not foo.) The ability for this class of fields to cause harm is not, to me, a compelling argument not to include them. It could be an argument to explicitly tell implementers of install tools that they all have caveats when used with pypi and similar unpoliced community package repositories. The install tools can then choose how they wish to deal with those caveats. Some example strategies: choose to prompt the user as to which to install, choose to always treat the fields as human-informational only, mark some repositories as being trusted to contain packages where these fields are active and other repositories where the fields are ignored. -Toshio
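Toshio's three example strategies could be expressed as an installer-side policy switch. A sketch under assumptions: the `Policy` names, repository labels, and the `ask` callback are hypothetical, and a real tool would also need to decide how a repository gets marked as trusted.

```python
from enum import Enum
from typing import Callable, Set


class Policy(Enum):
    PROMPT = "prompt"          # ask the user before acting on the field
    INFORMATIONAL = "info"     # never act on the field; only report it
    TRUST_BY_REPO = "by-repo"  # act only for repositories the user marked as trusted


def should_replace(
    policy: Policy,
    repo: str,
    trusted_repos: Set[str],
    ask: Callable[[], bool],
) -> bool:
    """Decide whether an Obsoletes/Provides claim found in `repo` may replace a package."""
    if policy is Policy.INFORMATIONAL:
        return False
    if policy is Policy.TRUST_BY_REPO:
        return repo in trusted_repos
    return ask()  # Policy.PROMPT: defer to the user
```

Passing the prompt as a callable keeps the decision logic testable without an interactive terminal.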

On Thu, Dec 6, 2012 at 1:49 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
Perhaps you could explain? I'm not familiar with those projects.
But in each of these cases, the packages are being defined *with reference to* some underlying vision of what the distro (or even "a distro") is. An Ubuntu PPA, if I understand correctly, is still *building an Ubuntu system*. Python packaging as a whole lacks such frames of reference. A forked distro is still a distro, and it's a fork *of something*. Rpmfusion is defining an enhanced Fedora, not slinging random unrelated packages about. If there's a distro analogy to PyPI, it seems to me that something like RpmFind would be closer: it's just a free-for-all of packages, with the user needing to decide for themselves whether installing something from a foreign distro will or won't blow up their system. (E.g., because their native distro and the foreign one use a different "provides" taxonomy.) RpmFind itself can't solve anybody's issues with conflicts or obsoletes; all it can do is search the data that's there. But unlike PyPI, RpmFind can at least tell you which vision of "a distro" a particular package was intended for. ;-)
The ability for this class of fields to cause harm is not, to me, a compelling argument not to include them.
But it is absolutely not a compelling argument *to* include them, and the actual arguments for them are pretty thin on the ground. The real knockdown is that in the PyPI environment, there aren't any automated use cases that don't produce collateral damage (outside of advisories about Obsoleted-By projects).
AFAIK, there are only a handful of curated repositories: Scipy, Enthought, and ActiveState come to mind. These are essentially "python distros", and they might certainly have reason to build policy into their metadata. I expect, however, that they would not want the *package* authors declaring their own conflicts or obsolescence, so I'm not sure how the metadata spec will help them. Has anyone asked for their input or experience? It seems pointless to speculate on what they might or might not need for curated distribution. (I'm pretty sure Enthought has their own install tools, not sure about the other two.)
A peculiar phenomenon: every defense of these fields seems to refer almost exclusively to how the problems could be fixed or why the problems aren't that bad, rather than *how useful the fields would be* in real-world scenarios. In some cases, the argument for the fields' safety actually runs *counter* to their usefulness, e.g., the fields aren't that bad because we could make them have a limited function or no function at all. Isn't lack of usefulness generally considered an argument for *not* including a feature? ;-)

On Fri, Dec 07, 2012 at 01:18:40AM -0500, PJ Eby wrote:
pexpect was last released in 2008. Upstream went silent with unanswered bugs in its tracker and no mailing list. A fork of pexpect was created that addressed the issue of the unicode type in python2, added a python3 port, and has slowly evolved since then. I see that the original upstream has made some commits to their source repository since the fork was created, although there has still been no new release.
Uhm.... that's both true and false, as any complex system is. rpm and deb are just packaging formats. So:

*) Not all packages build on top of that system. There are rpm packages provided by upstreams that users attempt (to greater and lesser degrees of success) to install on SuSE, RHEL, Fedora, Mandriva, etc. There are debs built for Ubuntu that people attempt to install onto Debian.
*) PPAs and rpmfusion may both build on top of an existing system but they can change the underlying structure, replacing components that other pieces of the base system depend on. You talk about the setuptools and distribute problem on pypi.... there's absolutely nothing that prevents someone from building a PPA or a package in a third-party rpm repository that packages a setuptools that Obsoletes: distribute or a distribute package that Obsoletes: setuptools.
If you constantly forget why the fields are useful, then I suppose you'll always believe that :-) -Toshio

On Fri, Dec 7, 2012 at 12:01 PM, Toshio Kuratomi <a.badger@gmail.com> wrote:
And what problem are you saying which fields would have solved (or which benefits they would have provided), for whom? If the packages have files in conflict, they won't be both installed. If they don't have files in conflict, there's nothing important to be informed of. If one is installing pexpect-u, then one does not need to discover that it is a successor of pexpect. If one is installing pexpect, it might be useful to know that pexpect-u exists, but one can't simply discover that from an Obsoletes field on pexpect-u. However, even if one did discover it, this would merely constitute an *advertisement* of pexpect-u's existence, not a *requirement* that it be used in place. A tool cannot know, without other affirmative user action, that it is actually a good assumption to use the advertised replacement. In the distro world, a user has *already* taken this affirmative action by choosing which repository to source packages from, on an implicit contract that this source is up to the job of managing his needs across multiple packages. Or, if they choose to source an off-brand or upstream package, they are taking affirmative action to risk it. In the Python world, there is no notion of a "repository", aside from a handful of managed Python distros, which have their own, distinct packaging methods and distribution tools. So there is no affirmative contract of trust regarding *inter-project* relationships. It is precisely this lack that is why the metadata spec has gone mostly unused since its inception about a decade ago. Nobody really knows what to "provide" or "require", or in what context they would actually be "obsoleting" anything that isn't their own package, or a package they've forked. But if you live mainly in the distro world, this concept seems absurd, and the fields *obviously* useful. But that's because you're swimming in an ocean of context that doesn't exist on dry land. You're saying that *of course* swimming fins are useful... if you live in the ocean. 
And I, living on dry land, am saying that *sure* they are... but only in a swimming pool or a pond, and we don't have very many of those here in dry Python-land. And the people who run the swimming pools have thoughtfully already provided their own. Do we need to standardize swim fin sizes for people who mostly live on dry land? The flip side of this, btw, is that there's an implicit contract in the Python world that there is generally only "the" package - not "the package as patched and re-packaged by vendors X, Y, and Z". If I install python project foo, version 1.2, I expect it to be the *same* foo-1.2, with the *same metadata*, *no matter where I got it from*. And so, this assumption is our "air" to your "water". We know that pools and ponds (curated Python distros) are different, as an exception to this rule, just as you know that reefs and islands (uncurated repositories, search engines, and upstream-built packages) are different, as an exception to your assumption that "the package I get is intended to play well with everything else in my system." (This of course is why many distro managers are suspicious of language-specific or other sorts of vertical package management tools - they seem as pointless as wheels in the water, solving problems you don't have, and creating new problems for you at the same time. Unfortunately, people on land will keep inventing them, because they have a different set of problems to solve -- some of which are actually created by the ocean-oriented tools. For example, virtualenv and its predecessors were developed to solve the "problem" of a single integrated environment, even though that integrated environment is the "solution" from a distro perspective.)
Sure. But the reference points still exist, and there is a layer of indirection between "packager" and "developer", even in the case where the packager and developer are the same person or organization. In the Python case, there is usually no such indirection, outside of curated systems like SciPy et al. (Even there, most of what third-party packaging is about in the Python world is taking care of binary builds.) Again, it's islands in the ocean vs. pools on land.
At the *same time*? That is, are you saying that there are repositories that contain *self-contained* "Obsoletes"-cycles? (Presumably, there are no end-user sites containing such cycles, if the install tool responds by refusing to install one or by removing the other.)
If you constantly forget why the fields are useful, then I suppose you'll always believe that :-)
I've stated many times that they're useful... in the context of a larger system. Within the distro packaging ecosystem, a package "conflicts", "obsoletes", or "provides" things *relative* to some notion of an installation -- however vague -- that has been selected by an explicit user action (such as choice of basic distro, package manager, and repository). So, despite their framing as binary relationships -- e.g. Obsoletes(predecessor, successor) -- the *actual* relationship is three-valued: Obsoletes(predecessor, successor, integration-context). The third player in the relationship is whoever *packaged* the project(s) in question... and in the Python world (outside of curated repositories), that packager is *always the original author*. Now, in the case where the packager and author are different, we can talk about such relationships in the same way: binary relationships with an implied third. For example, if SciPy decided at some point to replace NumPy with NumPyPy, it would be more than reasonable to state that Obsoletes(NumPy, NumPyPy, SciPy), even as at the same time, perhaps Enthought has already tried this and decided to go the other way, so that Obsoletes(NumPyPy, NumPy, EnthoughtPD). They use different tools and repositories and thus can imply the third position. In neither case, however, is SciPy or Enthought (nor the authors of NumPy or NumPyPy) entitled to declare an Obsoletes relationship with a *true* wildcard for the third position. And so the key distinction between PyPI and the distro world is that *PyPI is not an integration context*. Packages provided by authors do not usually include this type of metadata, unless the author of the package has a specific integration context in mind. So the burden falls to either the repository manager or the user to define these higher-level relationships *within their intended integration context* (Or to put it another way, *somebody* has to be the "packager", not just the "developer".)
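The three-valued relation can be modelled directly. A minimal sketch, using the hypothetical SciPy and EnthoughtPD decisions from the paragraph above; `ObsoletesRel` and `successor_in` are illustrative names. The point it encodes is that a lookup always names an integration context, and there is no wildcard context that applies everywhere.

```python
from typing import List, NamedTuple, Optional


class ObsoletesRel(NamedTuple):
    predecessor: str
    successor: str
    context: str  # the integration context (distro, curated repo) that made the decision


# The hypothetical decisions described above: two contexts, opposite conclusions.
RELATIONS: List[ObsoletesRel] = [
    ObsoletesRel("NumPy", "NumPyPy", "SciPy"),
    ObsoletesRel("NumPyPy", "NumPy", "EnthoughtPD"),
]


def successor_in(context: str, predecessor: str,
                 relations: List[ObsoletesRel] = RELATIONS) -> Optional[str]:
    """Look up a successor only within one integration context; no wildcard matches."""
    for rel in relations:
        if rel.context == context and rel.predecessor == predecessor:
            return rel.successor
    return None
```

Asking about a context that made no decision, such as plain PyPI, returns `None` rather than borrowing another context's answer.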
Currently, Python distribution tools, culture, and methodology do not have any precedent for the metadata spec contents to be overridden by a third-party packager, curator or repository manager, in the way that is normal and common in the distro world. (Try to imagine a Linux distro where this kind of information was *always* put in "upstream", because *there is no such thing* as "downstream". That's what it's like "on land".) This is why I keep saying that blind copying is an invitation to trouble, and that clear thinking about the actual requirements is needed. I would not object to explicitly three-way versions of these fields (requires, provides, conflicts, obsoletes) that define a specific integration context in which the statement applies. (Although defining how to name integration contexts would present a *new* challenge for discussion!) Likewise, I would not object to discussion of how to manage metadata for *repackaging* of Python projects by third-party curators (e.g. SciPy et al), and ways to keep that separate from the author's declarations. Or discussion of what should constitute a "repository" in the Python world, as opposed to what we have now (which apart from curated distributions, consists mainly of indexes, not true repositories in the distro sense). Today, however, there is no separation in the metadata spec (or tools) between "packaging" (in the sense understood by distros) and "distributing" (in the sense normally applied to Python packages distributed via PyPI and similar channels). And "packaging" in the distro sense is all about *integrating* packages, not merely making them *available* for others to integrate. That's the critical difference between the two, and in the resulting use cases for the metadata spec.

On Sat, Dec 8, 2012 at 8:02 AM, PJ Eby <pje@telecommunity.com> wrote:
To strain the analogy, the main value of these fields exists on the beach: at the point where you need to impedance match between the Python community and another packaging community. The ideal is to be able to get a point where you can point an automated tool at a project on PyPI and say "give me that, only packaged as RPM/deb/whatever, with appropriate distro specific metadata". Such a tool is obviously going to be distro specific, since it is going to have to do some remapping based on file names to pick up existing distro packages, but it *also* needs upstream metadata. Even in a distro, a "Conflicts:" field often *does* denote runtime conflicts (e.g. over a particular network port), because, as you say, filesystem level conflicts will usually be picked up automatically. The distro philosophy is to say "OK, we simply won't let you install conflicting projects at the same time, so you won't be surprised later by a conflict that only shows up if you happen to run them both at the same time". It's designed to turn a complex, hard to debug, problem into a simple, explicit error at installation time. People build complex systems (especially web apps) based on the PyPI ecosystem, and the upstream communities *can* assist in flagging potential issues in advance. If people start putting bad metadata in their projects, then that's just a bug to be dealt with like any other. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
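Nick's description of Conflicts, turning a runtime clash into an explicit error at install time, can be sketched as a two-way check. Assumptions: the installed set maps each project to its declared Conflicts list, only bare project names are compared, and `ConflictError`, `check_conflicts`, and the project names in the usage note are hypothetical.

```python
from typing import Dict, List


class ConflictError(Exception):
    """Raised at install time instead of letting the clash surface later at runtime."""


def check_conflicts(
    installed: Dict[str, List[str]],  # installed project -> its declared Conflicts
    candidate: str,
    candidate_conflicts: List[str],
) -> None:
    """Refuse `candidate` if either side declares a conflict with the other."""
    for name in candidate_conflicts:
        if name in installed:
            raise ConflictError(f"{candidate} conflicts with installed {name}")
    for name, conflicts in installed.items():
        if candidate in conflicts:
            raise ConflictError(f"installed {name} conflicts with {candidate}")
```

Checking both directions matters: a project that binds the same network port as an already-installed one may only have the conflict declared on one side.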

On Fri, Dec 07, 2012 at 05:02:26PM -0500, PJ Eby wrote:
In the specific case of pexpect and pexpect-u, the files don't actually conflict. The pexpect package includes a "pexpect.py" file, while pexpect-u includes a "pexpect/" directory. These conflict, but not in the easily detectable sense. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

On Sun, Dec 9, 2012 at 10:38 PM, Andrew McNabb <amcnabb@mcnabbs.org> wrote:
Excellent! A concrete non-file use case. Setuptools handles this particular scenario by including a list of top-level module or package names, but newer tools ought to look out for this scenario, too.
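Why pure file-path comparison misses the pexpect case can be made concrete with a minimal sketch, assuming each distribution's installed files are available as RECORD-style relative paths. The helpers `top_level_names` and `find_collisions` are hypothetical names, not setuptools' own code (setuptools records the names in a `top_level.txt` instead), and the file lists in the test are illustrative.

```python
from collections import defaultdict

def top_level_names(paths):
    """Top-level import names implied by a distribution's installed file list.

    Assumes RECORD-style relative paths such as "pexpect.py" or
    "pexpect/__init__.py"; metadata directories and scripts are skipped.
    """
    names = set()
    for path in paths:
        parts = path.replace("\\", "/").split("/")
        first = parts[0]
        if first in ("", "..", "__pycache__") or first.endswith((".dist-info", ".egg-info", ".data")):
            continue
        if len(parts) > 1:
            names.add(first)                    # package directory, e.g. pexpect/
        elif first.endswith(".py"):
            names.add(first[:-3])               # single-file module, e.g. pexpect.py
        elif first.endswith((".so", ".pyd")):
            names.add(first.split(".", 1)[0])   # compiled extension module
    return names

def find_collisions(dists):
    """Map each top-level name to the distributions providing it, keeping only clashes."""
    owners = defaultdict(set)
    for dist, paths in dists.items():
        for name in top_level_names(paths):
            owners[name].add(dist)
    return {name: sorted(found) for name, found in owners.items() if len(found) > 1}
```

The paths `pexpect.py` and `pexpect/__init__.py` never collide as files, but both reduce to the import name `pexpect`, which is the conflict that matters at import time.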

Donald Stufft <donald.stufft <at> gmail.com> writes:
Never mind the "Obsoletes" information - even the more useful "Requires-Dist" information is not exposed via PyPI, even though it appears to be stored in the database. (Or if it is, please point me to where - I must have missed it.) Even if this were to be made available, it's presumably obtained from PKG-INFO. As I understand, this data is not considered reliable - for example, pip runs egg_info on downloaded packages to get updated information when determining dependencies to be downloaded. If the Requires-Dist info in PKG-INFO can't be relied on, surely less critical information such as Obsoletes can't be relied on, either? Regards, Vinay Sajip

On Thursday, December 6, 2012 at 6:28 AM, Vinay Sajip wrote:
Requires-Dist doesn't exist for more than a handful of packages. But PyPI exposes it via the XMLRPC API, possibly the JSON api as well.
pip runs egg_info because setuptools does not write out to PKG-INFO what the dependencies are (it does write it out to a different text file though). But IIRC that text file is not guaranteed to exist in the distribution. There's also the history where pip was trying to preserve as much backwards compat with easy_install as it could, and if you used the file that egg_info writes out then you'll only get the requirements for the system that the distribution was packaged on. Any if statements that affect the dependencies won't be in effect.
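The separate text file setuptools writes is the egg-info `requires.txt`. A minimal sketch of reading it follows, assuming the common layout of unconditional requirement lines followed by optional `[section]` headers (e.g. one per extra). `parse_requires_txt` is a hypothetical helper, and it deliberately does not interpret section names, because the captured requirements already reflect whatever `setup.py` computed on the machine that ran egg_info.

```python
def parse_requires_txt(text):
    """Split an egg-info requires.txt into {section: [requirement, ...]}.

    Lines before any "[section]" header are unconditional and are stored under
    the key None; each "[name]" header starts a new (e.g. extra) section.
    """
    sections = {None: []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()      # e.g. "i18n" for an extra
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```

Parsing is trivial; the reliability problem is upstream of the file, not in its syntax.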

On Thu, Dec 6, 2012 at 6:33 AM, Donald Stufft <donald.stufft@gmail.com>wrote:
It will be Obsoleted-By:. The "drop in replacement" requirement will be removed. The package manager will say "you are using these obsolete packages; check out these non-obsolete ones" but will not automatically pull the replacement without a Requires tag. I will probably add the unambiguous Conflicts: tag "uninstall this other package if I am installed". Many packages (IIRC more than half) have the pre-Metadata-1.2 equivalent of Requires-Dist: which is the very easy to parse requires.txt. This information is not reliable because it could depend on conditions in setup.py. Someone should write a setup.py compiler that determines whether a package's requirements are conditional or not. Environment markers (limited Python expressions at the end of Requires-Dist lines) attempt to make Requires-Dist reliable. You can execute them safely in your environment to determine whether a requirement is right for you: Requires-Dist: pywin32 (>1.0); sys.platform == 'win32' The wheel implementation makes sure all the metadata (the .dist-info directory) is at the end of the .zip archive. It's possible to read the metadata with a single HTTP partial request for the end of the archive without downloading the entire archive.
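Evaluating a marker "safely" means interpreting it rather than passing it to `eval()`. A minimal sketch, assuming a restricted grammar of string comparisons (`==`, `!=`, `in`, `not in`) joined by `and`/`or`. The variable names are modelled on the PEP 345 marker variables plus the `extra` variable this PEP adds; treat both the list and the helper names (`evaluate_marker`, `split_requires_dist`) as assumptions rather than a reference implementation.

```python
import ast
import os
import platform
import sys

def default_environment():
    # Assumed variable set, modelled on PEP 345 plus "extra".
    return {
        "sys.platform": sys.platform,
        "os.name": os.name,
        "platform.version": platform.version(),
        "platform.machine": platform.machine(),
        "python_version": "%d.%d" % sys.version_info[:2],
        "python_full_version": platform.python_version(),
        "extra": "",
    }

_OPS = {
    ast.Eq: lambda a, b: a == b,
    ast.NotEq: lambda a, b: a != b,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}

def _dotted(node):
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return _dotted(node.value) + "." + node.attr
    raise ValueError("unsupported expression")

def _value(node, env):
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    name = _dotted(node)
    if name not in env:
        raise ValueError("unknown marker variable: " + name)
    return env[name]

def _eval(node, env):
    if isinstance(node, ast.BoolOp):
        results = [_eval(v, env) for v in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _OPS.get(type(node.ops[0]))
        if op is None:
            raise ValueError("unsupported operator")
        return op(_value(node.left, env), _value(node.comparators[0], env))
    raise ValueError("unsupported expression")   # calls, subscripts, etc.

def evaluate_marker(marker, env=None):
    """Interpret only whitelisted AST nodes; never calls eval()."""
    tree = ast.parse(marker.strip(), mode="eval")
    return _eval(tree.body, default_environment() if env is None else env)

def split_requires_dist(value):
    """Split 'name (spec); marker' into (requirement, marker or None)."""
    requirement, sep, marker = value.partition(";")
    return requirement.strip(), (marker.strip() if sep else None)
```

For the example line above, `split_requires_dist` yields `("pywin32 (>1.0)", "sys.platform == 'win32'")`, and the marker is true only when evaluated against an environment whose `sys.platform` is `win32`.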

Daniel Holth <dholth <at> gmail.com> writes:
Sounds good, but can you point to any example code which does this? As I understand it, for .zip files you have to read the last part of the file to get a pointer to the directory, then read that to find where each file in the archive is, then seek to a specific position to read the file contents. Regards, Vinay Sajip

On 6 Dec, 2012, at 15:58, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Because zipfiles can be appended to other files (for example when creating a self-extracting archive) the zipfile module maintains the file offset of the start of a zipfile. The code in the stdlib doesn't appear to test that the zipfile is at a positive offset in the file, therefore with some luck the following will work:
* Download the last 10K of the archive (adjust the size to taste, it should be large enough to contain the zipfile directory and the file you are trying to read)
* Create a zipfile.ZipFile
* Read the zipfile member.
If that doesn't work you'll have to create a temporary file of the right size and place the downloaded bit at the end of that file. BTW, another (more hacky) alternative is to place the interesting bits of dist-info at the start of the zipfile, then you only need to download the first bit of the archive and can then extract the bits you need by parsing the local file headers (zipfiles contain both a directory at the end of the zipfile and a local header stored just before the file data). Ronald
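The temporary-file fallback can be sketched as follows, assuming the caller already knows the archive's total size and has downloaded its last `len(tail)` bytes. `read_member_from_tail` is a hypothetical helper; it writes the tail at its real offset so every offset recorded in the central directory stays valid, and `truncate()` extends the file first (on many filesystems the unwritten prefix is stored sparsely, though that is platform-dependent).

```python
import tempfile
import zipfile

def read_member_from_tail(tail, total_size, member):
    """Read one archive member given only the final len(tail) bytes of the zip."""
    with tempfile.TemporaryFile() as f:
        f.truncate(total_size)              # extend to the real archive size
        f.seek(total_size - len(tail))
        f.write(tail)                       # downloaded bytes at their true offset
        f.seek(0)
        with zipfile.ZipFile(f) as zf:
            # Works only if the member's local header and data lie in the tail;
            # otherwise ZipFile sees zero bytes and raises BadZipFile.
            return zf.read(member)
```

Members near the end of the archive (such as wheel's `.dist-info` files) are readable this way; anything earlier fails cleanly rather than returning garbage.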

On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
You have to make a maximum of 3 requests: one for the directory pointer, one for the directory, and one for the file you want. It's not particularly difficult to make an HTTP-backed seekable file object to pass to ZipFile() for this purpose but I don't have an example. Normally the last few k of the file will contain all 3 pieces. 8k or 16k would be a good guess.
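An HTTP-backed seekable file object for `ZipFile()` might look like the minimal sketch below. It assumes the total archive size is known up front (for instance from a HEAD request's Content-Length) and that the server honours Range requests. `RangeFile`, `http_fetch` and the `fetch(start, end)` hook are hypothetical names. Reads are served from aligned blocks, so the pointer, directory and member reads usually collapse into one or two requests when they sit in the last block.

```python
import io
import urllib.request
import zipfile

class RangeFile(io.RawIOBase):
    """Read-only, seekable file whose bytes are fetched lazily in aligned blocks."""

    def __init__(self, size, fetch, block_size=16384):
        self._size, self._fetch, self._block = size, fetch, block_size
        self._cache = {}      # block index -> bytes
        self._pos = 0
        self.requests = 0     # how many times fetch() was called

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def _get_block(self, index):
        if index not in self._cache:
            start = index * self._block
            end = min(start + self._block, self._size) - 1   # inclusive end
            self._cache[index] = self._fetch(start, end)
            self.requests += 1
        return self._cache[index]

    def readinto(self, buf):
        n = min(len(buf), max(self._size - self._pos, 0))
        filled = 0
        while filled < n:
            index, offset = divmod(self._pos + filled, self._block)
            chunk = self._get_block(index)[offset:offset + n - filled]
            if not chunk:
                raise OSError("short read from fetch()")
            buf[filled:filled + len(chunk)] = chunk
            filled += len(chunk)
        self._pos += filled
        return filled

def http_fetch(url):
    """Return a fetch(start, end) callable that issues HTTP Range requests."""
    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise OSError("server ignored the Range header")
            return resp.read()
    return fetch
```

Usage would be `zipfile.ZipFile(RangeFile(size, http_fetch(url))).read(name)`; the test below substitutes an in-memory fetch so the request count can be observed without a network.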

Daniel Holth <dholth <at> gmail.com> writes:
I don't need an example for doing it with multiple HTTP requests. I only asked for an example because you said one could read the metadata "with a single HTTP partial request", and I couldn't see how it could always be done with a single request. PEP 427 is mute on the subject of zip file comments in a .whl, but perhaps it shouldn't be. IIUC, the directory of the zip file *could* be further from the end of the file by more than 16K, due to the possible presence of a pathologically large comment in the end record. Regards, Vinay Sajip
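The archive comment is what makes a fixed-size tail unreliable: the End of Central Directory (EOCD) record is 22 bytes plus a comment of up to 65535 bytes. A minimal sketch of checking whether a downloaded tail is sufficient, assuming a non-Zip64 archive with nothing appended after the comment; the helper names are hypothetical. When the EOCD is found, `total_size - cd_offset` is the exact number of trailing bytes that covers the central directory; reading a member additionally needs the bytes from its local header onward.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22          # fixed-size part of the End of Central Directory record
MAX_COMMENT = 0xFFFF    # the archive comment length is a 16-bit field

def locate_central_directory(tail, total_size):
    """Find the EOCD in the last len(tail) bytes of a (non-Zip64) archive.

    Returns (cd_offset, cd_size, comment_len). Raises ValueError when the tail
    is too short to contain the EOCD, e.g. because of a large archive comment.
    """
    pos = tail.rfind(EOCD_SIGNATURE)
    while pos != -1:
        if pos + EOCD_SIZE <= len(tail):
            fields = struct.unpack("<4s4H2LH", tail[pos:pos + EOCD_SIZE])
            cd_size, cd_offset, comment_len = fields[5], fields[6], fields[7]
            # A genuine EOCD is followed by exactly its comment and nothing else.
            if pos + EOCD_SIZE + comment_len == len(tail):
                return cd_offset, cd_size, comment_len
        pos = tail.rfind(EOCD_SIGNATURE, 0, pos)
    worst_case = min(total_size, EOCD_SIZE + MAX_COMMENT)
    raise ValueError("EOCD not found; retry with up to %d trailing bytes" % worst_case)

def trailing_bytes_needed(tail, total_size):
    """Bytes from the end of the archive that cover the central directory and EOCD."""
    cd_offset, _cd_size, _comment_len = locate_central_directory(tail, total_size)
    return total_size - cd_offset
```

A pathological comment therefore costs at most one extra request of known maximum size, rather than an open-ended search.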

On Thu, Dec 6, 2012 at 11:30 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk>wrote:
It's just a "usually works" optimization that might be fun when bandwidth is more important than round trip times. The distance between the directory and the end of the file depends on the size of the directory. Django's is an extreme case at nearly half a meg; most are much smaller. On many filesystems it is cheap to create a sparse file the size of the entire archive and write the partial requests into it. The OS doesn't actually store all the 0's. The other reason wheel puts the metadata at the end is so the metadata can be re-written efficiently without re-writing the entire zipfile. The wheel project implements ZipFile.pop() which truncates the last file from a (normal) zip archive. This is especially useful when the last file is the attached digital signature.
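The idea behind truncating the last member can be sketched as below. This is an independent illustration, not the wheel project's `ZipFile.pop()`, and it relies on undocumented CPython `ZipFile` internals (`filelist`, `NameToInfo`, `start_dir`, `_didModify`), so it may break across Python versions. In append mode, `close()` writes the central directory at `start_dir` and then truncates the file, which is what removes the popped member's bytes.

```python
import zipfile

def pop_last_member(path):
    """Remove the physically last member of a zip archive in place (sketch only).

    Relies on undocumented CPython ZipFile internals; not a supported API.
    """
    with zipfile.ZipFile(path, "a") as zf:
        last = max(zf.infolist(), key=lambda info: info.header_offset)
        zf.filelist.remove(last)
        del zf.NameToInfo[last.filename]
        # The rewritten central directory starts where the popped member began;
        # close() in append mode then truncates everything after it.
        zf.start_dir = last.header_offset
        zf._didModify = True
        return last.filename
```

Only the central directory is rewritten; the data of every remaining member is left untouched, which is why this is cheap for a trailing signature file.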

On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
ISTR that this is especially true for zipimport: I think it depends on a zipfile signature being present at the *end* of the file. Certainly, the standard for .exe and shell wrappers for zipfiles is to place them at the beginning of the file, rather than the end.

On Thu, Dec 6, 2012 at 8:39 AM, Daniel Holth <dholth@gmail.com> wrote:
Sounds fine to me.
I will probably add the unambiguous Conflicts: tag "uninstall this other package if I am installed".
Please don't. See my lengthy posts from the previous PEP 345 retread discussion for why, or ask MRAB to succinctly summarize them as he did so brilliantly with the obsoletes/obsoleted-by issue. ;-) I'll take a stab at a short version, though: a conflict (other than filename conflict) is not an installation-time property of a single project, but rather a *runtime* property of an overall system into which the projects are being installed, including configuration that is out of scope for a Python-specific installation tool to manage. In addition, even declaring overall conflicts as a *mere shorthand* for an existing file conflict creates the possibility of stale conflict information! For example, RuleDispatch vs. PyDispatcher: at one time both provided a "dispatch" package, but if RuleDispatch declared PyDispatcher conflicting, the declaration would quickly have become outdated: PyDispatcher soon renamed its provided package to resolve the conflict. A file-based system can both detect and resolve this conflict (or lack thereof) automatically, whereas a manual "Conflicts" notation must be maintained by the author(s) of one or both packages and removed when out of date. In effect, a "conflicts" field actually *creates* conflicts and maintenance burdens where they did not previously exist, because even after the conflict no longer really existed, an automated tool would have prevented PyDispatcher from being installed, or, per your suggestion above, unnecessarily *uninstalled* it after a user installed RuleDispatch. And unlike the Obsoletes->Obsoleted-By change, I do not know of any similar way to salvage the idea of a Conflicts field, without reference to some mediating authority that manages the information on behalf of an overall system into which the projects are being fitted.
But in that case, neither of the projects really owns the declaration - it's more like Zope (say) would need a list of plugins that conflict with each other, or they could declare that they conflict when activated in the same instance. A generic Python installer, however, that doesn't know about Zope instances or Apache vhosts or Django apps or any other "environment of conflict", can't assume that *mere installation* constitutes a conflict! It doesn't know, for example, whether code from two simultaneously-installed packages will ever even be *imported* in the same process, let alone whether their specific conflicting features will be used in that process. This effectively ensures that in general, Python installation tools can *only* rely on file-based conflicts as being denotable by project metadata -- and even then, it's better to stick with *actual* file conflicts rather than predicted ones, to avoid the type of logjam described above. P.S. Sorry once again to drag you through all this at the last minute; I just sort of assumed you picked up where Alexis left off on the previous attempt at an update to PEP 345 and didn't pay close enough attention to earlier drafts.

On Fri, Dec 7, 2012 at 3:47 PM, PJ Eby <pje@telecommunity.com> wrote:
That's not what a Conflicts field is for. It's to allow a project to say *they don't support* installing in parallel with another package. It doesn't matter why it's unsupported; it's making a conflict perceived by the project explicit in their metadata. Such a field is designed to convey information to users about *supported* configurations, regardless of whether or not they happen to work for a given use case. If a user believes a declared conflict is in error, and having the two installed in parallel is important to them, they can:
1. Use virtual environments to keep the two projects isolated from each other
2. Use an installer that ignores Conflicts information (which will be all of them, since that's the status quo)
3. Make their case to the upstream project that the conflict has been resolved, and installing the two in parallel no longer causes issues
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Dec 7, 2012 at 8:33 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That's not what a Conflicts field is for. It's to allow a project to say *they don't support* installing in parallel with another package.
If that's the actual intended use case, the PEP needs some revision. In particular, if there's a behavioral recommendation for installer tools, it should be to avoid installing the project that *declares* the conflict, rather than the one that is the object of that declaration. ;-) In any case, as I said before, I don't have an issue with the fields all being declared as being for informational purposes only. My issue is only with recommendations for automated tool behavior that permit one project's author to exercise authority over another project's installation. If the fields are defined in such a way that an author can only shoot *themselves* in the foot with a bad declaration, that's fine by me. So if package A includes a "Conflicts: B" declaration, I recommend the following:
* An attempt to install A with B already present refuses to install A without a warning and confirmation
* An attempt to install B informs the user of the conflict, and optionally offers to uninstall A
In this way, any collateral damage to B is avoided, while still making the intended "lack of support" declaration clear. How does that sound?

On Sat, Dec 8, 2012 at 4:46 PM, PJ Eby <pje@telecommunity.com> wrote:
No, that's not the way it works. A conflict is always symmetric, no matter who declares it. The beneficiary of these notifications is the aggregator attempting to build a systematically coherent system, rather than one with latent incompatibilities waiting to bite them at run time. It doesn't *matter* if "A conflicts with B" or "B conflicts with A", you cannot have a system with both of them installed that will be supported by the developers of both A *and* B. Now, this beneficiary *may* be the packagers for a Linux distribution, but it may also be a larger Python distribution (ActiveState, EPD, etc), a web application developer, a desktop application developer, a system integrator for a large-scale distributed system, or anyone else that combines and deploys an integrated set of packages (even those a developer installs on their personal workstation). It's up to the user to decide who they want to believe. Now, it may be that, for a given use case, the end user doesn't actually care about the potential conflict (e.g. they've done their own research and determined that the conflicting behaviour doesn't affect their system) - that's then a design decision in the installation tools as to whether or not they want to make it easy for users to override the metadata. In the Linux distro case, the installer *and* most of the metadata are largely provided by the same people, so yum/rpm/etc generally *don't* make it easy to install conflicting packages. Python installers are in a different situation though, so forced installs are likely to be an expected feature (in fact, I expect the more likely outcome given the status quo is that the default behaviour will be a warning at installation time with an option to request enforcement of "no conflicts"). Building integrated systems *is hard*. 
Pretending projects can't conflict just because they're both written in Python isn't sensible, and neither is it sensible to avoid warning users about the potential for latent defects when particular packages are used in combination. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Dec 8, 2012 at 5:06 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
But that *precisely contradicts* what you said in your previous email:
It's to allow a project to say *they don't support* installing in parallel with another package.
Just because A doesn't support being installed next to B, doesn't mean B doesn't support being installed next to A. B might work just fine with A installed, and even be explicitly supported by the author of B. Why should the author of A get to decide what happens to B? Just because I trust A about A, doesn't mean I should have to trust them about B. Look, I really don't care about the individual fields' definitions that much. I care about only one thing: A shouldn't get to (de facto) dictate what happens to B. If you *really* want the behavior to be symmetrical, then it should *only* be symmetrical if both A and B *agree* they are in conflict. (i.e., both refer to the other in their conflict fields). Otherwise, it should only be a warning. There are tons of other things that I could argue here about the positions you've laid out. But all I *really* care about is that we not define fields in such a way as to permit or encourage inter-package warfare -- intentional or not. Solutions acceptable to me include (in no particular order):
* Make declarations affect only the declarer (as with Obsoleted-By)
* Make declarations only warn users, not block installation or result in uninstallation
* Have no automated action at all, and document them as intended for downstream repackagers only
* Toss the field entirely
* Make the field include a context (e.g. a distro name), so that only tools explicitly told you're operating in that context pay attention
* Use the new metadata extension vocabularies to define hints for specific downstream packaging tools and systems
* Replace "conflicts" with a specification of resources actually used by the project, so that such conflicts can be automatically detected without needing to target a specific project
And there are probably others I haven't thought of yet.
If you can be clearer about what it is you want from the Conflicts field *other* than just wanting it to stay as is (or perhaps *why* you would like to have the Python infrastructure side with project A over project B, irrespective of which project is A and which one is B), then perhaps I can come up with others.
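The "symmetrical only if both agree, otherwise only a warning" rule is small enough to state as code. A minimal sketch, assuming declarations are available as a mapping from project name to the set of names it declares conflicting; `check_conflicts` and `Finding` are hypothetical. Each warning names the declaring project, so the user can see who is complaining about whom.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    severity: str   # "block" (both sides declare) or "warn" (one side only)
    message: str

def check_conflicts(new, installed, conflicts):
    """Classify Conflicts declarations between `new` and already-installed projects.

    `conflicts` maps a project name to the set of names it declares conflicting
    (a hypothetical in-memory form of the field).
    """
    findings = []
    for other in sorted(installed):
        new_says = other in conflicts.get(new, set())
        other_says = new in conflicts.get(other, set())
        if new_says and other_says:
            findings.append(Finding("block", f"{new} and {other} both declare a conflict"))
        elif new_says or other_says:
            declarer, target = (new, other) if new_says else (other, new)
            findings.append(Finding("warn", f"{declarer} declares a conflict with {target}"))
    return findings
```

Under this rule, A alone can never block the installation of B; the outcome is the same whichever of the two is installed first.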

On 2012-12-08 20:18, PJ Eby wrote:
[snip] If package A says that it conflicts with package B, it may or may not be symmetrical, because it's possible that package B has been updated since the author of package A discovered the conflict, so it's important that the user is told which package is complaining about the conflict, the one that is being installed or the one that is already installed. It may also be helpful if the package that includes the "Conflicts" declaration specifies which version of the other package it was last tested against in case there is a more recent version of the other package that does not cause the conflict, or, indeed, that there's a more recent version of the package that includes the "Conflicts" declaration that does not cause the conflict.

On 09/12/12 08:14, MRAB wrote:
I must admit that in reading this thread, I'm having a bit of trouble understanding why merely *installing* packages should lead to conflicts. Assuming that two software packages Spam and Ham install into directories Spam and Ham, how can merely having them installed side-by-side lead to a conflict? I can see how running or importing Spam and Ham together might lead to problems. And I can see that if package Spam wants to install into directory Ham, that would be bad. But who does that? Have I just demonstrated my naivety when it comes to packaging? Under what circumstances would two well-behaved packages with different names conflict? -- Steven

On Sun, Dec 9, 2012 at 12:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:
If two packages Spam and Ham both define a module Jam, then the one that gets loaded will depend on the search path. That would be one form of conflict. ChrisA

On 09/12/12 12:32, Chris Angelico wrote:
    import Spam.Jam
    import Ham.Jam
What am I missing? Why would a software package called "Spam" install a top-level module called "Jam" rather than "Spam"? Isn't the whole point of Python packages to solve this namespace problem? -- Steven

On Saturday, December 8, 2012 at 9:11 PM, Steven D'Aprano wrote:
Conflicts doesn't really solve file-based conflicts; as PJ Eby has pointed out, tools need to detect that circumstance already. But to answer this question: no, there is no required mapping between project names (what your thing is called on PyPI) and Python package names (what you import). Something named Spam on PyPI could provide multiple Python packages, named whatever its authors wanted.

On Sun, Dec 9, 2012 at 1:11 PM, Steven D'Aprano <steve@pearwood.info> wrote:
That would require/demand that the software package MUST define a module with its own name, and MUST NOT define any other top-level modules, and also that package names MUST be unique. (RFC 2119 keywords.) That would work, as long as those restrictions are acceptable. ChrisA

On Sun, Dec 09, 2012 at 01:51:09PM +1100, Chris Angelico wrote:
/me notes that setuptools itself is an example of a package that violates this rule (setuptools and pkg_resources). No objections to "That would work, as long as those restrictions are acceptable."... that seems to sum up where we're at. -Toshio

On 2012-12-09 01:15, Steven D'Aprano wrote:
[snip] Personally speaking, I was thinking more about possible problems at runtime due to functional conflicts, but it could apply to any (undefined) conflict.

On Sat, Dec 8, 2012 at 10:22 PM, MRAB <python@mrabarnett.plus.com> wrote:
If it's for a runtime functional conflict, there's no need for installation tools to worry about it, except perhaps in the case where a single project C depends on *both* A and B, where A and B conflict with each other. Apart from that piece of information, there is no way to know that the code will ever even be imported at the same time. (And even then, it's just a hint of the possibility, not a guarantee.) Nick, OTOH, says that the purpose of the field is to declare that mere side-by-side installation invalidates developer support for the configuration. However, the widespread confusion (conflicts?) over what exactly the field is supposed to mean and when it should be used suggests that its charter is not nearly as clear as it should be. It seems perhaps it is suffering from the so-called "Illusion of Transparency", wherein everybody looks at it and thinks that it *obviously* means X, and only a fool could think otherwise... except that everyone has a *different* value of X in mind. That's why I keep asking for specific, concrete use cases. At this point, for the field to make any sense, there needs to be some better idea of what a "runtime" or "undefined" conflict is. Apart from file conflicts, has anybody identified a single PyPI package that would make use of this field? If so, what *is* that example, and what is the nature of the conflict? Do any of the distro folks know of a Python project tagged as conflicting with another for their distro, where the conflict does *not* involve any files in conflict? (And the conflict is not specific to the distro's packaging of that project and the project in conflict? i.e., that it would have actually been possible and/or meaningful for the upstream developer to have flagged the conflict in the project's metadata, given the proposed metadata standard?)

On Sun, Dec 9, 2012 at 3:48 PM, PJ Eby <pje@telecommunity.com> wrote:
The best current example I know of is whether or not a given package is gevent compatible. At the moment, you have to try it and see, or hope the project developers have a note somewhere saying whether or not it works. "Incompatible" might be a better field name than "Conflicts" for that use case, though. You've persuaded me that any installer based notification of runtime conflicts should at most be a warning (or even a separate query), since the user has so many options for dealing with it (including the typical case where the two components are simply never used in the same process). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Dec 09, 2012 at 12:48:45AM -0500, PJ Eby wrote:
In Fedora we do work to avoid most types of Conflicts (backporting fixes, etc.) but I can give some examples of where Conflicts could have been used in the past: In docutils prior to the latest release, certain portions of docutils were broken if PyXML was installed (since PyXML replaces certain stdlib xml.* functionality). So older docutils versions could have had a Conflicts: PyXML. Nick has since provided a technique for docutils to use that loads from the stdlib first and only goes to PyXML if the functionality is not available there. Various libraries in web stacks have had bugs that prevent the proper functioning of the web framework at the top level. In case of major issues (security, unable to start up), these top level frameworks could use versioned Conflicts to prevent installation. For instance, TurboGears might have:
    Conflicts: CherryPy < 2.3.1
Note, though, that if parallel-installable versions (and selecting the proper version from among them) work, then this type of Conflict wouldn't be necessary; you'd have a versioned Requires: instead. -Toshio

On Sun, Dec 9, 2012 at 6:18 AM, PJ Eby <pje@telecommunity.com> wrote:
If I'm installing both A *and* B, I want to know if *either* project doesn't support that configuration. The order in which they get installed should *not* have any impact on my finding out that I am using one of my dependencies in an unsupported way that may cause me unanticipated problems further down the line. The author of A *doesn't* get to decide what happens to B, *I* do. They're merely providing a heads-up that they believe there are problems when using their project in conjunction with B. My options will be:
- use them both anyway (e.g. perhaps after doing some research, I may find out the conflict relates solely to a feature of B that I'm not using, so I simply update my project documentation to say "do not use feature X from project B, as it conflicts with dependency A")
- choose to continue using A, find another solution for B
- choose to continue using B, find another solution for A
As a concrete example, there are projects out there that are known not to work with gevent's socket monkeypatching, but people don't know that until they try it and it blows up in their face. I now agree that *enforcing* a conflicts field at install time in a Python installer doesn't make any sense, since the nature of Python means it will often be easy to sidestep any such issues once you're aware of their existence (e.g. by avoiding gevent's monkeypatching features and using threads to interact with the uncooperative synchronous library, or by splitting your application into multiple processes, some using gevent and others synchronous sockets). I also believe that *any* Conflicts declaration *should* be backed up with an explicit explanation and rationale for that conflict declaration in the project documentation.
Making it impossible to document runtime conflicts in metadata doesn't make those conflicts go away - it just means they will continue to be documented in an ad hoc manner on project web sites (if they get documented at all), making the job of package curation unnecessarily more difficult (since there is no standard way to document runtime conflicts). Adding a metadata field doesn't ensure such known conflicts *will* be documented, but it at least makes it possible. So, I still like the idea of including a Conflicts field, but think a few points should be made clear:
- the Conflicts field would be for documenting other distributions which have known issues working together in the same process and thus constitute an unsupported configuration
- this field would be aimed at package *users*, rather than at installation tools (although it would still be good if the installation tools supported scanning a set of packages for known conflicts)
- any use of this field should be backed up with a more detailed explanation in the project documentation
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Dec 9, 2012 at 12:54 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This is probably moot now, but I didn't propose that installation order matter -- in both scenarios I described, you end up with a warning and A not installed, regardless of whether A or B were installed first.
The author of A *doesn't* get to decide what happens to B, *I* do.
The reason I said, "(de facto)", is because the default behavior of whatever the next big installation tool is, would be what most users would've gotten by default.
Here's the question, though: who's going to maintain that list? I can see gevent wanting to have a compatibility chart page in their docs, but it seems unlikely they'd want to block installation of non-gevent-compatible projects or vice versa. Similarly, I can't see why any of those other projects would want to block installation of gevent, or vice versa. That being said, I don't object to having the ability for either of them to do so: the utility of the field is *much* enhanced once its connection to installation tools is gone, since a wider variety of issues can be described without inconveniencing users.
Beyond that, I think a reference URL should be included *in the field itself*, e.g. to a bug report, support ticket, or other page that documents the incompatibility and will be updated as the situation changes. The actual usefulness of the field to anyone "downstream" seems greatly reduced if they have to go hunting for the information explaining the compatibility issue(s). This is a good example of what I meant about clear thinking on concrete use cases, vs. simply copying fields from distro tools. In the distro world, these kinds of fields reflect the *results* of research and decision-making about compatibility. Whereas, in our "upstream" world, the purpose of the fields is to provide downstream repackagers and integrators with the source materials for such research.
My concrete recommendation based on your comments, then, is:
* The field should be called Known-Incompatibilities (to better clarify its purpose and avoid confusion with similarly-named installation-oriented metadata in other tools)
* The field should be of the form (though not necessarily syntax):
    ProjectName==incompatible_version; info=url
That is, each entry lists a project name and a specific version that is known to be incompatible, along with a (required) information URL. The URL should be for a page that:
* is updated with any change in the situation,
* will remain available indefinitely, and
* describes the specific reason that particular project is considered incompatible, along with any available workarounds.
For minor issues, a bug report or support ticket is acceptable; otherwise, a long-lived documentation link should be used. In-page anchor links are acceptable. A simple link to either project's home page or main documentation page is *not* acceptable: the link must be a part of the documentation that directly addresses the nature of the incompatibility. I'm not too picky about the version specification approach, though; the simplest thing is to only allow a single version to be named, but it also seems it could be reasonable to list one or more version ranges that apply, as long as they are not open-ended going forward. That is, saying versions 1.1-2.3 are incompatible is ok, but not "1.1 on". (Because the author of A is not in a position to declare on B's behalf that the incompatibility will *never* be fixable.) (I might be overthinking the versions bit, though, since this is really just about warnings.) I would recommend that tools automatically provide the warning in cases where a project C depends on versions of A and B that are declared incompatible. In this case, while one cannot *prove* the incompatibility to be an issue, it is still a potential issue. (This is more of a package build-time issue, though, as with Replaced-By.)
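The proposed entry rules (required information URL, a single version or a closed range, never "1.1 on") can be checked mechanically. A minimal sketch, assuming the syntax `Name==1.2; info=URL` with `Name==1.1-2.3` as a hypothetical closed-range extension, and assuming plain dotted-integer versions (pre-release tags are not handled). All names below, and the example entries in the test, are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass(frozen=True)
class KnownIncompatibility:
    project: str
    low: tuple    # first incompatible version (inclusive)
    high: tuple   # last incompatible version (inclusive); never open-ended
    info: str     # required URL documenting the incompatibility

def _version(text):
    # Assumption: dotted integers only; trailing zeros dropped so "2.3" == "2.3.0".
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)

def parse_known_incompatibility(value):
    """Parse 'Name==1.2; info=URL' or the closed range 'Name==1.1-2.3; info=URL'."""
    spec, sep, rest = value.partition(";")
    key, eq, url = rest.strip().partition("=")
    if not sep or key.strip() != "info" or not eq:
        raise ValueError("an info=URL part is required")
    url = url.strip()
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("info must be an http(s) URL")
    project, eq, versions = spec.partition("==")
    if not eq or not project.strip():
        raise ValueError("expected ProjectName==version")
    low, dash, high = versions.partition("-")
    if dash and not high.strip():
        raise ValueError("open-ended ranges such as '1.1-' are not allowed")
    low_v = _version(low)
    high_v = _version(high) if dash else low_v
    if high_v < low_v:
        raise ValueError("empty version range")
    return KnownIncompatibility(project.strip(), low_v, high_v, url)

def applies_to(entry, project, version):
    """True if `project` at `version` falls inside the declared range."""
    return (entry.project.lower() == project.lower()
            and entry.low <= _version(version) <= entry.high)
```

Rejecting open-ended ranges at parse time encodes the point that A cannot declare on B's behalf that an incompatibility will never be fixed.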
Speaking of Replaced-By, it probably makes sense to require a URL in the field there as well, but that URL can be an unchanging page such as an archived post to a mailing list or blog, announcing the project's renaming or obsolescence, and providing migration help or links thereto. I think it also should be a multi-valued field, just like Known-Incompatibilities. Recently, I came across a Python project "lepl" (a parser combinator library) that just declared its end-of-life, and actually recommended multiple alternatives, each of which would be more appropriate for some uses of a parsing library. (That is, there was no single "does everything" replacement for lepl's full feature set.) Finally, the PEP should document that the audience for both the Replaced-By and Known-Incompatibilities fields is developers and system integrators (such as distro teams). So they are designed to be processed by tools that *build* packages, rather than tools that *install* them. So, if you build a project that depends on something that's replaced, or a pair of things known to be incompatible, that's when you get warnings and such. Tools to check such things on installed projects are also ok, though to avoid unnecessary warnings, it's probably best to only list incompatibilities for co-dependents (and orphaned replaced projects) by default. That is, a checker should probably ignore replacements when there's an installed project depending on the replaced version, and ignore incompatibilities that aren't part of the same requirements subtree (and thus unlikely to be used together). Of course, having options to be more verbose is not an issue, and this isn't really something to legislate anyway -- it's just that listing *every* replaced project or potentially-incompatible pairing in even a moderately-sized installation is likely to be far more noise than signal.
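The noise-reduction suggestion (only report incompatible pairs that share a requirements subtree) amounts to a reachability check. A minimal sketch, assuming direct requirements are available as a mapping from project to requirement names, incompatibilities as a set of unordered pairs, and "roots" meaning installed projects that nothing else depends on; the function names are hypothetical.

```python
def requirement_closure(root, requires):
    """All projects reachable from `root` through `requires` (root included)."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(requires.get(name, ()))
    return seen

def codependent_incompatibilities(roots, requires, incompatible):
    """Report declared-incompatible pairs that both sit under the same root.

    `requires` maps project -> direct requirements; `incompatible` is a set of
    frozenset pairs. Pairs in unrelated subtrees are not reported.
    """
    report = {}
    for root in roots:
        closure = requirement_closure(root, requires)
        hits = sorted(tuple(sorted(pair)) for pair in incompatible if pair <= closure)
        if hits:
            report[root] = hits
    return report
```

With C depending on A and B, D on A, and E on B, only C is reported for an A/B incompatibility: D and E each pull in one side, so the two are unlikely to be used together.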

PJ Eby writes:
+1 to "describing". A metadata format should not specify tool behavior, and should use behavior-neutral nomenclature. Rather, use cases that seem probable or perhaps wrong-headed should inform the design. Nevertheless, actual decisions about behavior should be left to the tool authors.
I agree with the meaning of the above paragraph, but would like to dissociate myself from the comparison implied by the expression "clear thinking". AFAICS, it's different assumptions about use cases that drive the difference in prescriptions here.

On Sun, Dec 9, 2012 at 8:48 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
What comparison is that? By "clear", I mean "free of prior assumptions". The assumptions that made the discussion difficult weren't just about the use cases themselves, but about the environments, tools, organizations, concepts, etc. surrounding those use cases. Indeed, even the assumption of what should *qualify* as a "use case" was a stumbling block on occasion. ;-) And by "thinking", I mean, "considering alternatives and consequences", as distinct from debating the merits of a specific position. Put together, the phrase "clear thinking on concrete use cases" means (at least to me), "dropping all preconceptions of the existing design and starting over from square one, to ask how best the problem may be solved, using specific examples as a guide rather than using generalities." Generalities not rooted in concrete examples have a way of leading to non-terminating discussions. ;-) Starting over a discussion in this fashion isn't easy, but the results are usually worth it. I appreciate Nick and Daniel's patience in particular.

PJ Eby writes:
By "clear", I mean "free of prior assumptions".
Ah, well, I guess I've just run into a personal limitation. I can't imagine thinking that is "free of prior assumptions". Not my own<wink/>, and not by others, either. So, unfortunately, I was left with the conventional opposition in thinking: "clear" vs. "muddy". That impression was only strengthened by the phrase "vs. simply copying fields from distro tools."
Sure, but ISTM that's the opposite of what you've actually been doing, at least in terms of contributing to my understanding. One obstacle to discussion you have contributed to overcoming in my thinking is the big generality that the packager (ie, the person writing the metadata) is in a position to recommend "good behavior" to the installation tool, vs. being in a position to point out "relevant considerations" for users and tools installing the packager's product. Until that generality is formulated and expressed, it's very difficult to see why the examples and particular solutions to use cases that various proponents have described fail to address some real problems. It was difficult for me to see, at first, what distinction was actually being made. Specifically, I thought that the question about "Obsoletes" vs. "Obsoleted-By" was about which package should be considered authoritative about obsolescence. That is a reasonable distinction for that particular discussion, but there is a deeper, and general, principle behind that. Namely, "metadata is descriptive, not prescriptive." Of course once one understands that principle, the names of the fields don't matter so much, but it is helpful for "naive" users of the metadata if the field names strongly connote description of the package rather than behavior of the tool.

On Mon, Dec 10, 2012 at 3:27 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I suppose I should have said, "free of *known* prior assumptions", since the trick to suspending assumptions is to find the ones you *have*. The deeper assumptions, alas, can usually only be found by clashing opinions with others... then stepping back and going, "wait... what does he/she believe that's *different* from what I believe, that allows them to have that differing opinion?" And then that's how you find out what it is that *you're* assuming, that you didn't know you were assuming. ;-) (Not to mention what the other person is.)
Right, but I started from a concrete scenario I wanted to avoid, which led me to question the assumption that those fields were actually useful. As soon as I began questioning *that* assumption and asking for use cases (2 years ago, in the last PEP 345 revision discussion), it became apparent to me that there was something seriously wrong with the conflicts and obsoletes fields, as they had almost no real utility as they were defined and understood at that point.
Unfortunately, it's a chicken-and-egg problem: until you know what assumptions are being made, you can't formulate them. It's an iterative process of exposing assumptions, until you succeed in actually communicating. ;-) Heck, even something as simple as my assumptions about what "clear thinking" meant and what I was trying to say has taken some back and forth to clarify. ;-)
Actually, the principle I was clinging to for *both* fields was not giving project authors authority over other people's projects. It's fine for metadata to be prescriptive (e.g. requirements), it's just that it should be prescriptive *only* for that project in isolation. (In the broader sense, it also applies to the distro situation: the project author doesn't really have authority over the distro, either, so it can only be a suggestion there, as well.)

On 12/08/2012 05:06 AM, Nick Coghlan wrote:
Building such systems is *too hard* to delegate to the maintainers of every Python distribution registered on the Cheeseshop: there is too much policy involved for the ha'penn'orth of mechanism we are discussing here (decentralized inter-project metadata) to support. Such metadata *cannot* be useful in the general sense, but only in the context of a "curated" collection of packages, where the *curator* (not the upstream package authors) makes the choices. Tres. -- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com

On Sun, Dec 9, 2012 at 8:41 AM, Tres Seaver <tseaver@palladion.com> wrote:
The authors of major projects are often in a good position to know when they conflict with other high profile projects and thus can't be used reliably in the same system. Now, *most* of the time, if there's a genuine conflict between two Python packages, it's going to be at install time - two projects attempting to install the same file obviously can't coexist on a single system (distribute and setuptools, for example, conflict at this level - they both want to own the "setuptools" and "easy_install" names). However, Python has plenty of other global state too (the codec registry, the import system, monkeypatching), and there is potential for conflict over underlying OS level resources. So let's look at the case of collections of Python packages that *are* curated. Maybe I'm a Linux distro packager, looking to automate the conversion to distro packages. Maybe I'm a toolsmith for a large corporation trying to build a curated set of packages for internal use (clearly indicating to my internal users which ones don't play nicely with each other and thus shouldn't be used together in the same project). Regardless of the reason, I'm the curator for a collection of Python packages. How shall I express the conflicts I have identified? Shall I go invent my own metadata system? Shall I be forced to choose a particular platform-specific dependency management system? How shall upstream authors communicate to *me* the conflicts that they're already aware of? Or, hey, there's this nice shiny cross-platform dependency management system *right here*. Maybe they'll be nice enough to consider handling *my* use case as well, even if it's a use case *they* don't care about. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
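The install-time case is the one tools can detect mechanically: two distributions that would write the same path cannot coexist. A minimal sketch, assuming each distribution's file list is already known (the lists below are illustrative only, not the real contents of setuptools or distribute, and `file_conflicts` is a hypothetical helper):

```python
from collections import defaultdict


def file_conflicts(manifests):
    """Return {path: [distributions]} for every path claimed more than once.

    manifests: {distribution name: iterable of paths it would install}
    """
    owners = defaultdict(set)
    for dist, paths in manifests.items():
        for path in paths:
            owners[path].add(dist)
    return {path: sorted(d) for path, d in owners.items() if len(d) > 1}


if __name__ == "__main__":
    # Illustrative file lists only, not the real contents of either project.
    manifests = {
        "setuptools": ["setuptools/__init__.py", "easy_install.py"],
        "distribute": ["setuptools/__init__.py", "easy_install.py"],
        "otherproject": ["otherproject/__init__.py"],
    }
    for path, dists in sorted(file_conflicts(manifests).items()):
        print(path, "is claimed by", ", ".join(dists))
```

A check like this cannot see the other global state mentioned above (the codec registry, the import system, monkeypatching, OS-level resources), which is exactly the gap that declared conflicts would have to cover.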

On Fri, Dec 7, 2012 at 10:46 PM, PJ Eby <pje@telecommunity.com> wrote:
Skipping over a lot of other replies between you and I because I think that we disagree on a lot but that's all moot if we agree here. I have no problems with Obsoletes, Conflicts, Requires, and Provides types of fields if they are marked informational. In fact, there are many cases where packages are overzealous in their use of Requires right now that cause distributions to patch the dependency information in the package metadata. -Toshio

On 10 December 2012 16:35, Toshio Kuratomi <a.badger@gmail.com> wrote:
Given the endless debate on these fields, and the fact that it pretty much all seems to be about what happens when tools enforce them, I'm +1 on this. Particularly as these fields were not the focus of this change to the spec in any case. Paul.

There you go.

Obsoleted-By (optional)
:::::::::::::::::::::::

Indicates that this project is no longer being developed. The named project provides a substitute or replacement.

A version declaration may be supplied and must follow the rules described in `Version Specifiers`_.

The most common use of this field will be in case a project name changes.

Examples::

    Name: BadName
    Obsoleted-By: AcceptableName

    Obsoleted-By: AcceptableName (>=4.0.0)
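Because the field lives in the obsoleted project's own metadata, an installer learns about the replacement at the moment that project is installed. A minimal sketch of reading it with the standard library's `email.parser.Parser`, which the draft names as a serviceable parser for this format; the `install_notice` wording and the helper names are hypothetical, and the version specifier is kept as an uninterpreted string.

```python
import re
from email.parser import Parser

# "Name" or "Name (version specifier)", as in the examples above.
OBSOLETED_BY_RE = re.compile(r"^\s*(?P<name>[^\s(]+)\s*(?:\((?P<spec>[^)]*)\))?\s*$")


def obsoleted_by(metadata_text):
    """Return (replacement, specifier or None), or None if the field is absent."""
    value = Parser().parsestr(metadata_text).get("Obsoleted-By")
    if value is None:
        return None
    match = OBSOLETED_BY_RE.match(value)
    if match is None:
        raise ValueError(f"malformed Obsoleted-By value: {value!r}")
    return match["name"], match["spec"]


def install_notice(metadata_text):
    """Message an installer might show when installing an obsoleted project."""
    target = obsoleted_by(metadata_text)
    if target is None:
        return None
    name, spec = target
    wanted = f"{name} ({spec})" if spec else name
    project = Parser().parsestr(metadata_text)["Name"]
    return f"{project} is no longer being developed; consider {wanted} instead."
```

Since the field is marked "(optional)", the sketch reads at most one value with `get` rather than collecting several.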

On Wed, Nov 21, 2012 at 2:04 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Yes, I thought Daniel's rewording looked pretty reasonable on that front. However, the details of how an installer uses this information are really up to the installer developers and what their users expect/demand. It certainly isn't *practical* to do a full dependency analysis when PyPI doesn't provide the same kind of precalculated metadata that a yum repo does, but that's not something that should be spelled out in the distribution metadata PEP, any more than it is spelled out in the RPM format spec. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Nov 19, 2012 at 07:49:41PM -0500, Donald Stufft wrote:
I'm not sure this assertion about OS package managers is correct. I've only just read: http://www.python.org/dev/peps/pep-0426/#provides-dist-multiple-use but the rough rpm analogue seems to be the Provides: tag. Provides is given a string which is parsed into a name or a name and version like this:

    Provides: python
    Provides: python = 3.1.0

rpm has no way at package build time to tell that a particular name given in a Provides in one package is the actual name of another package. At install time, rpm keeps package names and provides names separately, but in dependency comparisons either one can be used to satisfy a requirement. What that means is that when asking about information on a package with name "python", you'll get information about the python package with that name and not about anything else that Provides: "python". But if you are installing something that has a requirement on "python", either the package with the name python or any package that Provides: python can satisfy the requirement. Package managers with built-in dep solvers can be built on top of rpm. The one that I am familiar with is yum. Since yum is downloading the packages that are being fed into rpm, yum could choose to prefer the package name instead of things in Provides when it downloads. It doesn't, though. Just like the underlying rpm, it treats package names and names specified through Provides: as equivalent. -Toshio
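The two lookups described above can be modelled in a few lines: an information query matches only the real package name, while a dependency check accepts either the name or any Provides: entry. This is a toy model of the behaviour as described, not rpm's or yum's actual code; versions are omitted and `alt-python` is a hypothetical package.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    provides: set = field(default_factory=set)  # names from Provides: tags


def query(packages, name):
    """Information lookup: only the package with exactly this name matches."""
    return [p.name for p in packages if p.name == name]


def satisfiers(packages, requirement):
    """Dependency check: the package name and Provides: names count equally."""
    return [p.name for p in packages if p.name == requirement or requirement in p.provides]


if __name__ == "__main__":
    repo = [Package("python"), Package("alt-python", {"python"})]  # hypothetical
    print(query(repo, "python"))       # only the package named "python"
    print(satisfiers(repo, "python"))  # both packages
```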

On Monday, November 19, 2012 at 8:35 PM, Toshio Kuratomi wrote:
Are you saying the RPM documentation is wrong? http://www.rpm.org/max-rpm/s1-rpm-inside-tags.html The provides tag is used to specify a *virtual package* that the packaged software makes available when it is installed. Normally, this tag would be used when different packages provide equivalent services. For example, any package that allows a user to read mail might provide the mail-reader virtual package. Another package that depends on a mail reader of some sort, could require the mail-reader virtual package. It would then install without dependency problems, if any one of several mail programs were installed. It pretty clearly states that it is not to be used for masquerading as a different package, which was my point. I wasn't making any claims about whether it was technically possible to do so or not, just what its intended purpose was.

Look more closely at the docs for "Obsoletes" in RPM, not just those for "Provides". Being able to transparently replace an existing package with a renamed one that installs files with the same names is certainly part of the purpose/capabilities of the RPM dependency machinery (i.e. precisely the distribute vs setuptools situation). We may want to clarify the wording to ensure it is clear that the provision of the dist name (as posted on PyPI) is implied, though. Cheers, Nick. -- Sent from my phone, thus the relative brevity :) On Nov 20, 2012 11:45 AM, "Donald Stufft" <donald.stufft@gmail.com> wrote:

On Monday, November 19, 2012 at 7:37 PM, PJ Eby wrote:
Can we maybe kill Provides-Dist and its associated baggage first, though?
I would love to kill Provides-Dist. The biggest question there is how do you handle its functionality? If someone needs setuptools but they have distribute installed they both shouldn't get installed. The need for it for the "2 packages being distributed together" I'm (personally) less concerned about since with proper dependency data we should be able to just depend on things instead of bundling them.

The "I bundled a renamed copy of six" is a totally different case which would not invoke provides-dist. "I merged sqlalchemy with a previously separate but wildly popular declarative / database support / whatever extension" would invoke provides-dist. Daniel Holth On Nov 19, 2012, at 7:41 PM, Donald Stufft <donald.stufft@gmail.com> wrote:

We are getting along fine too. No tool parses metadata 1.x for package management reasons and provides has existed forever with no implementation. So it is not inconveniencing anyone. I would prefer to leave it alone. Daniel Holth On Nov 19, 2012, at 7:49 PM, Donald Stufft <donald.stufft@gmail.com> wrote:

So you want to leave metadata in that you think people shouldn't implement? Or you do think people should implement it and the point about it existing forever without an implementation is? At the very least there needs to be some sort of guidelines as to what to do with the field in the various states it could be in. On Monday, November 19, 2012 at 8:31 PM, Daniel Holth wrote:

On Monday, November 19, 2012 at 9:24 PM, Daniel Holth wrote:
Mostly it seems a bit silly to have so much conversation about parts of the PEP that remain unchanged from previously accepted versions...
Well, I think the PEP should describe what we expect to be implemented *shrug*. Either we should expect it to be implemented and it should be part of the spec, or we shouldn't expect people to implement it and it should be removed.

The section could definitely be much clearer. How about: Provides-Dist (multiple use) Each entry contains a string naming a requirement that is satisfied by installing this distribution. This field *must* include the project identified in the ``Name`` field, optionally followed by the version: Name (Version). A distribution may provide additional names, e.g. to indicate that multiple projects have been merged into a single distribution or to indicate that this project is a substitute for another. For instance distribute (a fork of setuptools) could ``Provides-Dist`` setuptools to prevent the conflicting package from being downloaded and installed when distribute is already installed. A distribution that has been merged with another might ``Provides-Dist`` the obsolete name(s) to satisfy any projects that require the obsolete distribution's name.

Daniel Holth <dholth <at> gmail.com> writes:
Mostly it seems a bit silly to have so much conversation about parts of the PEP that remain unchanged from previously accepted versions...
I don't agree with the suggestion that we shouldn't discuss it because it was accepted in a previous version. Perhaps it didn't receive the right scrutiny at that time, but since it hasn't been implemented, it's reasonable to discuss it. ISTM that implementing it as suggested in the PEP can lead to certain problems, since it is a multi-valued field. If it is left in, then something should be said in the PEP about the potential difficulties and if/how they can be resolved. The difficulties I am talking about relate to dependency resolution. Given the current definition of Provides-Dist, it is possible for a package A on PyPI to "Provide" all of e.g. "A (1.0)", "B (1.2)" and "C (1.5)", and it is also possible for packages B and C on PyPI to provide the same (or slightly different) versions of logical packages of A, B, and C. This will likely lead to the need for a sophisticated dependency resolver because the dependency graph can get quite convoluted. (Remember, we might need to do this resolution when removing packages as well as when installing them.) I know there are SAT solvers and such, but I'm not sure we need that level of sophistication, or whether its complexity cost is outweighed by any benefit. Remember, we are managing fine without multi-valued Provides-Dist, and while a case has been made for virtual packages and forks (which just require a single-valued field), no compelling case has been made for bundling packages in general (I understand that such requirements might sometimes arise in certain corporate environments, but they don't seem to be a mainstream use case). Hence, no strong case has been made for a multi-valued "Provides" field. If we have a good index and packaging infrastructure, there is no general need for packages to bundle other packages, unless those bundled packages are changed in some way to suit the bundler's needs. 
In that case, I don't know how you could be sure that a bundled "A (1.0)" hasn't diverged from the equivalent package on PyPI. The "Provides" seems essentially useless in a metadata index, since if, when asked to install D which has a dependency on A, you would download and install A to resolve it rather than B or C, and I can't see when you would want to query the index to say "who provides A?" and then use some heuristic to pick e.g. B or C, rather than A. distlib currently contains support for the multi-valued "Provides", but I'm not confident that will work as expected given pathological cases like the example I suggested, without getting "complicated" in the Zen of Python sense. I'm not convinced that the maintenance burden of a complicated solution is worth the heretofore unnecessary ability to bundle stuff in arbitrary ways. Regards, Vinay Sajip

On Tue, Nov 20, 2012 at 9:35 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
If you don't have Provides-Dist, then distribute must continue to bundle an extra .egg-info directory to emulate the feature. This is more than enough justification for me. Name: is essentially an alias for Provides-Dist: (or vice-versa) so there is no such thing as a single-valued Provides-Dist. Having two names for a package is just as complicated as having twenty. You should not implement Provides-Dist by searching for every Provides-Dist: name on PyPI. You should only use it when deciding whether to download setuptools when distribute is already installed and a package depends on setuptools. The bundling term was bad wording on the part of the PEP. No one should ever include non-renamed copies of other dists in their dists ("import six" vs. "import django.util.six"). I've suggested a new wording in this thread.

Daniel Holth <dholth <at> gmail.com> writes:
I'm not so sure. In the case of two names, it could be assumed that one was a fork of the other (as in the specific cases of distribute/setuptools, or PIL/Pillow). You cannot reasonably make this assumption if you have twenty entries in your Provides-Dist.
You should not implement Provides-Dist by searching for every Provides-Dist: name on PyPI.
I wasn't seriously suggesting that this approach be taken - merely pointing out that Provides-Dist isn't of much use in a metadata index.
So apart from the setuptools/distribute and PIL/Pillow scenarios, what are the scenarios where you would have 3 or more values in Provides-Dist? If they are e.g. a bundled SQLAlchemy, why would that be preferable to an entry in Requires-Dist? Regards, Vinay Sajip

On Wed, Nov 21, 2012 at 1:16 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Provides/Requires/Obsoletes are *not* for bundling. Publishing bundled packages on the index is bad, and people shouldn't do it. What they're for is tracking name changes over time, so that you can fork and rename and merge projects without breaking the world for people that depend on your projects (one example used in the Fedora RPM docs is the apache package being renamed to httpd: https://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html-sin... ). The fact distribute can provide setuptools and Pillow can provide PIL are examples of the simple fork/rename case - they're designed to be drop in replacements for the projects they forked, so it's appropriate for them to advertise that fact in a way the deployment tools can understand. The multi-value support is then needed if you have multiple name changes over time (e.g. if someone were to create a distribute2 that provided both distribute and setuptools), or if you merge two projects together (e.g. if a popular extension to a project was folded into the main distribution for that project). It's likely fine if an installer doesn't use sophisticated graph analysis to find the "best" way to satisfy a set of requirements - you can just as easily use it in the simple way Daniel describes of only using these fields to check for existing locally installed packages with the necessary capabilities, before going out to get whatever is missing from the package index based purely on the distribution names. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
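The simple usage described here needs no graph analysis: treat each installed distribution's `Name` plus its `Provides-Dist` entries as what is locally available, and fetch anything else from the index by its distribution name alone. A names-only sketch, with plain dicts standing in for parsed metadata and hypothetical helper names:

```python
def provided_names(dist):
    """The Name field is implicitly provided; Provides-Dist adds other names."""
    return {dist["Name"], *dist.get("Provides-Dist", [])}


def to_fetch(requirements, installed):
    """Requirement names an installer still has to fetch from the index.

    Locally installed distributions are consulted first, including their
    Provides-Dist names. Anything missing is then fetched purely by its
    distribution name: no index-wide search for other providers.
    """
    available = set()
    for dist in installed:
        available |= provided_names(dist)
    return [req for req in requirements if req not in available]
```

With distribute installed and declaring `Provides-Dist: setuptools`, a project requiring `setuptools` and `six` would only trigger a download of `six`.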

Edit the following text: Provides-Dist (multiple use) Each entry contains a string naming a requirement that is satisfied by installing this distribution. This field *must* include the project identified in the ``Name`` field, optionally followed by the version Name (Version). A distribution may provide additional names, e.g. to indicate that multiple projects have been merged into a single distribution or to indicate that this project is a substitute for another. For instance distribute (a fork of setuptools) could ``Provides-Dist`` setuptools to prevent the conflicting package from being downloaded and installed when distribute is already installed. A distribution that has been merged with another might ``Provides-Dist`` the obsolete name(s) to satisfy any projects that require the obsolete distribution's name.

Daniel Holth <dholth <at> gmail.com> writes:
Edit the following text:
Okay, here is a possible version:

---------------------------------
Provides-Dist (multiple use)

Each entry contains a string naming a requirement that is satisfied by installing this distribution. The entry must consist of a name and version. The name of the project identified in the ``Name`` field is implicitly considered as provided, with the version specified in the ``Version`` field.

Multiple names in this field *must not* be used for bundling distributions together. The field is intended for use when projects are forked and merged over time, while providing essentially the same function. Multiple names reflect the evolution of the project over time and not the bringing together of different packages, already distributed elsewhere, in a bundle.

Thus, the 'distribute' distribution, a fork of setuptools, could say that it ``Provides-Dist`` a particular version of setuptools, to prevent setuptools from being downloaded and installed when distribute is already installed. If, over time, distribute evolved into a new package called 'distribute2' (for argument's sake), then that could say that it ``Provides-Dist`` a specific version of distribute and a specific version of setuptools.
-----------------------------------

Some comments on the above: I'm not entirely comfortable with a Provides-Dist entry which does not specify a version, since it does not allow you to test that a requirement is actually met. So, I've removed the "optional" qualification from the version.

Also: what happens when a requirement is for setuptools (>= X.Y), but the distribute fork hasn't kept pace, and so only supports setuptools at a lower version than X.Y? I take it we're entirely comfortable with installing setuptools X.Y in that case? How would you ensure the right setuptools is always loaded, since presumably both are on sys.path?

Regards, Vinay Sajip
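With a mandatory version on each entry, an installer can at least tell when a fork's claim is too old to satisfy a requirement, which is the point where the "install setuptools X.Y anyway?" question arises. A sketch under stated simplifications: versions are dotted integers only (real PEP 386 versions are richer), requirements are reduced to a single minimum version, and the version numbers in the usage are hypothetical. It says nothing about which copy ends up first on `sys.path`.

```python
import re

# Under the proposed wording every entry carries a version: "Name (Version)".
PROVIDES_RE = re.compile(r"^\s*(?P<name>[^\s(]+)\s*\((?P<version>[^)]+)\)\s*$")


def parse_version(text):
    """Simplified: dotted integers only, trailing zeros dropped (1.0 == 1).

    Real PEP 386 versions (e.g. pre-releases) need a proper parser.
    """
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def provided(dist):
    """Yield (name, version) pairs, starting with the implicit Name/Version."""
    yield dist["Name"], parse_version(dist["Version"])
    for entry in dist.get("Provides-Dist", []):
        match = PROVIDES_RE.match(entry)
        if match is None:
            raise ValueError(f"Provides-Dist entry without a version: {entry!r}")
        yield match["name"], parse_version(match["version"])


def satisfied_locally(name, minimum, installed):
    """True if an installed distribution provides `name` at >= `minimum`."""
    floor = parse_version(minimum)
    return any(
        provided_name == name and version >= floor
        for dist in installed
        for provided_name, version in provided(dist)
    )
```

If distribute (hypothetically 0.6.28) declares `Provides-Dist: setuptools (0.6)`, a requirement on setuptools >= 0.6 is met locally, while one on setuptools >= 0.7 is not, and the installer must decide what to do next.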

On Tue, Nov 20, 2012 at 11:49 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Egg-based tools don't have any problem with this, since they set sys.path to include the eggs needed for the running program. Other tools will have to tell the user and let them work it out, e.g. by using a different virtualenv. I personally don't think that forks claiming to "provide" something is really a good thing to encourage; ISTM that saying a package *conflicts* with another is more accurate, e.g. distribute Conflicts-Dist setuptools. I also think distributions should say they are obsoleted, rather than allowing other distributions to obsolete them. That is, centralized packaging systems rely on a central authority to resolve issues of who provides what and obsoletes what; there's an implicit "x obsoletes y [by decree of semi-independent third-party z]". However, in Python package metadata, it's "x obsoletes y [by decree of x]". IMO, this should be reversed to, "Y is obsoleted by x [by decree of y]", and "installing Y will conflict with X [by decree of X]", so that in each case the scope of authority for the statement is clear. That is, in each case (conflict or obsolescence), the project's developers are declaring under what conditions they will not be supporting an installation. In the case of obsolescence, the developer is saying, "this is being phased out, you should use that other thing instead". In the case of forks, the developer is saying, "If you install both versions, something's gonna break." Note that installation conflict is a more conservative claim anyway: a conflict between forked "foobar" packages is permanent, in the sense that it doesn't matter what versions of both packages you're interested in: they both want to install a foobar/__init__.py. (Of course, installers can and should detect that condition automatically, but not until they download the package first.)

On Tue, Nov 20, 2012 at 4:07 PM, Glenn Linderman <v+python@g.nevcal.com> wrote:
(We've been over this before, the last time this discussion came up on the Distutils-SIG for a previous Metadata PEP a year or two back, but here goes....) Obsoleting a package is for handling renames and support transitions. For example, if it actually did anything to do so, I'd mark RuleDispatch as obsoleted-by PEAK, the Pylons folks might mark some version of that as obsoleted-by Pyramid, etc. To put it another way, marking a package obsolete is part of deprecation and replacement, not an unsubstantiated third-party claim about the maintenance status of an unrelated project. If a package is *actually* dead, there's no real point to declaring that something else obsoletes it, and certainly no reason to put it in metadata form. Otherwise, we could have Twisted claiming to obsolete GEvent and vice-versa at the same time. Which one should an installer believe? It makes no sense in a standard where the project's maintainers can say whatever they want about somebody else's project. The scope of authority for automatically-consumed metadata should *only* encompass the project that provided the metadata.

I think the Metadata 1.1 treatment of these concepts is in some ways better. (Metadata 1.2 added the -Dist suffix to the fields in an attempt to make it clear that dependency names are PyPI names and not "import x" names.) http://www.python.org/dev/peps/pep-0314/ says:

Provides (multiple use)

Each entry contains a string describing a package or module that will be provided by this package once it is installed. These strings should match the ones used in Requirements fields. A version declaration may be supplied (without a comparison operator); the package's version number will be implied if none is specified.

Example:

    Provides: xml
    Provides: xml.utils
    Provides: xml.utils.iso8601
    Provides: xml.dom
    Provides: xmltools (1.3)

Obsoletes (multiple use)

Each entry contains a string describing a package or module that this package renders obsolete, meaning that the two packages should not be installed at the same time. Version declarations can be supplied. The most common use of this field will be in case a package name changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0. When you install Torqued Python, the Gorgon package should be removed.

Example:

    Obsoletes: Gorgon

They mean pretty much what the same words mean in RPM and do not need further bikeshedding.
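The two quoted Metadata 1.1 rules are mechanical enough to sketch: a Provides entry without a version inherits the package's own version, and installing a package removes any installed package named in its Obsoletes. The helper names and the `"0.8"` version in the usage are hypothetical, and Obsoletes version declarations are ignored for brevity.

```python
import re

ENTRY_RE = re.compile(r"^\s*(?P<name>[^\s(]+)\s*(?:\((?P<version>[^)]*)\))?\s*$")


def _split(entry):
    """Split "name" or "name (version)" into (name, version string or "")."""
    match = ENTRY_RE.match(entry)
    if match is None:
        raise ValueError(f"malformed entry: {entry!r}")
    return match["name"], (match["version"] or "").strip()


def provides_with_versions(own_version, provides):
    """PEP 314 Provides: an entry without a version gets the package's own."""
    return [(name, version or own_version) for name, version in map(_split, provides)]


def removals_on_install(obsoletes, installed):
    """PEP 314 Obsoletes: installed packages to remove (versions ignored here)."""
    obsoleted = {name for name, _ in map(_split, obsoletes)}
    return sorted(obsoleted & set(installed))
```

For instance, a package at version 0.8 declaring `Provides: xml` and `Provides: xmltools (1.3)` would provide xml 0.8 and xmltools 1.3, and installing Torqued Python with `Obsoletes: Gorgon` would schedule an installed Gorgon for removal.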

Daniel Holth <dholth <at> gmail.com> writes:
They mean pretty much what the same words mean in RPM and do not need further bikeshedding.
But isn't it the case that the scenarios are different because in the case of RPMs, we have a presumed authority which can determine e.g. what obsoletes what, whereas with Python distributions, there's no central authority that has this function? Regards, Vinay Sajip

On Tue, Nov 20, 2012 at 06:43:32PM -0500, Daniel Holth wrote:
Agreed. And this is closer to the way that distributions' tools have to operate than they'd want to :-( Within the distribution we like to pretend that we only need to care about the packages that we generate. But we also know that whether or not we support it, ordinary users will install packages from outside of our walls. That means that the packaging tools that we create will need to deal with things that we might not condone within our "presumed authority". We trust that people are going to do more or less the right thing with the tools we offer. Once in a while they don't but by and large they do. -Toshio

On Tue, Nov 20, 2012 at 6:43 PM, Daniel Holth <dholth@gmail.com> wrote:
That's sort of beside the point. The *only* use case which Obsoletes provides over Obsoleted-By is that it allows third parties to unilaterally advertise their forked project as a substitute for the original, and maybe block users from switching back to the un-forked project -- regardless of the status of the original project or the consent of the original project's maintainer. This use case, however, benefits nobody besides the forkers. There are many other legitimate channels by which the forkers can advertise themselves as a replacement for their parent project, and no reason for the installing end user to be bothered with the subject, except in case of a conflict. For somebody obsoleting their own package, on the other hand, it's likely well worth the effort to at least update their PyPI metadata to reflect the change in status -- especially if this can be done through the web interface. It's likely they would wish to update their description as well, to notify human beings of the change. But here's the thing that kills "Obsoletes" dead in the first place as a practical tool: unless installers use a PyPI search before installing *every single project*, there is no way for them to realize that the obsoleting package exists! By contrast, if a package is "Obsoleted-By", then installing that package (or declaring a dependency on it) provides an opportunity to inform the user of the need to make a transition. This can't be done with an "Obsoletes" field. Conversely, if you have already installed a package that says it "Obsoletes" another package, this does *not* tell you that the obsolete package shouldn't still be installed! A replacement project doesn't necessarily share the same API, and may exist in a different package namespace altogether. In short, "Obsoletes" is virtually *useless* as a machine-consumed metadata field, because there is nothing you can actually do with it in a practical installer. 
I'm against adding more fields to the metadata which do not have a specification for how they should be used in practice; the presence of such fields has been a problem with most of the preceding metadata specs, IMO.
I re-named a package once just because I did not like the name. I used "Obsoletes" for that. It is documentation.
Note that "Obsoleted-By" would also serve that use case, and have the additional benefit of being able to notify people who install new copies of the replaced project. (By the way Daniel, I'm sorry I didn't comment on this PEP sooner; I'd forgotten about the previous PEP 345 rehashing in 2010, or rather, I just sort of assumed that the results of that discussion had been incorporated into the newer PEP, and didn't notice the reappearance of the noise fields until your call for approval just now. Sorry!)

On Wed, Nov 21, 2012 at 1:10 PM, PJ Eby <pje@telecommunity.com> wrote:
Then that's a bug in the metadata of the project misusing "Obsoletes", and should be reported as such. If the new package is not a drop-in replacement, then it has no business claiming to obsolete the other package. I think one of the big reasons this kind of use is rare in the Python community is that project name changes are almost always accompanied by *package* name changes, and as soon as you change the package name, you're changing the public API, and thus it is no longer appropriate to use Provides or Obsoletes, as the renamed project is no longer a drop-in replacement for the original. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Nov 21, 2012 at 1:20 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I realised that my comments above are more about the appropriate use of "Provides", rather than "Obsoletes". For a practically useful "Obsoletes", I think I'm inclined to agree with you, as "Obsoleted-By" provides a way for a maintainer to explicitly declare that a project is no longer receiving updates, and users should migrate to the replacement project if they want to continue to receive fixes and improvements. The current version of "Obsoletes" is, as Daniel describes, really only useful as documentation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Nov 20, 2012 at 11:01 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
A few more changes to try to address some of the confusion about Requires-Dist without re-designing the entire requirements system. PEP-426 was written only to add extras support to the format. The other changes, re-writing much of the PEP, have been an unfortunate side-effect.

- The file format's keys are case-insensitive.
- The version number should be in PEP 386 form. There are too many non-PEP-386 versions now and in the future to make it a must.
- Distribution (requirement) names are noted as being distinct from ``import x`` module names.
- Parenthetical explanation has balanced parens.
- "bundled" has been struck from the PEP.

diff -r 55c706023fa2 -r 026aebf2265d pep-0426.txt
--- a/pep-0426.txt Sun Nov 18 19:55:10 2012 +0200
+++ b/pep-0426.txt Mon Dec 03 20:36:13 2012 -0500
@@ -34,9 +34,9 @@
 The syntax defined in this PEP is for use with Python distribution
 metadata files. The file format is a simple UTF-8 encoded Key: value
-format with no maximum line length, followed by a blank line and an
-arbitrary payload. The keys are case-insensitive. It is parseable by
-the ``email`` module with an appropriate ``email.policy.Policy()``.
+format with case-insensitive keys and no maximum line length, followed by
+a blank line and an arbitrary payload. It is parseable by the ``email``
+module with an appropriate ``email.policy.Policy()``.
 When ``metadata`` is a Unicode string, ```email.parser.Parser().parsestr(metadata)`` is a serviceable parser.
@@ -94,7 +94,7 @@
 :::::::

 A string containing the distribution's version number. This
-field must be in the format specified in PEP 386.
+field should be in the format specified in PEP 386.

 Example::
@@ -283,12 +283,13 @@
 Each entry contains a string naming some other distutils
 project required by this distribution.

-The format of a requirement string is identical to that of a
-distutils project name (e.g., as found in the ``Name:`` field.
-optionally followed by a version declaration within parentheses.
+The format of a requirement string is identical to that of a distribution
+name (e.g., as found in the ``Name:`` field) optionally followed by a
+version declaration within parentheses.

-The distutils project names should correspond to names as found
-on the `Python Package Index`_.
+The distribution names should correspond to names as found on the `Python
+Package Index`_; often the same as, but distinct from, the module names
+as accessed with ``import x``.

 Version declarations must follow the rules described in
 `Version Specifiers`_
@@ -305,7 +306,8 @@
 Like Requires-Dist, but names dependencies needed while the
 distributions's distutils / packaging `setup.py` / `setup.cfg` is run.
-Commonly used to generate a manifest from version control.
+Commonly used to bring in extra compiler support or a package needed
+to generate a manifest from version control.

 Examples::
@@ -318,17 +320,19 @@
 Provides-Dist (multiple use)
 ::::::::::::::::::::::::::::

-Each entry contains a string naming a Distutils project which
-is contained within this distribution. This field *must* include
-the project identified in the ``Name`` field, followed by the
-version : Name (Version).
+Each entry contains a string naming a requirement that is satisfied by
+installing this distribution. This field *must* include the project
+identified in the ``Name`` field, optionally followed by the version:
+Name (Version).

 A distribution may provide additional names, e.g. to indicate that
-multiple projects have been bundled together. For instance, source
-distributions of the ``ZODB`` project have historically included
-the ``transaction`` project, which is now available as a separate
-distribution. Installing such a source distribution satisfies
-requirements for both ``ZODB`` and ``transaction``.
+multiple projects have been merged into and replaced by a single
+distribution or to indicate that this project is a substitute for another.
+For instance distribute (a fork of setuptools) could ``Provides-Dist``
+setuptools to prevent the conflicting package from being downloaded and
+installed when distribute is already installed. A distribution that has
+been merged with another might ``Provides-Dist`` the obsolete name(s)
+to satisfy any projects that require the obsolete distribution's name.

 A distribution may also provide a "virtual" project name, which does
 not correspond to any separately-distributed project: such a name
@@ -359,10 +363,9 @@
 Version declarations can be supplied. Version numbers must be in the
 format specified in `Version Specifiers`_.

-The most common use of this field will be in case a project name
-changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0.
-When you install Torqued Python, the Gorgon distribution should be
-removed.
+The most common use of this field will be in case a project name changes,
+e.g. Gorgon 2.3 gets renamed to Torqued Python 1.0. When you install
+Torqued Python, the Gorgon distribution should be removed.

 Examples::
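The revised wording says the file is parseable with the ``email`` module and that keys are case-insensitive. A minimal sketch of what that looks like, assuming Python 3 and the default (compat32) policy of ``email.parser.Parser``; the metadata below is made up in the style of the PEP's own examples.

```python
from email.parser import Parser

# Made-up Metadata 1.3 document: headers, a blank line, then the payload.
METADATA = """\
Metadata-Version: 1.3
Name: BeagleVote
Version: 1.0a2
Summary: A module for collecting votes from beagles.
Requires-Dist: pkginfo
Requires-Dist: PasteDeploy
Provides-Extra: test

BeagleVote
==========

The long description lives in the payload.
"""

msg = Parser().parsestr(METADATA)

# Header lookup is case-insensitive: msg["name"] is msg["Name"].
name = msg["name"]
# Multiple-use fields: get_all() returns every occurrence, in order.
requires = msg.get_all("Requires-Dist")
# Everything after the first blank line is the payload (the description).
description = msg.get_payload()
```

Here ``requires`` is ``["pkginfo", "PasteDeploy"]`` and ``description`` starts with the reST title, so no separate Description header is needed.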

On Tue, Nov 20, 2012 at 5:01 PM, Daniel Holth <dholth@gmail.com> wrote:
The problem is that the above *makes no sense*. "Torqued Python" and "Gorgon" are veiled pseudonyms for Twisted and Medusa... and Twisted is not actually a plug-and-play substitute for Medusa, AFAIK. Can anybody suggest an *actual* use case for "Obsoletes", and explain how it is supposed to work in software? The last time this discussion came up, nobody had any use cases that stood up to the "how's that actually going to work and/or help?" test. Here's a post of mine summarizing this and related points in the previous thread: http://mail.python.org/pipermail/catalog-sig/2010-October/003364.html

On Tue, Nov 20, 2012 at 9:44 PM, PJ Eby <pje@telecommunity.com> wrote:
Again I didn't write any of this. Someone mentioned ZODB + transaction. The PEP should have used the word "merged" instead of "bundled". When two packages become one, and the redundant package is no longer being developed, Provides-Dist can be used. I re-named a package once just because I did not like the name. I used "Obsoletes" for that. It is documentation.

On Wed, Nov 21, 2012 at 12:44 PM, PJ Eby <pje@telecommunity.com> wrote:
Sure. This is an RPM example, but exactly the same thing applies at the Python level. One of the dependencies of PulpDist (a directory mirroring tool I wrote) is the Pulp project (originally just an RPM mirroring tool, but now with plugin-based support for mirroring other things). The upstream version of Pulp that I currently use is missing Kerberos login support, so I have patched that in via RPM's patching features. To avoid messing up others sharing the internal yum repo where this is published, I actually use the Provides/Conflicts/Obsoletes features of RPM to make sure my patched and renamed copy and the upstream version don't interfere with each other (and certainly can't be installed on the same system, as they would trample all over each other by attempting to install the same files).

Mostly though, these labelling tools are especially useful for internal forks and mergers - the ones you *don't* share with the wider internet, except perhaps in the form of upstream patches (for example: https://bugzilla.redhat.com/show_bug.cgi?id=831937).

On a public index, drop-in replacements are *always* going to be controversial from a social point of view, which is why there are only two current examples on PyPI I am aware of (i.e. distribute vs setuptools and Pillow vs PIL). The first was essentially a hostile fork, while the latter started as an attempt to provide decent packaging support when the current maintainer didn't show any interest in doing so. In such cases, it is absolutely essential that the *forking* project is able to declare that it is a replacement for the original project. It is then up to the community to decide whether or not the claims of being a suitable replacement are valid, which will be shown most clearly in relative uptake numbers between the original project and the forked one.

I do consider it unfortunate that Python has only copied 3 of the 4 RPM dependency management fields (i.e. only Provides, Requires, and Obsoletes, without copying the more value-neutral Conflicts), and I also prefer the "capability" terminology in the Fedora RPM guide that makes it clear that these are really arbitrary strings from a tooling point of view that only match the package name by convention.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say:
(1) Then how *should* the "bundle-of-several-components" case be represented? (2) How is 'Provides-Dist' different from 'Obsoletes-Dist'? The only difference I can see is that it may be a bit more polite to people who do want to install multiple versions of a (possibly abstract) package. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

On Tue, Nov 20, 2012 at 3:58 PM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
The useful way to bundle a bunch of things would be to just include them all in an executable folder or zipfile with __main__.py. PEP 426 and the package database would not get involved. The bundle would be distributed as an application you can download and use, not as an sdist on PyPI.

The intent of Provides and Obsoletes is different. Obsoletes would not satisfy a requirement during dependency resolution.

The RPM guide explains a similar system:

This brings the total to four types of dependencies that the RPM system tracks:

- Requires, which tracks the capabilities a package requires
- Provides, which tracks the capabilities a package provides for other packages
- Conflicts, which describes the capabilities that if installed, conflict with capabilities in a package
- Obsoletes, which describes the capabilities that this package will make obsolete

Packages advertise this dependency information. Each dependency holds the type, such as requires, a capability, such as a shared library or a package name, and optionally a version number, such as requiring the python package at a version number greater than or equal to 2.2 (python >= 2.2).

http://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/RPM_...
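The bundling alternative described above (an executable zip with a top-level ``__main__.py``) can be sketched with the standard ``zipfile`` module. The bundled ``helper`` module and its contents are hypothetical. Python has been able to run such archives directly since 2.6, and Python 3.5+ also ships a ``zipapp`` module for building them.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_bundle(path: str) -> None:
    """Write an executable zip: __main__.py plus a bundled helper module."""
    with zipfile.ZipFile(path, "w") as zf:
        # Hypothetical bundled component; any pure-Python module would do.
        zf.writestr("helper.py", "def greet():\n    return 'hello from the bundle'\n")
        # The interpreter runs __main__.py when given the archive's path,
        # with the archive itself on sys.path, so 'import helper' works.
        zf.writestr("__main__.py", "import helper\nprint(helper.greet())\n")


def run_bundle(path: str) -> str:
    """Run the archive with the current interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def demo() -> str:
    """Build a bundle in a temporary directory and return what it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        bundle = os.path.join(tmp, "app.zip")
        build_bundle(bundle)
        return run_bundle(bundle)


if __name__ == "__main__":
    print(demo())
```

Nothing here touches the installation database; the archive is simply copied and run.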

On 11/20/12, Daniel Holth <dholth@gmail.com> wrote:
On Tue, Nov 20, 2012 at 3:58 PM, Jim J. Jewett <jimjjewett@gmail.com> wrote:
Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say:
(1) Then how *should* the "bundle-of-several-components" case be represented?
When I look at, for example, twisted, there are some fairly fine distinctions. I can imagine some people wanting to handle each little piece differently, since that is the level at which they would be replaced by a more efficient implementation. That doesn't mean that someone using the default should have to manage 47 separate little packages individually. Also note that ZODB is mentioned as a bundling example in the current (2012-11-14) PEP. What does the PEP recommend that they do? Stop including transaction? Keep including it but stop 'Provides-Dist'-ing it? The current PEP also specifies that "This field must include the project identified in the Name field, followed by the version : Name (Version)." but the examples do not always include version. Why is the MUST there? Is there some way to distinguish between concrete and abstract provisions? For example, if MyMail (2012.11.10) includes 'Provides-Dist: email', does that really get parsed as 'Provides-Dist: email (2012.11.10)'?
The intent of Provides and Obsoletes is different. Obsoletes would not satisfy a requirement during dependency resolution.
The RPM guide explains a similar system:
As best I can understand, Obsoletes means "Go ahead and uninstall that other package." Saying that *without* providing the same functionality seems like a sneaky spelling of "Please break whatever relies on that other package." I'm willing to believe that there is a more useful meaning. I'm also willing to believe that they are logically redundant but express different intentions. The current wording doesn't tell me which is true. (Admittedly, that is arguably an upstream bug with other package systems, but you should still either fix it or explicitly delegate the definitions.)

And as long as I'm asking for clarification, can foopkg-3.4 obsolete foopkg-3.2? If not, is it a semantics problem, or just not idiomatic? If so, does it have a precise meaning, such as "no longer interoperates with"?

And now that I've looked more carefully ...

Can a "Key: Value" pair be continued onto another line? The syntax description under "Metadata Files" does not say so, but later text suggests that either leading whitespace or a leading tab specifically (from the example code) will work. (And is description a special case?)

Is the payload assumed to be utf8 text? Can it be itself a mime message?

Are there any restrictions on 'Name'? For example, can the name include spaces? Line breaks? Must it be a valid python identifier? A valid python qualname?

'Version' says that it must be in the format specified in PEP 386. Unfortunately, it doesn't say which part of 386. Do you mean that it must be acceptable to verlib.NormalizedVersion without first having to call suggest_normalized_version?

'Summary' specifies that it must be one line. Is there a character limit, or do you just mean "no line breaks"? Do you want to add a "Should be less than 80 characters" or some such, based on typical tool presentation? Would it be worth repeating the advice that longer descriptions should go in the payload, after all headers? (Otherwise, they have to find 'Description' *and* notice that it is deprecated and figure out what to do instead.)

Under 'Description', it isn't entirely clear what terminates the field. "Multiple paragraphs" suggests that there can be multiple lines, but I'm guessing that -- in practice -- they have to be a single logical line, with all but the first starting with whitespace.

Under 'Classifier', is PEP 301 really the current authority for classifiers? I would prefer at least a reference to http://pypi.python.org/pypi?%3Aaction=list_classifiers demonstrating which classifiers are currently meaningful.

Under 'Requires-Dist', there is an unclosed parenthesis.

Does the 'Setup-Requires-Dist' set implicitly include the 'Requires-Dist' set, or should a package be listed both ways if it is required at both setup and runtime?

The Summary of Differences from PEP 345 mentions changes to Requires-Dist, but I don't know what they were -- even the unclosed parentheses seemed the same.

The appendix gives code for generating and parsing continuation lines that suggests the continuation whitespace is exactly one tab -- is other whitespace OK too?

-jJ

On Tue, Nov 20, 2012 at 7:18 PM, Jim Jewett <jimjjewett@gmail.com> wrote:
ZODB is a bad example. The word "bundling" will be struck from the PEP entirely. Two sdists should not be combined into one sdist when both packages are still being developed. If A and B are merged into a single PyPI package C, and A and B will no longer be developed, then C may Provides-Dist A and B. http://www.python.org/dev/peps/pep-0426/#requires-dist-multiple-use No MUST on the Requires-Dist version. If no version is there, it should satisfy any version requirement. Is there some way to distinguish between concrete and abstract
No.
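The matching rules stated in this exchange (a name listed in Provides-Dist satisfies a requirement on that name, an entry with no version satisfies any version requirement, and Obsoletes satisfies nothing) can be sketched as a toy matcher. Several simplifying assumptions apply: names compare case-insensitively, versions are plain dotted integers compared as tuples, only ``==``, ``!=``, ``>=``, ``<=``, ``>`` and ``<`` are handled, and ``Name (spec)`` is parsed by a hand-rolled regex rather than the PEP's full Version Specifiers grammar. Obsoletes-Dist entries are deliberately never consulted.

```python
import operator
import re
from typing import List, Optional, Tuple

# Simplified "Name (inner)" parser; not the PEP's full grammar.
_ENTRY = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\(([^)]*)\))?\s*$")

# Two-character operators come first so ">=" is not read as ">".
_OPS = [
    ("==", operator.eq), ("!=", operator.ne),
    (">=", operator.ge), ("<=", operator.le),
    (">", operator.gt), ("<", operator.lt),
]


def parse_entry(entry: str) -> Tuple[str, Optional[str]]:
    """Split 'Name (x)' into a name and the text inside the parentheses."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError("cannot parse %r" % entry)
    name, inner = match.group(1), match.group(2)
    # Assumption: project names compare case-insensitively.
    return name.lower(), (inner.strip() if inner else None)


def as_tuple(version: str) -> Tuple[int, ...]:
    # Assumption: plain dotted integers only (no a/b/rc/post/dev tags).
    return tuple(int(part) for part in version.split("."))


def clause_allows(clause: str, version: str) -> bool:
    """Check one clause such as '>=1.5' against a concrete version."""
    clause = clause.strip()
    for symbol, compare in _OPS:
        if clause.startswith(symbol):
            return compare(as_tuple(version), as_tuple(clause[len(symbol):].strip()))
    # Assumption: a bare version number means "==".
    return as_tuple(version) == as_tuple(clause)


def satisfies(requirement: str, provides: List[str]) -> bool:
    """True if any Provides-Dist entry satisfies a Requires-Dist entry."""
    req_name, req_spec = parse_entry(requirement)
    for entry in provides:
        name, version = parse_entry(entry)
        if name != req_name:
            continue
        # An unversioned provide satisfies any version requirement, and an
        # unversioned requirement accepts any provided version.
        if version is None or req_spec is None:
            return True
        if all(clause_allows(c, version) for c in req_spec.split(",")):
            return True
    return False
```

For a hypothetical merged project C with ``Provides-Dist`` entries ``C (1.0)``, ``A`` and ``B (2.0)``, a requirement on ``A (>=3.0)`` is satisfied because the ``A`` entry carries no version, while ``B (>=2.1)`` is not.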
When I used Obsoletes, it meant "I am no longer developing this other package that is identical to this re-named package". The system of requirements/conflicts (as in the RPM system) does not appear to be entirely orthogonal. And now that I've looked more carefully ...
Description (now in the payload, please) is the only field that is commonly multi-line. Any field could continue onto the next line as far as the parser is concerned. It probably would not make sense. Is the payload assumed to be utf8 text? Can it be itself a mime message?
The entire file needs to be utf-8. The payload is assumed to be utf-8 text in this version. Wouldn't a mime message also be utf-8 text? (we wouldn't know what to do with it)
setuptools constrains it to alphanumeric characters. Metadata 1.3 doesn't say. 'Version' says that it must be in the format specified in PEP 386.
It means it is expected to match: http://www.python.org/dev/peps/pep-0386/#the-new-versioning-algorithm

    expr = r"""^
    (?P<version>\d+\.\d+)         # minimum 'N.N'
    (?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
    (?:
        (?P<prerel>[abc]|rc)      # 'a' = alpha, 'b' = beta
                                  # 'c' or 'rc' = release candidate
        (?P<prerelversion>\d+(?:\.\d+)*)
    )?
    (?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
    $"""

Please do not enforce this regexp in your Metadata parser.
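Taking that caveat literally (match against the PEP 386 pattern, but don't reject anything), a parser might warn instead of failing. A sketch that compiles the pattern quoted above with ``re.VERBOSE``; warning rather than raising, and the name ``check_version``, are illustrative choices, not something the PEP specifies.

```python
import re
import warnings

# The PEP 386 pattern quoted above, compiled in verbose mode so the
# inline comments and whitespace are ignored.
PEP386_VERSION = re.compile(r"""^
    (?P<version>\d+\.\d+)              # minimum 'N.N'
    (?P<extraversion>(?:\.\d+)*)       # any number of extra '.N' segments
    (?:
        (?P<prerel>[abc]|rc)           # 'a' = alpha, 'b' = beta
                                       # 'c' or 'rc' = release candidate
        (?P<prerelversion>\d+(?:\.\d+)*)
    )?
    (?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
    $""", re.VERBOSE)


def check_version(version: str) -> bool:
    """Return True for PEP 386 versions; warn (but do not raise) otherwise."""
    if PEP386_VERSION.match(version):
        return True
    # Illustrative policy: flag the value and let the caller carry on.
    warnings.warn("version %r is not in PEP 386 form" % version)
    return False
```

``1.0a2``, ``1.0rc1`` and ``1.0.post1.dev2`` pass. ``1`` (fewer than two components) and ``1.0-beta`` only produce a warning.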
Non-blank lines that don't start with whitespace are keys. email.parser.Parser() takes care of this in an e-mail-inspired (but not any literal RFC) way. The distutils documentation has guidelines on the short description / summary.
The addition of the allowed "extra" variable in the ; condition is the most significant change.
Any whitespace would work.
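The answers about continuation lines can be checked directly. A sketch using ``email.parser.Parser`` with its default compat32 policy, which keeps the folding whitespace in the returned value, so the hypothetical ``unfold`` helper collapses it. A leading tab and leading spaces both mark a continuation. The sample metadata is made up.

```python
from email.parser import Parser

# Made-up sample: one field continued with a tab, another with spaces.
SAMPLE = (
    "Metadata-Version: 1.3\n"
    "Name: BeagleVote\n"
    "Summary: A module for collecting\n"
    "\tvotes from beagles.\n"
    "Keywords: dog\n"
    "    puppy voting\n"
    "\n"
    "Payload text.\n"
)


def unfold(value: str) -> str:
    """Hypothetical helper: collapse folded lines and runs of whitespace."""
    return " ".join(value.split())


msg = Parser().parsestr(SAMPLE)
summary = unfold(msg["Summary"])
keywords = unfold(msg["Keywords"])
```

Collapsing all whitespace is fine for short fields like these. It would not be appropriate for a multi-paragraph description, which is one more reason to keep that in the payload.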

Daniel Holth writes:
When I used Obsoletes, it meant "I am no longer developing this other package that is identical to this re-named package".
But as a user I could care less! The authors may care, but I don't care if Torqued "obsoletes" Gorgon, because in using Torqued I'm DTRT'ing even though I don't know it. What I care about is when I'm using Gorgon, and there's something "better" (or worse, "correct") to use in my application. It might be a good idea to have a just-like-Amazon While-This-Package-Is-Great-You-Might-Also-Consider: field. Tongue-in-cheeki-ly y'rs,

On Wed, Nov 21, 2012 at 12:00 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Hence my suggestion for an Obsoleted-By field, in which Gorgon would be able to suggest alternatives.
Yeah, that's basically what Obsoleted-By is for.

PJ Eby writes:
On Wed, Nov 21, 2012 at 12:00 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
My bad, my precise intention was to follow up on your idea (which, credit where credit is due, I had *not* hit upon independently). I should have made that clear. (I really shouldn't be answering English email at a Japanese-speaking conference; my brain thinks it knows what it's doing, but without my noticing, Japanese culture is soaking in....)
Well, Obsoleted-By is pretty strong language for suggesting possible alternatives. But I suspect that few projects would really want to be suggesting competitors' products *or* their own oldie-but-still-goodie that they'd really like to obsolete ASAP (put an Obsoleted-By line in every Python 2 distribution, anyone? :-)

How to use Obsoletes:

1. The author of B decides A is obsolete.
2. A releases an empty version of itself that Requires: B
3. B Obsoletes: A
4. The package manager says "These packages are obsolete: A. Would you like to remove them?"
5. User says "OK".

On Wed, Nov 21, 2012 at 2:54 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:

On Mon, Dec 3, 2012 at 2:43 PM, Daniel Holth <dholth@gmail.com> wrote:
Um, no. Even if the author of A and B are the same person, you can't remove A if there are other things on the user's system using it. The above scenario does not work *at all*, ever, except in the case where B is simply an updated version of A (i.e. identical API) -- in which case, why bother? To change the project name? (Then it should be "Formerly-named" or something like that, not "Obsoletes".) Please, *please* see the previous Catalog-SIG discussion I linked: this is only one of multiple metadata fields that were thoroughly debunked in that discussion as completely useless for automated dependency management.

On Wednesday, December 5, 2012 at 2:13 AM, PJ Eby wrote:
You can automatically uninstall A from B in an automatic dependency management system. I *think* RPM does this, at the very least I believe it refuses to install B if A is already there (and the reverse as well).* There's nothing preventing an installer, during its attempt to install B, from seeing that B Obsoletes A, looking at what depends on A, warning the user what is going to happen, and prompting them.

I think Obsoletes as is an alright bit of information. I think the biggest flaw with Obsoletes isn't in Obsoletes itself, but is in the lack of a Conflicts tag that has the same functionality (minimally refusal to install both, possibly uninstall the previous one with a prompt to the user). Obsoletes has the semantics of a logical successor (typically renames) while Conflicts should have the semantics of a competitor. distribute would conflict with setuptools; foo2 would obsolete foo.

* I could be wrong about RPM's treatment of Obsoletes
I don't see this in this thread, could you link it again?
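The installer behaviour described here (on installing B, notice that it obsoletes A, check what still depends on A, and ask before removing anything), together with a minimal refuse-to-install-both Conflicts check, might look like the sketch below. The data model (installed project name mapped to the names it requires) and every function name are hypothetical. It also illustrates the objection above: A cannot be removed silently while something else still requires it.

```python
from typing import Dict, List, Set

# Hypothetical model: installed project name -> names of projects it requires.
Installed = Dict[str, Set[str]]


def dependents_of(installed: Installed, name: str) -> List[str]:
    """Installed projects that still list `name` as a requirement."""
    return sorted(p for p, reqs in installed.items() if name in reqs)


def plan_install(
    installed: Installed,
    new: str,
    obsoletes: List[str],
    conflicts: List[str],
) -> Dict[str, List[str]]:
    """Decide what installing `new` would do; nothing is modified here."""
    blocked = [c for c in conflicts if c in installed]
    if blocked:
        # Conflicts: minimally, refuse to have both installed at once.
        return {"refuse": blocked, "remove": [], "ask_user": []}
    remove: List[str] = []
    ask_user: List[str] = []
    for old in obsoletes:
        if old not in installed:
            continue
        users = [d for d in dependents_of(installed, old) if d != new]
        if users:
            # Something still needs the old project: warn and prompt.
            ask_user.append(old)
        else:
            # Nothing depends on it, so removal is uncontroversial.
            remove.append(old)
    return {"refuse": [], "remove": remove, "ask_user": ask_user}
```

With ``{"A": set(), "app": {"A"}}`` installed, installing B that obsoletes A puts A under ``ask_user`` because ``app`` still requires it. Whether the user is in a position to answer that prompt sensibly is exactly what the thread goes on to dispute.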

On Wed, Dec 05, 2012 at 02:46:11AM -0500, Donald Stufft wrote:
This is correct.
I believe it refuses to install B if A is already there (and the reverse as well).*
I'd have to test this but I believe you are correct about the first. Not sure about the reverse.
In rpm-land, if something depended on A and nothing besides the actual A package provided A, rpm will refuse to install B. But rpm is meant to be used unattended so different package managers could certainly choose to prompt. For package renames, package B would have both an Obsoletes: A <= $OLD_VERSION and a Provides: A = NEW_VERSION -Toshio

On Wed, Dec 5, 2012 at 2:46 AM, Donald Stufft <donald.stufft@gmail.com> wrote:
Unless the user wrote those things that depend on A, they aren't going to be in a position to do anything about it. (Contrast with a distro, where dependencies are indirect - the other package will depend on an abstraction provided by both A and B, rather than directly depending on A *or* B.) (Also note that all the user knows at this point is that the author of B *claims* to obsolete A, not that the authority managing the repository as a whole has decreed B to obsolete A.)
You can automatically uninstall A from B in an automatic dependency management system
My point is that this can only work if the "obsoleting" is effectively just a rename, in which case the field should be "renames", or better still, "renamed-to" on the originating package. As I've mentioned repeatedly, Obsoleted-By handles more use cases than Obsoletes, and has at least one practical automated use case (notifying a developer that their project is depending on something that's obsolete). Also, the example given as a use case in the PEP (Gorgon to Torqued) is not just wrong, it's *actively misleading*. Gorgon and Torqued are transparent renames of Medusa and Twisted, which do not share a common API and thus cannot be used as the subject of any automated processing (in the case of Obsoletes) without doing some kind of PyPI metadata search for every package installed every time a package is installed.
I think Obsoletes as is an alright bit of information.
1. It cannot be used to prevent the installation of an obsolete package without a PyPI metadata search, since you must examine every *other* package on PyPI to find out whether some package obsoletes the one you're trying to install.
2. Unlike RPM, where metadata is provided by a trusted third party, Obsoletes can be specified by any random forker (no pun intended), which makes this information a mere advertisement... and an advertisement to the wrong audience at that, because they must have *already* found B in order to discover that it replaces A!
3. Nobody has yet supplied a use case where Obsoletes would not be strictly improved upon by Obsoleted-By. (Note that "the author of package X no longer maintains it" does not equal "package Y is entitled to name itself the successor and enforce this upon all users" -- this can work in RPM only because it is a third party Z who declares Y the successor to X, and there is no such party Z in the Python world.)
I don't see this in this thread, could you link it again?
http://mail.python.org/pipermail/catalog-sig/2010-October/003368.html
http://mail.python.org/pipermail/catalog-sig/2010-October/003364.html

These posts also address why a "Conflicts" field is *also* unlikely to be particularly useful in practice, in part for reasons that relate to differences between RPM-land and Python-land. (For example, RPMs can conflict over things besides files, due to runtime and configuration issues that are out-of-scope for a Python installer tool.)

While it's certainly desirable to not invent wheels, it's important to understand that the Python community does not work the same way as a Linux distribution. We are not a single organization shipping a fully-functional and configured machine; we are hundreds of individual authors shipping our own stuff. Conflict resolution and package replacement (and even deciding what it is that things "provide" or "require") are primarily *human* processes, not technical ones. Relationship and support "contracts", IOW, rather than software contracts.

That's why, in the distro world, a package manager can use simple fields to carry out the will of the human organization that made those support and compatibility decisions. For Python, the situation is a bit more complicated, which is why clear thinking is needed. Simply copying fields blindly from other packaging systems just isn't going to cut it.

Now, if the will of the community is to turn PyPI into a distro-style repository, that's fine... but even if you completely ignore the human issues, there are still technical ones. Generally, distro-style repositories work by downloading the full metadata set (or at least an index) to a user's machine. And that's the sort of architecture you'd need in order for these types of fields to be technically feasible (e.g., doing an index search for Obsoletes), without grinding the PyPI servers into dust.
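Whether it is done on a client that has downloaded the metadata, as in the distro-style architecture described here, or as a server-side query, answering "does anything claim to obsolete this project?" amounts to building a reverse mapping once. A sketch, assuming the metadata has already been fetched into a dict; it says nothing about whether or how PyPI should serve such an index.

```python
from collections import defaultdict
from typing import Dict, List


def build_obsoleted_index(
    metadata: Dict[str, Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Map each obsoleted name to the projects that claim to obsolete it.

    Assumption: `metadata` is an already-downloaded mapping of
    project name -> fields, e.g. {"Torqued": {"Obsoletes-Dist": [...]}}.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for project, fields in metadata.items():
        for entry in fields.get("Obsoletes-Dist", []):
            # Drop any version specifier: "Gorgon (<=2.3)" -> "Gorgon".
            target = entry.split("(", 1)[0].strip()
            index[target].append(project)
    return dict(index)
```

After one pass over the metadata, each later check is a dictionary lookup. The cost is moved into keeping the full metadata set current, which is the architectural question being raised.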

On Wed, Dec 5, 2012 at 4:10 PM, PJ Eby <pje@telecommunity.com> wrote:
My desire is to invent the useful "wheel" binary package format in a reasonable and limited amount of time by making changes to Metadata 1.2 and implementing the new metadata format and wheel in distribute and pip. Help me out by allowing useless but un-changed fields to remain in this version of the PEP. I am done with the PEP and submit that it is not worse than its predecessor. I can participate in a discussion about any of the following:

Summary of Differences From PEP 345 <http://www.python.org/dev/peps/pep-0345>

- Metadata-Version is now 1.3.
- Values are now expected to be UTF-8.
- A payload (containing the description) may appear after the headers.
- Added extra to environment markers.
- Most fields are now optional.
- Changed fields:
  - Description
  - Project-URL
  - Requires-Dist
- Added fields:
  - Extension
  - Provides-Extra
  - Setup-Requires-Dist

On Wed, Dec 5, 2012 at 5:30 PM, Daniel Holth <dholth@gmail.com> wrote:
You could just mark those fields as deprecated and that they should not be used to delete packages or block packages from installation. Justification: nobody has managed to make them work in an automated tool yet, and their use in same is controversial, so they are downgraded to human-informational only. Please, let's not have yet *another* metadata spec that advertises these attractive nuisance[1] fields. I do not want us to be having this same conversation AGAIN the next time any metadata changes are being considered. We've already had it too many times already. PEPs are supposed to summarize these discussions for that very reason. --- [1] For non-native speakers, an attractive nuisance is a dangerous thing that entices unsuspecting persons to play with it; http://en.wikipedia.org/wiki/Attractive_nuisance_doctrine has more details.

On Thu, Dec 6, 2012 at 8:30 AM, Daniel Holth <dholth@gmail.com> wrote:
Agreed. PJE's arguments sound reasonable (especially since Obsoletes doesn't get used much in RPM-land either - Provides & Conflicts are both far more common), but they're orthogonal to the current aims of the metadata 1.3 update. If another author wanted to create a subsequent 1.4 update that was focused on replacing Obsoletes with Obsoleted-By, that would be fine (alternatively, a patch to the current PEP draft may be acceptable, but accepting such a change would be up to Daniel as the PEP author).
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wednesday, December 5, 2012 at 4:10 PM, PJ Eby wrote:
Arguing over Obsoletes vs Renames is a massive bikeshedding argument.
So it's a bad example. Hardly an argument against it.
Will require support from PyPI but this ultimately isn't a big deal.
If you're installing B, you've extended trust to that author. If you don't trust the author, then why are you installing (and then executing) code they wrote?
Very convenient to declare that one of the major use cases for Obsoletes over Obsoleted-By is not valid because of your own personal opinions. Like I said above, if you're installing a package that someone has uploaded, you've implicitly granted them trust. There are far worse things that a bad Python citizen can do during and after an install than what is allowed by Obsoletes.
I don't think Conflicts is something that every single package is going to require. As you said, the tools themselves are going to handle the obvious cases for the bulk of situations. Unless you think two packages can never conflict in anything beyond the files they install, there are cases where it would be helpful, and merely having the ability to use it when it is the best tool for the job isn't going to cause any great issue.
End systems often times do not have a singular organization controlling every package in their system. The best example is Ubuntu and their PPA's.
This is insane. A fairly simple database query is going to "grind the PyPI servers into dust"? You're going to need to back up this FUD or please refrain from spouting it.

On Dec 05, 2012, at 06:07 PM, Donald Stufft wrote:
What if you installed Z, but B got installed because it was a dependency three levels down?
Well, basically never installing anything from PyPI except into a virtualenv is probably a good recommendation (maybe even now).
End systems often do not have a single organization controlling every package in their system. The best example is Ubuntu and their PPAs.
Well, PPAs are awesome, but have known and well-publicized trust issues. I wouldn't enable a PPA into my running system without really knowing who the owner is and why I'm using their PPA. Or doing a lot of testing in a chroot first, and probably pinning the package set to just the one(s) from the PPA I care about. Cheers, -Barry

On Wednesday, December 5, 2012 at 6:18 PM, Barry Warsaw wrote:
Sure, you granted trust to Z, Z granted trust to Y, and Y granted trust to B. Like in SSL certificates there was a chain of trust. If you don't trust Z then don't install their package.
A virtualenv only protects you from well behaved packages. There is no way to prevent a package author from doing very nasty things to you if they wish. Providing more power in the metadata doesn't make this situation better or worse, it just makes more standard paths in the cases where you do need to do it.
Basically the same thing can be said about packages on PyPI. All the same trust issues exist there. Simply installing a Python package is already granting far more trust than Obsoletes requires, since installing a package means executing someone else's Python code on your system. Even if you remove setup.py, you're still going to be executing their code on your system. If you do not trust the author of the packages you are installing, you do not install their packages.

I understand the PEP author's frustration with continued discussion, but I think this subthread on Obsoletes vs. Obsoleted-By is not mere bikeshedding on names. It matters *which package* presents the information. Donald Stufft writes:
The author may be a genius when it comes to writing code, and an idiot when it comes to distributing it. Distribution is much harder than it looks, as you know. Trusting the author's *content* and trusting the author's *metadata* are not equivalent! As far as I can see, the semantics of putting "Obsoletes: A" into B without changing A are the same as the semantics of putting "Provides: A" into B (without changing A).[1] Only if A includes "Obsoleted-By: B" can a user be confident that B is a true successor to A. Furthermore, as has been pointed out, the presence of "Obsoleted-By" in A has the huge advantage of informing users and developers of dependent packages alike that A is obsolete when they try to update A. If A is not changed, then an attempted update will tell them exactly that, and they may never find out about B. But if A is modified in this trivial way, the package system can automatically inform them. This is also trivial, requiring no database queries. "Simple is better than complex." Footnotes: [1] A trustworthy author of B wouldn't use "Provides" unless he thought B was indeed a drop-in, and presumably superior, replacement for A. And that's all that "Obsoletes" can tell you!

Makes sense. How about calling it Replacement? 0 or 1?

Replacement (optional)
::::::::::::::::::::::

Indicates that this project is no longer being developed. The named project provides a drop-in replacement.

A version declaration may be supplied and must follow the rules described in `Version Specifiers`_.

The most common use of this field will be in case a project name changes.

Examples::

    Name: BadName
    Replacement: AcceptableName

    Replacement: AcceptableName (>=4.0.0)
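To make the proposed field concrete, here is a minimal sketch of how an installer might read it, assuming the metadata is parsed with the stdlib email parser (as the PEP text suggests) and that a value is a project name optionally followed by a parenthesized version specifier. "Replacement" is only the field name proposed in this message, and read_replacement is a hypothetical helper; validating the specifier itself is left out.

```python
import re
from email.parser import Parser
from typing import Optional, Tuple

# Hypothetical: "Replacement" is the field proposed above, not a standardized one.
# Assumed value shape: "ProjectName" optionally followed by "(version specifier)".
_REPLACEMENT_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\((?P<spec>[^)]*)\))?\s*$"
)


def read_replacement(metadata_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (project_name, version_spec_or_None) from a Replacement field, if any."""
    # Only the header block matters here, so the payload is not parsed.
    msg = Parser().parsestr(metadata_text, headersonly=True)
    value = msg.get("Replacement")  # header lookup is case-insensitive
    if value is None:
        return None
    match = _REPLACEMENT_RE.match(value)
    if match is None:
        raise ValueError("malformed Replacement value: %r" % value)
    spec = match.group("spec")
    return match.group("name"), (spec.strip() if spec else None)
```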

On 12/5/2012 10:12 PM, Daniel Holth wrote:
I like it. 'Replacement' is broader in meaning, more neutral, and less awkward than 'Obsoleted-by'. And I agree that A users have much more need to know about B than vice versa. It is much the same situation with Py 2 and Py 3 (although the latter is *not* a drop-in replacement). -- Terry Jan Reedy

On Thu, Dec 6, 2012 at 1:12 PM, Daniel Holth <dholth@gmail.com> wrote:
Makes sense. How about calling it Replacement. 0 or 1?
Hah, you'd think I'd have learned by now to finish reading a thread before replying. It will be nice to get this addressed along with the other changes :) (FWIW, Conflicts and Obsoletes are messy in RPM as well, and especially troublesome as soon as you start enabling multiple upstream repos from different providers. The metadata problem is handled by prebuilding indices when the repo changes, but that's still more work for the server, and more work for clients)
Replacement (optional) ::::::::::::::::::::::
I like verb forms like Obsoleted-By or Replaced-By, as the noun form is ambiguous about the direction of the change. Since the field being replaced is Obsoletes, Obsoleted-By makes sense.
Indicates that this project is no longer being developed. The named project provides a drop-in replacement.
Typically, the new version *won't* be a drop-in replacement (e.g. you'll likely at least have to import from a different top level package). Instead, the field would more often be used as an explicit indicator that the project is no longer receiving updates, as the *development team* has moved on, so users may want to consider either migrating, taking over development (if the former developers are amenable) or forking. If the replacing project *is* a drop-in replacement for the old project, then it should also advertise a Provides-Dist for the original project. Automated tools can then easily detect the two cases:

* A Obsoleted-By-Dist B and B Provides-Dist A = A is defunct, and B should be a drop-in replacement for A
* A Obsoleted-By-Dist B (without a Provides-Dist on B) = A is defunct, B is a replacement for A, but some porting will be needed

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
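The two cases above can be sketched as a small classifier. This assumes a hypothetical in-memory form of the metadata (a dict mapping each field name to its list of values) and uses the field names from this message, Obsoleted-By-Dist and Provides-Dist, which are proposals rather than settled spec; classify_replacement is an illustrative helper, not part of any tool.

```python
import re
from typing import Dict, List, Optional

# Hypothetical metadata model: field name -> list of raw values.
# Field names follow the proposal in this message.
Metadata = Dict[str, List[str]]

_NAME_RE = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _dist_name(value: str) -> str:
    """Extract the (lower-cased) project name from a value such as 'B (>=2.0)'."""
    match = _NAME_RE.match(value)
    if match is None:
        raise ValueError("no project name in %r" % value)
    return match.group(1).lower()


def classify_replacement(old: str, old_meta: Metadata,
                         new: str, new_meta: Metadata) -> Optional[str]:
    """Classify how project `new` relates to defunct project `old`.

    "drop-in":        old says Obsoleted-By-Dist: new, and new says Provides-Dist: old
    "porting-needed": only the Obsoleted-By-Dist side is present
    None:             old does not name new as its successor
    """
    # The decision is driven from the old project's side, as argued in the thread.
    successors = {_dist_name(v) for v in old_meta.get("Obsoleted-By-Dist", [])}
    if new.lower() not in successors:
        return None
    provided = {_dist_name(v) for v in new_meta.get("Provides-Dist", [])}
    return "drop-in" if old.lower() in provided else "porting-needed"
```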

On Thu, Dec 6, 2012 at 2:54 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Although Replaced-By would be fine as well - it's certainly much easier to say than the mouthful that is Obsoleted-By. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Dec 5, 2012 at 6:07 PM, Donald Stufft <donald.stufft@gmail.com> wrote:
Arguing over Obsoletes vs Renames is a massive bikeshedding argument.
And is entirely beside the point. The substantive question is whether it's Obsoletes or Obsoleted-By - i.e., which side is it declared on.
So it's a bad example. Hardly an argument against it.
Nobody has actually proposed a better one, outside of package renaming -- and that example featured an author who could just as easily have used an obsoleted-by field.
Will require support from PyPI but this ultimately isn't a big deal.
...and every PyPI clone. And of course the performance issues.
Trusting their code is one thing; trusting whether they understood a PEP (and its interactions with various installation tools) well enough to not accidentally delete *somebody else's code* out of my system is another thing altogether. OTOH, trusting an author to tell me (in an automated fashion), "hey, you should switch to this other thing as soon as you can" is a FAR smaller amount of required trust. Arguing that because I have to trust one thing, means I must trust another, is a "Fallacy of Gray" argument.
I didn't say it was invalid, I said: """Note that "the author of package X no longer maintains it" does not equal "package Y is entitled to name itself the successor and enforce this upon all users""" These things are not equal. AFAIK, well-managed Linux distros do not allow random forkers to declare themselves the official successor to a defunct package, so any analogy between this use case in the Python world and the distro world is strained at *best*.
The rationale for that is laid out in the posts I linked.
then there are cases where it would be helpful
Please, present a *real-life instance* where it would have been helpful to you.
and merely having the ability to use it when it is the best tool for the job isn't going to cause any great issue.
One of the posts I linked presents an instance where it would have actually *harmed* things to specify it, and it's quite easy to see how the same problem would arise if used for non-file-related conflicts... And the problem present is *directly* tied to the lack of a third-party Z who decides whether X and Y, as configured for release Q of distro P, "conflict". This is not a problem that is solvable even in *principle* for an automated tool in the absence of party Z, which means that any such field's actual function is limited to a heads-up to a human user.
I take it you're not familiar with PyPI's history of performance and scaling problems over the last several years, then. The statically cached "/simple" index was developed precisely to stop *today's* class of installation tools from killing the servers... and then mirroring PyPI was still required to scale. Any proposal that calls for encouraging tools to query a metadata field *every time* a package is installed (or even just downloaded) almost certainly needs to be vetted with the PyPI admin team.

On Wed, Dec 05, 2012 at 07:34:41PM -0500, PJ Eby wrote:
How about pexpect and pexpect-u as a better example?
Note that although well-managed Linux distros attempt to control random forking internally, the distro package managers don't prevent people from installing from third parties. So Ubuntu PPAs, upstreams that provide their own rpms/debs, and major third party repos (for instance, rpmfusion as an add-on repo to Fedora) all have and sometimes (mis)use the ability to Obsolete packages in the base repository. So Donald isn't stretching the relationship quite as far as you make it out. The ecosystem of packages for a distro carries uncontrolled packages just as much as pypi.
And the same for Provides. (i.e., the latest foo is 0.6c; bar Provides: foo-0.6d. An automated tool that finds both foo and bar in its dep tree can choose to install bar and not foo.) The ability for this class of fields to cause harm is not, to me, a compelling argument not to include them. It could be an argument to explicitly tell implementers of install tools that these fields all have caveats when used with PyPI and similar unpoliced community package repositories. The install tools can then choose how they wish to deal with those caveats. Some example strategies: choose to prompt the user as to which to install, choose to always treat the fields as human-informational only, or mark some repositories as trusted to contain packages where these fields are active and other repositories where the fields are ignored. -Toshio
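One way an install tool could implement the strategies listed above is a per-repository policy table. In this sketch the repository names, the FieldPolicy enum, and plan_obsoletes are all hypothetical; the key design choice is that an unlisted repository (such as an unpoliced index) defaults to informational-only handling, so its Obsoletes declarations can never remove anything by themselves.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class FieldPolicy(Enum):
    """How much an installer trusts Obsoletes-style fields from one repository."""
    INFORMATIONAL = "informational"  # report the claim, never act on it
    PROMPT = "prompt"                # ask the user about each obsoleted package
    ENFORCE = "enforce"              # act automatically (trusted, curated repository)


def plan_obsoletes(package: str, obsoleted: List[str], repository: str,
                   policies: Dict[str, FieldPolicy],
                   ask: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Return (packages_to_remove, notices) for a package that declares Obsoletes.

    Repositories missing from `policies` default to INFORMATIONAL.
    """
    policy = policies.get(repository, FieldPolicy.INFORMATIONAL)
    notices = ["%s (from %s) says it obsoletes %s" % (package, repository, name)
               for name in obsoleted]
    if policy is FieldPolicy.ENFORCE:
        return list(obsoleted), notices
    if policy is FieldPolicy.PROMPT:
        return [name for name in obsoleted if ask(name)], notices
    return [], notices
```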

On Thu, Dec 6, 2012 at 1:49 AM, Toshio Kuratomi <a.badger@gmail.com> wrote:
Perhaps you could explain? I'm not familiar with those projects.
But in each of these cases, the packages are being defined *with reference to* some underlying vision of what the distro (or even "a distro") is. An Ubuntu PPA, if I understand correctly, is still *building an Ubuntu system*. Python packaging as a whole lacks such frames of reference. A forked distro is still a distro, and it's a fork *of something*. Rpmfusion is defining an enhanced Fedora, not slinging random unrelated packages about. If there's a distro analogy to PyPI, it seems to me that something like RpmFind would be closer: it's just a free-for-all of packages, with the user needing to decide for themselves whether installing something from a foreign distro will or won't blow up their system. (E.g., because their native distro and the foreign one use a different "provides" taxonomy.) RpmFind itself can't solve anybody's issues with conflicts or obsoletes; all it can do is search the data that's there. But unlike PyPI, RpmFind can at least tell you which vision of "a distro" a particular package was intended for. ;-)
The ability for this class of fields to cause harm is not, to me, a compelling argument not to include them.
But it is absolutely not a compelling argument *to* include them, and the actual arguments for them are pretty thin on the ground. The real knockdown is that in the PyPI environment, there aren't any automated use cases that don't produce collateral damage (outside of advisories about Obsoleted-By projects).
AFAIK, there are only a handful of curated repositories: Scipy, Enthought, and ActiveState come to mind. These are essentially "python distros", and they might certainly have reason to build policy into their metadata. I expect, however, that they would not want the *package* authors declaring their own conflicts or obsolescence, so I'm not sure how the metadata spec will help them. Has anyone asked for their input or experience? It seems pointless to speculate on what they might or might not need for curated distribution. (I'm pretty sure Enthought has their own install tools, not sure about the other two.)
A peculiar phenomenon: every defense of these fields seems to refer almost exclusively to how the problems could be fixed or why the problems aren't that bad, rather than *how useful the fields would be* in real-world scenarios. In some cases, the argument for the fields' safety actually runs *counter* to their usefulness, e.g., the fields aren't that bad because we could make them have a limited function or no function at all. Isn't lack of usefulness generally considered an argument for *not* including a feature? ;-)

On Fri, Dec 07, 2012 at 01:18:40AM -0500, PJ Eby wrote:
pexpect was last released in 2008. Upstream went silent, with unanswered bugs in its tracker and no mailing list. A fork of pexpect was created that addressed the issue of the unicode type in Python 2, added a Python 3 port, and has slowly evolved since then. I see that the original upstream has made some commits to their source repository since the fork was created, although there has still been no new release.
Uhm.... that's both true and false, as any complex system is. rpm and deb are just packaging formats. So: *) Not all packages build on top of that system. There are rpm packages provided by upstreams that users attempt (with greater and lesser degrees of success) to install on SuSE, RHEL, Fedora, Mandriva, etc. There are debs built for Ubuntu that people attempt to install onto Debian. *) PPAs and rpmfusion may both build on top of an existing system, but they can change the underlying structure, replacing components that other pieces of the base system depend on. You talk about the setuptools and distribute problem on PyPI... there's absolutely nothing that prevents someone from building a PPA or a package in a third-party rpm repository that packages a setuptools that Obsoletes: distribute or a distribute package that Obsoletes: setuptools.
If you constantly forget why the fields are useful, then I suppose you'll always believe that :-) -Toshio

On Fri, Dec 7, 2012 at 12:01 PM, Toshio Kuratomi <a.badger@gmail.com> wrote:
And what problem are you saying which fields would have solved (or which benefits they would have provided), for whom? If the packages have files in conflict, they won't be both installed. If they don't have files in conflict, there's nothing important to be informed of. If one is installing pexpect-u, then one does not need to discover that it is a successor of pexpect. If one is installing pexpect, it might be useful to know that pexpect-u exists, but one can't simply discover that from an Obsoletes field on pexpect-u. However, even if one did discover it, this would merely constitute an *advertisement* of pexpect-u's existence, not a *requirement* that it be used in place. A tool cannot know, without other affirmative user action, that it is actually a good assumption to use the advertised replacement. In the distro world, a user has *already* taken this affirmative action by choosing which repository to source packages from, on an implicit contract that this source is up to the job of managing his needs across multiple packages. Or, if they choose to source an off-brand or upstream package, they are taking affirmative action to risk it. In the Python world, there is no notion of a "repository", aside from a handful of managed Python distros, which have their own, distinct packaging methods and distribution tools. So there is no affirmative contract of trust regarding *inter-project* relationships. It is precisely this lack that is why the metadata spec has gone mostly unused since its inception about a decade ago. Nobody really knows what to "provide" or "require", or in what context they would actually be "obsoleting" anything that isn't their own package, or a package they've forked. But if you live mainly in the distro world, this concept seems absurd, and the fields *obviously* useful. But that's because you're swimming in an ocean of context that doesn't exist on dry land. You're saying that *of course* swimming fins are useful... if you live in the ocean. 
And I, living on dry land, am saying that *sure* they are... but only in a swimming pool or a pond, and we don't have very many of those here in dry Python-land. And the people who run the swimming pools have thoughtfully already provided their own. Do we need to standardize swim fin sizes for people who mostly live on dry land? The flip side of this, btw, is that there's an implicit contract in the Python world that there is generally only "the" package - not "the package as patched and re-packaged by vendors X, Y, and Z". If I install python project foo, version 1.2, I expect it to be the *same* foo-1.2, with the *same metadata*, *no matter where I got it from*. And so, this assumption is our "air" to your "water". We know that pools and ponds (curated Python distros) are different, as an exception to this rule, just as you know that reefs and islands (uncurated repositories, search engines, and upstream-built packages) are different, as an exception to your assumption that "the package I get is intended to play well with everything else in my system." (This of course is why many distro managers are suspicious of language-specific or other sorts of vertical package management tools - they seem as pointless as wheels in the water, solving problems you don't have, and creating new problems for you at the same time. Unfortunately, people on land will keep inventing them, because they have a different set of problems to solve -- some of which are actually created by the ocean-oriented tools. For example, virtualenv and its predecessors were developed to solve the "problem" of a single integrated environment, even though that integrated environment is the "solution" from a distro perspective.)
Sure. But the reference points still exist, and there is a layer of indirection between "packager" and "developer", even in the case where the packager and developer are the same person or organization. In the Python case, there is usually no such indirection, outside of curated systems like SciPy et al. (Even there, most of what third-party packaging is about in the Python world is taking care of binary builds.) Again, it's islands in the ocean vs. pools on land.
At the *same time*? That is, are you saying that there are repositories that contain *self-contained* "Obsoletes"-cycles? (Presumably, there are no end-user sites containing such cycles, if the install tool responds by refusing to install one or by removing the other.)
If you constantly forget why the fields are useful, then I suppose you'll always believe that :-)
I've stated many times that they're useful... in the context of a larger system. Within the distro packaging ecosystem, a package "conflicts", "obsoletes", or "provides" things *relative* to some notion of an installation -- however vague -- that has been selected by an explicit user action (such as choice of basic distro, package manager, and repository). So, despite their framing as binary relationships -- e.g. Obsoletes(predecessor, successor) -- the *actual* relationship is three-valued: Obsoletes(predecessor, successor, integration-context). The third player in the relationship is whoever *packaged* the project(s) in question... and in the Python world (outside of curated repositories), that packager is *always the original author*. Now, in the case where the packager and author are different, we can talk about such relationships in the same way: binary relationships with an implied third. For example, if SciPy decided at some point to replace NumPy with NumPyPy, it would be more than reasonable to state that Obsoletes(NumPy, NumPyPy, SciPy), even as at the same time, perhaps Enthought has already tried this and decided to go the other way, so that Obsoletes(NumPyPy, NumPy, EnthoughtPD). They use different tools and repositories and thus can imply the third position. In neither case, however, is SciPy or Enthought (nor the authors of NumPy or NumPyPy) entitled to declare an Obsoletes relationship with a *true* wildcard for the third position. And so the key distinction between PyPI and the distro world is that *PyPI is not an integration context*. Packages provided by authors do not usually include this type of metadata, unless the author of the package has a specific integration context in mind. So the burden falls to either the repository manager or the user to define these higher-level relationships *within their intended integration context* (Or to put it another way, *somebody* has to be the "packager", not just the "developer".)
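The three-valued relationship described above can be modeled directly as a data structure. This is a minimal sketch with hypothetical names (Obsoletes, replacements_in); the SciPy and EnthoughtPD contexts are the hypothetical examples from the message, and modeling PyPI as "no context" (None) encodes the argument that PyPI is not an integration context.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional, Tuple


class Obsoletes(NamedTuple):
    """Obsoletes(predecessor, successor, integration-context)."""
    predecessor: str
    successor: str
    context: str  # the curated distribution that asserts the relationship


def replacements_in(relations: Iterable[Obsoletes],
                    context: Optional[str]) -> FrozenSet[Tuple[str, str]]:
    """Return (predecessor, successor) pairs that hold within one integration context.

    A context of None models an index that is not an integration context:
    nothing applies there, because no declaration may use a wildcard third position.
    """
    if context is None:
        return frozenset()
    return frozenset((r.predecessor, r.successor) for r in relations if r.context == context)
```

Note that the two hypothetical declarations point in opposite directions without contradicting each other, because each only holds within its own context.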
Currently, Python distribution tools, culture, and methodology do not have any precedent for the metadata spec contents to be overridden by a third-party packager, curator or repository manager, in the way that is normal and common in the distro world. (Try to imagine a Linux distro where this kind of information was *always* put in "upstream", because *there is no such thing* as "downstream". That's what it's like "on land".) This is why I keep saying that blind copying is an invitation to trouble, and that clear thinking about the actual requirements is needed. I would not object to explicitly three-way versions of these fields (requires, provides, conflicts, obsoletes) that define a specific integration context in which the statement applies. (Although defining how to name integration contexts would present a *new* challenge for discussion!) Likewise, I would not object to discussion of how to manage metadata for *repackaging* of Python projects by third-party curators (e.g. SciPy et al), and ways to keep that separate from the author's declarations. Or discussion of what should constitute a "repository" in the Python world, as opposed to what we have now (which, apart from curated distributions, consists mainly of indexes, not true repositories in the distro sense). Today, however, there is no separation in the metadata spec (or tools) between "packaging" (in the sense understood by distros) and "distributing" (in the sense normally applied to Python packages distributed via PyPI and similar channels). And "packaging" in the distro sense is all about *integrating* packages, not merely making them *available* for others to integrate. That's the critical difference between the two, and in the resulting use cases for the metadata spec.

On Sat, Dec 8, 2012 at 8:02 AM, PJ Eby <pje@telecommunity.com> wrote:
To strain the analogy, the main value of these fields exists on the beach: at the point where you need to impedance match between the Python community and another packaging community. The ideal is to be able to get a point where you can point an automated tool at a project on PyPI and say "give me that, only packaged as RPM/deb/whatever, with appropriate distro specific metadata". Such a tool is obviously going to be distro specific, since it is going to have to do some remapping based on file names to pick up existing distro packages, but it *also* needs upstream metadata. Even in a distro, a "Conflicts:" field often *does* denote runtime conflicts (e.g. over a particular network port), because, as you say, filesystem level conflicts will usually be picked up automatically. The distro philosophy is to say "OK, we simply won't let you install conflicting projects at the same time, so you won't be surprised later by a conflict that only shows up if you happen to run them both at the same time". It's designed to turn a complex, hard to debug, problem into a simple, explicit error at installation time. People build complex systems (especially web apps) based on the PyPI ecosystem, and the upstream communities *can* assist in flagging potential issues in advance. If people start putting bad metadata in their projects, then that's just a bug to be dealt with like any other. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Dec 07, 2012 at 05:02:26PM -0500, PJ Eby wrote:
In the specific case of pexpect and pexpect-u, the files don't actually conflict. The pexpect package includes a "pexpect.py" file, while pexpect-u includes a "pexpect/" directory. These conflict, but not in the easily detectable sense. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

On Sun, Dec 9, 2012 at 10:38 PM, Andrew McNabb <amcnabb@mcnabbs.org> wrote:
Excellent! A concrete non-file use case. Setuptools handles this particular scenario by including a list of top-level module or package names, but newer tools ought to look out for this scenario, too.
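A minimal sketch of that check, assuming each distribution's installed files are available as install-relative paths (for example from a RECORD file): map every path to the top-level import name it claims and intersect the two sets. setuptools records the same information in the top_level.txt file of its egg-info metadata. The helper names and the fixed list of module suffixes are assumptions, and treating every top-level directory as an import name is an approximation.

```python
from typing import Iterable, List, Set

# Assumed set of single-file module suffixes; a real tool would be more thorough.
_MODULE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")
_METADATA_DIR_SUFFIXES = (".dist-info", ".egg-info")


def top_level_names(files: Iterable[str]) -> Set[str]:
    """Top-level import names claimed by a list of install-relative paths."""
    names = set()
    for path in files:
        head, sep, _rest = path.partition("/")
        if sep:
            # A file inside a directory claims that directory's name,
            # except for packaging metadata directories.
            if not head.endswith(_METADATA_DIR_SUFFIXES):
                names.add(head)
        elif path.endswith(_MODULE_SUFFIXES):
            # "pexpect.py" and "_speedups.cpython-33m.so" map to the text before the first dot.
            names.add(path.partition(".")[0])
    return names


def import_conflicts(files_a: Iterable[str], files_b: Iterable[str]) -> List[str]:
    """Top-level import names claimed by both distributions."""
    return sorted(top_level_names(files_a) & top_level_names(files_b))
```

With this approach, a module file and a package directory of the same name (the pexpect.py versus pexpect/ case above) collide even though no individual file path does.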

Donald Stufft <donald.stufft <at> gmail.com> writes:
Never mind the "Obsoletes" information - even the more useful "Requires-Dist" information is not exposed via PyPI, even though it appears to be stored in the database. (Or if it is, please point me to where - I must have missed it.) Even if this were to be made available, it's presumably obtained from PKG-INFO. As I understand, this data is not considered reliable - for example, pip runs egg_info on downloaded packages to get updated information when determining dependencies to be downloaded. If the Requires-Dist info in PKG-INFO can't be relied on, surely less critical information such as Obsoletes can't be relied on, either? Regards, Vinay Sajip

On Thursday, December 6, 2012 at 6:28 AM, Vinay Sajip wrote:
Requires-Dist doesn't exist for more than a handful of packages. But PyPI exposes it via the XMLRPC API, possibly the JSON api as well.
pip runs egg_info because setuptools does not write out to PKG-INFO what the dependencies are (it does write it out to a different text file though). But IIRC that text file is not guaranteed to exist in the distribution. There's also the history where pip was trying to preserve as much backwards compat with easy_install as it could, and if you used the file that egg_info writes out then you'll only get the requirements for the system that the distribution was packaged on. Any if statements that affect the dependencies won't be in effect.
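For reference, the text file in question is requires.txt in the egg-info directory: bare requirement lines, followed by "[section]" headers that start per-extra groups. A minimal parser sketch follows; the function name is hypothetical, and it deliberately does not interpret section names beyond splitting on them. As noted above, what it returns reflects only the environment in which egg_info was run.

```python
from typing import Dict, List, Optional


def parse_requires_txt(text: str) -> Dict[Optional[str], List[str]]:
    """Parse an egg-info requires.txt into {section: [requirement, ...]}.

    Key None holds the unconditional requirements; a "[name]" header starts a
    new section (typically an extra). Blank lines and '#' comments are skipped.
    """
    sections: Dict[Optional[str], List[str]] = {None: []}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```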

On Thu, Dec 6, 2012 at 6:33 AM, Donald Stufft <donald.stufft@gmail.com>wrote:
It will be Obsoleted-By. The "drop in replacement" requirement will be removed. The package manager will say "you are using these obsolete packages; check out these non-obsolete ones" but will not automatically pull the replacement without a Requires tag. I will probably add the unambiguous Conflicts: tag "uninstall this other package if I am installed". Many packages (IIRC more than half) have the pre-Metadata-1.2 equivalent of Requires-Dist, the very easy-to-parse requires.txt. This information is not reliable because it could depend on conditions in setup.py. Someone should write a setup.py compiler that determines whether a package's requirements are conditional or not. Environment markers (limited Python expressions at the end of Requires-Dist lines) attempt to make Requires-Dist reliable. You can execute them safely in your environment to determine whether a requirement is right for you:

    Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'

The wheel implementation makes sure all the metadata (the .dist-info directory) is at the end of the .zip archive. It's possible to read the metadata with a single HTTP partial request for the end of the archive without downloading the entire archive.
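The point of environment markers is that they can be evaluated without running arbitrary code. A minimal sketch of that idea, assuming a PEP 345-style subset of the grammar (string comparisons with ==, !=, in, not in, combined with and/or) and only the variables listed in default_environment: the marker is parsed with the ast module and walked against an allow-list instead of being passed to eval(). The helper names are hypothetical; PEP 426's `extra` variable can be supplied through the env argument.

```python
import ast
import os
import platform
import sys
from typing import Dict, Optional, Tuple


def default_environment() -> Dict[str, str]:
    """Marker variables for the running interpreter (an assumed subset)."""
    return {
        "sys.platform": sys.platform,
        "os.name": os.name,
        "python_version": "%d.%d" % sys.version_info[:2],
        "platform.machine": platform.machine(),
        "platform.python_implementation": platform.python_implementation(),
    }


def _operand(node: ast.AST, env: Dict[str, str]) -> str:
    # Only string literals and known variable names are allowed.
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name) and node.id in env:
        return env[node.id]
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        dotted = "%s.%s" % (node.value.id, node.attr)
        if dotted in env:
            return env[dotted]
    raise ValueError("unsupported marker operand: %s" % ast.dump(node))


_COMPARISONS = {
    ast.Eq: lambda a, b: a == b,
    ast.NotEq: lambda a, b: a != b,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def _evaluate(node: ast.AST, env: Dict[str, str]) -> bool:
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, env) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        compare = _COMPARISONS.get(type(node.ops[0]))
        if compare is not None:
            return compare(_operand(node.left, env), _operand(node.comparators[0], env))
    # Calls, subscripts, arithmetic, etc. are rejected rather than executed.
    raise ValueError("unsupported marker expression: %s" % ast.dump(node))


def evaluate_marker(marker: str, env: Optional[Dict[str, str]] = None) -> bool:
    """Evaluate a marker by walking an allow-listed AST (never eval())."""
    tree = ast.parse(marker.strip(), mode="eval")
    return _evaluate(tree.body, default_environment() if env is None else env)


def split_requires_dist(value: str) -> Tuple[str, Optional[str]]:
    """Split a Requires-Dist value into (requirement, marker_or_None)."""
    requirement, _, marker = value.partition(";")
    return requirement.strip(), marker.strip() or None
```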

Daniel Holth <dholth <at> gmail.com> writes:
Sounds good, but can you point to any example code which does this? As I understand it, for .zip files you have to read the last part of the file to get a pointer to the directory, then read that to find where each file in the archive is, then seek to a specific position to read the file contents. Regards, Vinay Sajip

On 6 Dec, 2012, at 15:58, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Because zipfiles can be appended to other files (for example when creating a self-extracting archive) the zipfile module maintains the file offset of the start of a zipfile. The code in the stdlib doesn't appear to test that the zipfile is at a positive offset in the file, therefore with some luck the following will work: * Download the last 10K of the archive (adjust the size to taste, it should be large enough to contain the zipfile directory and the file you are trying to read) * Create a zipfile.ZipFile * Read the zipfile member. If that doesn't work you'll have to create a temporary file of the right size and place the downloaded bit at the end of that file. BTW. Another (more hacky) alternative is to place the interesting bits of dist-info at the start of the zipfile, then you only need to download the first bit of the archive and can then extract the bits you need by parsing the local file headers (zipfiles contain both a directory at the end of the zipfile and a local header stored just before the file data). Ronald
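Ronald's fallback (place the downloaded bytes at their true offset in a file of the right size) can be sketched with the stdlib alone. Here `tail` stands for the result of a ranged download, which is not shown, and read_member_from_tail is a hypothetical helper. Because the unknown prefix is never written, many filesystems store it as a sparse hole, as Daniel notes further down the thread.

```python
import tempfile
import zipfile


def read_member_from_tail(tail: bytes, total_size: int, member: str) -> bytes:
    """Read one archive member given only the last len(tail) bytes of the archive.

    The tail is written at its true offset in a temporary file, so every offset
    recorded in the zip directory stays valid; the missing prefix reads as zeros.
    """
    if len(tail) > total_size:
        raise ValueError("tail is longer than the archive")
    with tempfile.TemporaryFile() as tmp:
        tmp.seek(total_size - len(tail))  # skip the bytes we never downloaded
        tmp.write(tail)
        with zipfile.ZipFile(tmp) as archive:
            # Works only if the directory and the member's local header and
            # data all fall inside the downloaded tail.
            return archive.read(member)
```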

On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
You have to make a maximum of 3 requests: one for the directory pointer, one for the directory, and one for the file you want. It's not particularly difficult to make an HTTP-backed seekable file object to pass to ZipFile() for this purpose but I don't have an example. Normally the last few k of the file will contain all 3 pieces. 8k or 16k would be a good guess.
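A minimal sketch of such an HTTP-backed seekable file object, under these assumptions: the total archive size is known (e.g. from a Content-Length header), the server honours Range requests, and the zip directory plus the wanted member fit in the prefetched tail (16 KiB here, per the guess above). RangeFile and http_range_fetcher are hypothetical names; the network fetcher is shown but untested, and the check below substitutes an in-memory fetcher.

```python
import io
import urllib.request
from typing import Callable

FetchRange = Callable[[int, int], bytes]  # (start, end_exclusive) -> bytes


def http_range_fetcher(url: str) -> FetchRange:
    """Build a fetcher that issues one HTTP Range request per call (untested sketch)."""
    def fetch(start: int, end: int) -> bytes:
        headers = {"Range": "bytes=%d-%d" % (start, end - 1)}
        with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
            return resp.read()
    return fetch


class RangeFile(io.RawIOBase):
    """Read-only, seekable file whose bytes are fetched on demand.

    The last `tail_size` bytes are prefetched with one request, since that is
    where a zip archive keeps its directory (and where wheel puts .dist-info).
    """

    def __init__(self, size: int, fetch: FetchRange, tail_size: int = 16 * 1024):
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._pos = 0
        self._tail_start = max(0, size - tail_size)
        self._tail = fetch(self._tail_start, size)
        self.requests = 1

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            position = offset
        elif whence == io.SEEK_CUR:
            position = self._pos + offset
        elif whence == io.SEEK_END:
            position = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % whence)
        if position < 0:
            raise OSError("negative seek position")  # same error type as real files
        self._pos = position
        return position

    def readinto(self, buffer) -> int:
        end = min(self._pos + len(buffer), self._size)
        if end <= self._pos:
            return 0  # end of file
        if self._pos >= self._tail_start:
            data = self._tail[self._pos - self._tail_start:end - self._tail_start]
        else:
            data = self._fetch(self._pos, end)  # outside the tail: one more request
            self.requests += 1
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)
```

Usage would look like zipfile.ZipFile(RangeFile(size, http_range_fetcher(url))); when the tail assumption holds, requests stays at 1.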

Daniel Holth <dholth <at> gmail.com> writes:
I don't need an example for doing it with multiple HTTP requests. I only asked for an example because you said one could read the metadata "with a single HTTP partial request", and I couldn't see how it could always be done with a single request. PEP 427 is mute on the subject of zip file comments in a .whl, but perhaps it shouldn't be. IIUC, the directory of the zip file *could* be further from the end of the file by more than 16K, due to the possible presence of a pathologically large comment in the end record. Regards, Vinay Sajip

On Thu, Dec 6, 2012 at 11:30 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk>wrote:
It's just a "usually works" optimization that might be fun when bandwidth is more important than round trip times. The distance between the directory and the end of the file depends on the size of the directory. Django's is an extreme case at nearly half a meg; most are much smaller. On many filesystems it is cheap to create a sparse file the size of the entire archive and write the partial requests into it. The OS doesn't actually store all the 0's. The other reason wheel puts the metadata at the end is so the metadata can be re-written efficiently without re-writing the entire zipfile. The wheel project implements ZipFile.pop() which truncates the last file from a (normal) zip archive. This is especially useful when the last file is the attached digital signature.
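The truncation trick can be sketched with the stdlib zipfile module. This is a hypothetical reimplementation for illustration, not the wheel project's ZipFile.pop(): it removes the member with the highest offset, rewrites the central directory where that member began, and truncates the file. It depends on CPython zipfile internals (start_dir, _didModify), which are not a public API and may change between versions.

```python
import zipfile
from typing import BinaryIO


def pop_last_member(fileobj: BinaryIO) -> str:
    """Remove the physically last member of a zip archive in place; return its name.

    Only the removed member and the central directory are rewritten; everything
    before the removed member is left untouched.
    Relies on CPython zipfile internals (start_dir, _didModify).
    """
    with zipfile.ZipFile(fileobj, "a") as archive:
        last = max(archive.infolist(), key=lambda info: info.header_offset)
        archive.filelist.remove(last)
        del archive.NameToInfo[last.filename]
        # The new central directory is written where the removed member began.
        archive.start_dir = last.header_offset
        archive._didModify = True  # force the directory to be rewritten on close
    # ZipFile leaves a caller-supplied file open, positioned after the new end record.
    fileobj.truncate()
    return last.filename
```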

On Thu, Dec 6, 2012 at 9:58 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
ISTR that this is especially true for zipimport: I think it depends on a zipfile signature being present at the *end* of the file. Certainly, the standard for .exe and shell wrappers for zipfiles is to place them at the beginning of the file, rather than the end.
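A quick illustration of why wrappers go at the front: prepending a launcher stub (here a shebang line) leaves the archive readable, because readers find the end-of-central-directory record at the end of the file and correct the recorded offsets for the extra leading bytes. The stdlib zipapp module's interpreter option produces files of this shape. The helper names below are hypothetical.

```python
import io
import zipfile


def prepend_stub(zip_bytes: bytes, stub: bytes = b"#!/usr/bin/env python3\n") -> bytes:
    """Return a copy of the archive with a launcher stub placed in front of it."""
    return stub + zip_bytes


def read_member(archive_bytes: bytes, name: str) -> bytes:
    """Read one member; zipfile tolerates arbitrary bytes before the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)
```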

On Thu, Dec 6, 2012 at 8:39 AM, Daniel Holth <dholth@gmail.com> wrote:
Sounds fine to me.
I will probably add the unambiguous Conflicts: tag "uninstall this other package if I am installed".
Please don't. See my lengthy posts from the previous PEP 345 retread discussion for why, or ask MRAB to succinctly summarize them as he did so brilliantly with the obsoletes/obsoleted-by issue. ;-) I'll take a stab at a short version, though: a conflict (other than filename conflict) is not an installation-time property of a single project, but rather a *runtime* property of an overall system into which the projects are being installed, including configuration that is out of scope for a Python-specific installation tool to manage. In addition, even declaring overall conflicts as a *mere shorthand* for an existing file conflict creates the possibility of stale conflict information! For example, RuleDispatch vs. PyDispatcher: at one time both provided a "dispatch" package, but if RuleDispatch declared PyDispatcher conflicting, the declaration would quickly have become outdated: PyDispatcher soon renamed its provided package to resolve the conflict. A file-based system can both detect and resolve this conflict (or lack thereof) automatically, whereas a manual "Conflicts" notation must be maintained by the author(s) of one or both packages and removed when out of date. In effect, a "conflicts" field actually *creates* conflicts and maintenance burdens where they did not previously exist, because even after the conflict no longer really existed, an automated tool would have prevented PyDispatcher from being installed, or, per your suggestion above, unnecessarily *uninstalled* it after a user installed RuleDispatch. And unlike the Obsoletes->Obsoleted-By change, I do not know of any similar way to salvage the idea of a Conflicts field, without reference to some mediating authority that manages the information on behalf of an overall system into which the projects are being fitted.
But in that case, neither of the projects really owns the declaration - it's more like Zope (say) would need a list of plugins that conflict with each other, or they could declare that they conflict when activated in the same instance. A generic Python installer, however, that doesn't know about Zope instances or Apache vhosts or Django apps or any other "environment of conflict", can't assume that *mere installation* constitutes a conflict! It doesn't know, for example, whether code from two simultaneously-installed packages will ever even be *imported* in the same process, let alone whether their specific conflicting features will be used in that process. This effectively ensures that in general, Python installation tools can *only* rely on file-based conflicts as being denotable by project metadata -- and even then, it's better to stick with *actual* file conflicts rather than predicted ones, to avoid the type of logjam described above. P.S. Sorry once again to drag you through all this at the last minute; I just sort of assumed you picked up where Alexis left off on the previous attempt at an update to PEP 345 and didn't pay close enough attention to earlier drafts.
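The file-based approach PJ describes can be sketched as a simple ownership check over the files each distribution installs (for example, the paths listed in a PEP 376 `RECORD` file). The file lists in the example are illustrative, not the projects' actual contents.

```python
from collections import defaultdict


def file_conflicts(installed_files):
    """Map each path claimed by more than one distribution to its claimants.

    `installed_files` maps a distribution name to the set of paths it installs.
    """
    owners = defaultdict(set)
    for dist, paths in installed_files.items():
        for path in paths:
            owners[path].add(dist)
    return {path: sorted(dists) for path, dists in owners.items() if len(dists) > 1}


before = {
    "RuleDispatch": {"dispatch/__init__.py", "dispatch/functions.py"},
    "PyDispatcher": {"dispatch/__init__.py", "dispatch/dispatcher.py"},
}
# After PyDispatcher renames its package, the conflict disappears by itself.
after = {
    "RuleDispatch": {"dispatch/__init__.py", "dispatch/functions.py"},
    "PyDispatcher": {"pydispatch/__init__.py", "pydispatch/dispatcher.py"},
}
print(file_conflicts(before))  # {'dispatch/__init__.py': ['PyDispatcher', 'RuleDispatch']}
print(file_conflicts(after))   # {}
```

Unlike a hand-maintained Conflicts entry, nothing here can go stale: the result is recomputed from what is actually installed.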

On Fri, Dec 7, 2012 at 3:47 PM, PJ Eby <pje@telecommunity.com> wrote:
That's not what a Conflicts field is for. It's to allow a project to say *they don't support* installing in parallel with another package. It doesn't matter why it's unsupported; it's making a conflict perceived by the project explicit in their metadata. Such a field is designed to convey information to users about *supported* configurations, regardless of whether or not they happen to work for a given use case. If a user believes a declared conflict is in error, and having the two installed in parallel is important to them, they can:
1. Use virtual environments to keep the two projects isolated from each other
2. Use an installer that ignores Conflicts information (which will be all of them, since that's the status quo)
3. Make their case to the upstream project that the conflict has been resolved, and installing the two in parallel no longer causes issues
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Dec 7, 2012 at 8:33 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That's not what a Conflicts field is for. It's to allow a project to say *they don't support* installing in parallel with another package.
If that's the actual intended use case, the PEP needs some revision. In particular, if there's a behavioral recommendation for installer tools, it should be to avoid installing the project that *declares* the conflict, rather than the one that is the object of that declaration. ;-) In any case, as I said before, I don't have an issue with the fields all being declared as being for informational purposes only. My issue is only with recommendations for automated tool behavior that permit one project's author to exercise authority over another project's installation. If the fields are defined in such a way that an author can only shoot *themselves* in the foot with a bad declaration, that's fine by me. So if package A includes a "Conflicts: B" declaration, I recommend the following:
* An attempt to install A with B already present refuses to install A without a warning and confirmation
* An attempt to install B informs the user of the conflict, and optionally offers to uninstall A
In this way, any collateral damage to B is avoided, while still making the intended "lack of support" declaration clear. How does that sound?
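PJ's asymmetric proposal can be expressed as a small decision function. This is only a sketch: `conflicts` maps each project to the set of projects it declares `Conflicts:` against, and the action strings are hypothetical, not taken from any installer.

```python
def plan_install(target, installed, conflicts):
    """Return (action, note) so that only the declaring project is held back."""
    # The target itself declares a conflict with something already installed:
    # hold the target back until the user confirms.
    declared = sorted(conflicts.get(target, set()) & installed)
    if declared:
        return "confirm", f"{target} declares conflicts with: {', '.join(declared)}"
    # An installed project declares a conflict with the target: install the
    # target anyway, tell the user, and let them decide whether to uninstall.
    declarers = sorted(p for p in installed if target in conflicts.get(p, set()))
    if declarers:
        return "install", f"{target} is listed in the Conflicts of: {', '.join(declarers)}"
    return "install", ""


conflicts = {"A": {"B"}}
print(plan_install("A", {"B"}, conflicts))  # ('confirm', 'A declares conflicts with: B')
print(plan_install("B", {"A"}, conflicts))  # ('install', 'B is listed in the Conflicts of: A')
```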

On Sat, Dec 8, 2012 at 4:46 PM, PJ Eby <pje@telecommunity.com> wrote:
No, that's not the way it works. A conflict is always symmetric, no matter who declares it. The beneficiary of these notifications is the aggregator attempting to build a systematically coherent system, rather than one with latent incompatibilities waiting to bite them at run time. It doesn't *matter* if "A conflicts with B" or "B conflicts with A", you cannot have a system with both of them installed that will be supported by the developers of both A *and* B. Now, this beneficiary *may* be the packagers for a Linux distribution, but it may also be a larger Python distribution (ActiveState, EPD, etc), a web application developer, a desktop application developer, a system integrator for a large-scale distributed system, or anyone else that combines and deploys an integrated set of packages (even those a developer installs on their personal workstation). It's up to the user to decide who they want to believe. Now, it may be that, for a given use case, the end user doesn't actually care about the potential conflict (e.g. they've done their own research and determined that the conflicting behaviour doesn't affect their system) - that's then a design decision in the installation tools as to whether or not they want to make it easy for users to override the metadata. In the Linux distro case, the installer *and* most of the metadata are largely provided by the same people, so yum/rpm/etc generally *don't* make it easy to install conflicting packages. Python installers are in a different situation though, so forced installs are likely to be an expected feature (in fact, I expect the more likely outcome given the status quo is that the default behaviour will be a warning at installation time with an option to request enforcement of "no conflicts"). Building integrated systems *is hard*. 
Pretending projects can't conflict just because they're both written in Python isn't sensible, and neither is it sensible to avoid warning users about the potential for latent defects when particular packages are used in combination. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Dec 8, 2012 at 5:06 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
But that *precisely contradicts* what you said in your previous email:
It's to allow a project to say *they don't support* installing in parallel with another package.
Just because A doesn't support being installed next to B, doesn't mean B doesn't support being installed next to A. B might work just fine with A installed, and even be explicitly supported by the author of B. Why should the author of A get to decide what happens to B? Just because I trust A about A, doesn't mean I should have to trust them about B. Look, I really don't care about the individual fields' definitions that much. I care about only one thing: A shouldn't get to (de facto) dictate what happens to B. If you *really* want the behavior to be symmetrical, then it should *only* be symmetrical if both A and B *agree* they are in conflict (i.e., both refer to the other in their conflict fields). Otherwise, it should only be a warning. There are tons of other things that I could argue here about the positions you've laid out. But all I *really* care about is that we not define fields in such a way as to permit or encourage inter-package warfare -- intentional or not. Solutions acceptable to me include (in no particular order):
* Make declarations affect only the declarer (as with Obsoleted-By)
* Make declarations only warn users, not block installation or result in uninstallation
* Have no automated action at all, and document them as intended for downstream repackagers only
* Toss the field entirely
* Make the field include a context (e.g. a distro name), so that only tools explicitly told you're operating in that context pay attention
* Use the new metadata extension vocabularies to define hints for specific downstream packaging tools and systems
* Replace "conflicts" with a specification of resources actually used by the project, so that such conflicts can be automatically detected without needing to target a specific project
And there are probably others I haven't thought of yet.
If you can be clearer about what it is you want from the Conflicts field *other* than just wanting it to stay as is (or perhaps *why* you would like to have the Python infrastructure side with project A over project B, irrespective of which project is A and which one is B), then perhaps I can come up with others.
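PJ's "both must agree" alternative reduces to a symmetric check on the two declarations. A minimal sketch, where `conflicts` maps each project name to the set of projects it declares conflicts with (an illustrative structure, not a real API):

```python
def conflict_level(a, b, conflicts):
    """Return "block" if A and B list each other, "warn" if only one does, else "ok"."""
    a_lists_b = b in conflicts.get(a, set())
    b_lists_a = a in conflicts.get(b, set())
    if a_lists_b and b_lists_a:
        return "block"
    if a_lists_b or b_lists_a:
        return "warn"
    return "ok"


conflicts = {"A": {"B"}, "C": {"D"}, "D": {"C"}}
print(conflict_level("A", "B", conflicts))  # warn: only A declares it
print(conflict_level("C", "D", conflicts))  # block: both agree
print(conflict_level("A", "C", conflicts))  # ok
```

The result does not depend on argument order, so installation order cannot change the outcome.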

On 2012-12-08 20:18, PJ Eby wrote:
[snip] If package A says that it conflicts with package B, it may or may not be symmetrical, because it's possible that package B has been updated since the author of package A discovered the conflict, so it's important that the user is told which package is complaining about the conflict, the one that is being installed or the one that is already installed. It may also be helpful if the package that includes the "Conflicts" declaration specifies which version of the other package it was last tested against in case there is a more recent version of the other package that does not cause the conflict, or, indeed, that there's a more recent version of the package that includes the "Conflicts" declaration that does not cause the conflict.

On 09/12/12 08:14, MRAB wrote:
I must admit that in reading this thread, I'm having a bit of trouble understanding why merely *installing* packages should lead to conflicts. Assuming that two software packages Spam and Ham install into directories Spam and Ham, how can merely having them installed side-by-side lead to a conflict? I can see how running or importing Spam and Ham together might lead to problems. And I can see that if package Spam wants to install into directory Ham, that would be bad. But who does that? Have I just demonstrated my naivety when it comes to packaging? Under what circumstances would two well-behaved packages with different names conflict? -- Steven

On Sun, Dec 9, 2012 at 12:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:
If two packages Spam and Ham both define a module Jam, then the one that gets loaded will depend on the search path. That would be one form of conflict. ChrisA
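Chris's point is easy to reproduce. The sketch below creates two throwaway directories standing in for the install locations of hypothetical distributions Spam and Ham, each containing a top-level `jam.py`, and shows that whichever directory comes first on `sys.path` wins.

```python
import importlib
import os
import sys
import tempfile


def which_jam(first, second):
    """Import top-level `jam` with `first` ahead of `second` on sys.path and
    return the directory it was loaded from."""
    sys.modules.pop("jam", None)   # forget any earlier import
    importlib.invalidate_caches()  # the directories were just created
    sys.path[:0] = [first, second]
    try:
        module = importlib.import_module("jam")
        return os.path.dirname(module.__file__)
    finally:
        del sys.path[:2]
        sys.modules.pop("jam", None)


with tempfile.TemporaryDirectory() as spam, tempfile.TemporaryDirectory() as ham:
    # Both hypothetical distributions install a top-level module named jam.
    for site in (spam, ham):
        with open(os.path.join(site, "jam.py"), "w") as f:
            f.write("# top-level module jam\n")
    spam_wins = which_jam(spam, ham) == spam
    ham_wins = which_jam(ham, spam) == ham
print(spam_wins, ham_wins)  # True True
```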

On 09/12/12 12:32, Chris Angelico wrote:
    import Spam.Jam
    import Ham.Jam
What am I missing? Why would a software package called "Spam" install a top-level module called "Jam" rather than "Spam"? Isn't the whole point of Python packages to solve this namespace problem? -- Steven

On Saturday, December 8, 2012 at 9:11 PM, Steven D'Aprano wrote:
Conflicts doesn't really solve file-based conflicts; as PJ Eby has pointed out, tools need to detect that circumstance already. But to answer this question: no, there is no required mapping between project names (what your thing is called on PyPI) and Python package names (what you import). Something named Spam on PyPI could provide multiple Python packages, named whatever they wanted to be named.
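Because project names and import names are independent, a tool has to look at the mapping between them. In Python 3.10 and later, `importlib.metadata.packages_distributions()` returns exactly that mapping (top-level import name to a list of distribution names). A minimal sketch that reports import names provided by more than one distribution; the example mapping is hand-built and the `jam` clash is hypothetical.

```python
def shared_top_level_names(mapping=None):
    """Return import names that more than one distribution provides.

    `mapping` has the shape returned by importlib.metadata.packages_distributions()
    (Python 3.10+): top-level import name -> list of distribution names.
    """
    if mapping is None:
        from importlib.metadata import packages_distributions  # Python 3.10+
        mapping = packages_distributions()
    return {
        name: sorted(set(dists))
        for name, dists in mapping.items()
        if len(set(dists)) > 1
    }


example = {
    "setuptools": ["setuptools"],
    "pkg_resources": ["setuptools"],  # one project, two top-level names
    "jam": ["Spam", "Ham"],           # hypothetical clash
}
print(shared_top_level_names(example))  # {'jam': ['Ham', 'Spam']}
```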

On Sun, Dec 9, 2012 at 1:11 PM, Steven D'Aprano <steve@pearwood.info> wrote:
That would require/demand that the software package MUST define a module with its own name, and MUST NOT define any other top-level modules, and also that package names MUST be unique. (RFC 2119 keywords.) That would work, as long as those restrictions are acceptable. ChrisA

On Sun, Dec 09, 2012 at 01:51:09PM +1100, Chris Angelico wrote:
/me notes that setuptools itself is an example of a package that violates this rule (setuptools and pkg_resources). No objections to "That would work, as long as those restrictions are acceptable."... that seems to sum up where we're at. -Toshio

On 2012-12-09 01:15, Steven D'Aprano wrote:
[snip] Personally speaking, I was thinking more about possible problems at runtime due to functional conflicts, but it could apply to any (undefined) conflict.

On Sat, Dec 8, 2012 at 10:22 PM, MRAB <python@mrabarnett.plus.com> wrote:
If it's for a runtime functional conflict, there's no need for installation tools to worry about it, except perhaps in the case where a single project C depends on *both* A and B, where A and B conflict with each other. Apart from that piece of information, there is no way to know that the code will ever even be imported at the same time. (And even then, it's just a hint of the possibility, not a guarantee.) Nick, OTOH, says that the purpose of the field is to declare that mere side-by-side installation invalidates developer support for the configuration. However, the widespread confusion (conflicts?) over what exactly the field is supposed to mean and when it should be used suggests that its charter is not nearly as clear as it should be. It seems perhaps it is suffering from the so-called "Illusion of Transparency", wherein everybody looks at it and thinks that it *obviously* means X, and only a fool could think otherwise... except that everyone has a *different* value of X in mind. That's why I keep asking for specific, concrete use cases. At this point, for the field to make any sense, there needs to be some better idea of what a "runtime" or "undefined" conflict is. Apart from file conflicts, has anybody identified a single PyPI package that would make use of this field? If so, what *is* that example, and what is the nature of the conflict? Do any of the distro folks know of a Python project tagged as conflicting with another for their distro, where the conflict does *not* involve any files in conflict? (And the conflict is not specific to the distro's packaging of that project and the project in conflict? i.e., that it would have actually been possible and/or meaningful for the upstream developer to have flagged the conflict in the project's metadata, given the proposed metadata standard?)

On Sun, Dec 9, 2012 at 3:48 PM, PJ Eby <pje@telecommunity.com> wrote:
The best current example I know of is whether or not a given package is gevent compatible. At the moment, you have to try it and see, or hope the project developers have a note somewhere saying whether or not it works. "Incompatible" might be a better field name than "Conflicts" for that use case, though. You've persuaded me that any installer based notification of runtime conflicts should at most be a warning (or even a separate query), since the user has so many options for dealing with it (including the typical case where the two components are simply never used in the same process). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Dec 09, 2012 at 12:48:45AM -0500, PJ Eby wrote:
In Fedora we do work to avoid most types of Conflicts (backporting fixes, etc.) but I can give some examples of where Conflicts could have been used in the past: In docutils prior to the latest release, certain portions of docutils were broken if PyXML was installed (since PyXML replaces certain stdlib xml.* functionality). So older docutils versions could have had a Conflicts: PyXML. Nick has since provided a technique for docutils to use that loads from the stdlib first and only goes to PyXML if the functionality is not available there. Various libraries in web stacks have had bugs that prevent the proper functioning of the web framework at the top level. In case of major issues (security, unable to start up), these top-level frameworks could use versioned Conflicts to prevent installation. For instance, TurboGears might have:
    Conflicts: CherryPy < 2.3.1
Note, though, that if parallel-installable versions, and selection of the proper version from among them, work, then this type of Conflict wouldn't be necessary. You'd have versioned Requires: instead. -Toshio
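A sketch of how a tool might evaluate a versioned entry like the TurboGears example, assuming a simplified `Name op version` syntax with purely numeric dotted versions. This is not the full PEP 386 version scheme: pre-release and post-release tags are rejected.

```python
import operator
import re

# Simplified "Name op version" grammar: numeric dotted versions only.
_ENTRY = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(<=|>=|==|!=|<|>)\s*(\d+(?:\.\d+)*)\s*$")
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def _as_tuple(version):
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:  # treat 2.3 and 2.3.0 alike
        parts.pop()
    return tuple(parts)


def matches_conflict(entry, name, version):
    """True if `name` at `version` falls inside a versioned Conflicts entry."""
    m = _ENTRY.match(entry)
    if m is None:
        raise ValueError(f"unsupported Conflicts entry: {entry!r}")
    target, op, bound = m.groups()
    if target.lower() != name.lower():
        return False
    return _OPS[op](_as_tuple(version), _as_tuple(bound))


entry = "CherryPy < 2.3.1"
for candidate in ("2.3.0", "2.3.1", "3.0"):
    print(candidate, matches_conflict(entry, "CherryPy", candidate))
# 2.3.0 True / 2.3.1 False / 3.0 False
```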

On Sun, Dec 9, 2012 at 6:18 AM, PJ Eby <pje@telecommunity.com> wrote:
If I'm installing both A *and* B, I want to know if *either* project doesn't support that configuration. The order in which they get installed should *not* have any impact on my finding out that I am using one of my dependencies in an unsupported way that may cause me unanticipated problems further down the line. The author of A *doesn't* get to decide what happens to B, *I* do. They're merely providing a heads up that they believe there are problems when using their project in conjunction with B. My options will be:
- use them both anyway (e.g. perhaps after doing some research, I may find out the conflict relates solely to a feature of B that I'm not using, so I simply update my project documentation to say "do not use feature X from project B, as it conflicts with dependency A")
- choose to continue using A, find another solution for B
- choose to continue using B, find another solution for A
As a concrete example, there are projects out there that are known not to work with gevent's socket monkeypatching, but people don't know that until they try it and it blows up in their face. I now agree that *enforcing* a conflicts field at install time in a Python installer doesn't make any sense, since the nature of Python means it will often be easy to sidestep any such issues once you're aware of their existence (e.g. by avoiding gevent's monkeypatching features and using threads to interact with the uncooperative synchronous library, or by splitting your application into multiple processes, some using gevent and others synchronous sockets). I also believe that *any* Conflicts declaration *should* be backed up with an explicit explanation and rationale for that conflict declaration in the project documentation.
Making it impossible to document runtime conflicts in metadata doesn't make those conflicts go away - it just means they will continue to be documented in an ad hoc manner on project web sites (if they get documented at all), making the job of package curation unnecessarily more difficult (since there is no standard way to document runtime conflicts). Adding a metadata field doesn't make sure such known conflicts *will* be documented, but it at least makes it possible. So, I still like the idea of including a Conflicts field, but think a few points should be made clear:
- the Conflicts field would be for documenting other distributions which have known issues working together in the same process and thus constitute an unsupported configuration
- this field would be aimed at package *users*, rather than at installation tools (although it would still be good if the installation tools supported scanning a set of packages for known conflicts)
- any use of this field should be backed up with a more detailed explanation in the project documentation
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Dec 9, 2012 at 12:54 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This is probably moot now, but I didn't propose that installation order matter -- in both scenarios I described, you end up with a warning and A not installed, regardless of whether A or B were installed first.
The author of A *doesn't* get to decide what happens to B, *I* do.
The reason I said, "(de facto)", is because the default behavior of whatever the next big installation tool is, would be what most users would've gotten by default.
Here's the question, though: who's going to maintain that list? I can see gevent wanting to have a compatibility chart page in their docs, but it seems unlikely they'd want to block installation of non-gevent-compatible projects or vice versa. Similarly, I can't see why any of those other projects would want to block installation of gevent, or vice versa. That being said, I don't object to having the ability for either of them to do so: the utility of the field is *much* enhanced once its connection to installation tools is gone, since a wider variety of issues can be described without inconveniencing users.
Beyond that, I think a reference URL should be included *in the field itself*, e.g. to a bug report, support ticket, or other page that documents the incompatibility and will be updated as the situation changes. The actual usefulness of the field to anyone "downstream" seems greatly reduced if they have to go hunting for the information explaining the compatibility issue(s). This is a good example of what I meant about clear thinking on concrete use cases, vs. simply copying fields from distro tools. In the distro world, these kinds of fields reflect the *results* of research and decision-making about compatibility. Whereas, in our "upstream" world, the purpose of the fields is to provide downstream repackagers and integrators with the source materials for such research.
My concrete recommendation based on your comments, then, is:
* The field should be called Known-Incompatibilities (to better clarify its purpose and avoid confusion with similarly-named installation-oriented metadata in other tools)
* The field should be of the form (though not necessarily syntax): ProjectName==incompatible_version; info=url
That is, each entry lists a project name and a specific version that is known to be incompatible, along with a (required) information URL. The URL should point to a page that:
* is updated with any change in the situation,
* will remain available indefinitely, and
* describes the specific reason that particular project is considered incompatible, along with any available workarounds
For minor issues, a bug report or support ticket is acceptable; otherwise, a long-lived documentation link should be used. In-page anchor links are acceptable. A simple link to either project's home page or main documentation page is *not* acceptable: the link must be a part of the documentation that directly addresses the nature of the incompatibility. I'm not too picky about the version specification approach, though; the simplest thing is to only allow a single version to be named, but it also seems it could be reasonable to list one or more version ranges that apply, as long as they are not open-ended going forward. That is, saying versions 1.1-2.3 are incompatible is ok, but not "1.1 on". (Because the author of A is not in a position to declare on B's behalf that the incompatibility will *never* be fixable.) (I might be overthinking the versions bit, though, since this is really just about warnings.) I would recommend that tools automatically provide the warning in cases where a project C depends on versions of A and B that are declared incompatible. In this case, while one cannot *prove* the incompatibility to be an issue, it is still a potential issue. (This is more of a package build-time issue, though, as with Replaced-By.)
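A minimal parser for the illustrative `ProjectName==incompatible_version; info=url` form, enforcing the required information URL. PJ explicitly leaves the final syntax open, so the single-version grammar and the http(s) check are assumptions of this sketch; whether the linked page actually meets his content criteria cannot be checked mechanically.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Incompatibility:
    project: str
    version: str
    info: str


def parse_known_incompatibility(value: str) -> Incompatibility:
    """Parse 'ProjectName==version; info=url' (single version, URL required)."""
    spec, sep, params = value.partition(";")
    if not sep:
        raise ValueError("an info=URL is required")
    project, eq, version = spec.partition("==")
    project, version = project.strip(), version.strip()
    if not eq or not project or not version:
        raise ValueError("expected ProjectName==version")
    key, eq, url = params.strip().partition("=")  # the URL may itself contain '='
    if key.strip() != "info" or not eq:
        raise ValueError("expected info=URL")
    url = url.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"info must be an absolute http(s) URL: {url!r}")
    return Incompatibility(project, version, url)


print(parse_known_incompatibility("B==2.1; info=https://example.com/issues/42"))
# Incompatibility(project='B', version='2.1', info='https://example.com/issues/42')
try:
    parse_known_incompatibility("B==2.1")
except ValueError as exc:
    print(exc)  # an info=URL is required
```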
Speaking of Replaced-By, it probably makes sense to require a URL in the field there as well, but that URL can be an unchanging page such as an archived post to a mailing list or blog, announcing the project's renaming or obsolescence, and providing migration help or links thereto. I think it also should be a multi-valued field, just like Known-Incompatibilities. Recently, I came across a Python project "lepl" (a parser combinator library) that just declared its end-of-life, and actually recommended multiple alternatives, each of which would be more appropriate for some uses of a parsing library. (That is, there was no single "does everything" replacement for lepl's full feature set.) Finally, the PEP should document that the audience for both the Replaced-By and Known-Incompatibilities fields is developers and system integrators (such as distro teams). So they are designed to be processed by tools that *build* packages, rather than tools that *install* them. So, if you build a project that depends on something that's replaced, or a pair of things known to be incompatible, that's when you get warnings and such. Tools to check such things on installed projects are also ok, though to avoid unnecessary warnings, it's probably best to only list incompatibilities for co-dependents (and orphaned replaced projects) by default. That is, a checker should probably ignore replacements when there's an installed project depending on the replaced version, and ignore incompatibilities that aren't part of the same requirements subtree (and thus unlikely to be used together). Of course, having options to be more verbose is not an issue, and this isn't really something to legislate anyway -- it's just that listing *every* replaced project or potentially-incompatible pairing in even a moderately-sized installation is likely to be far more noise than signal.
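PJ's suggestion that a checker report only incompatibilities within the same requirements subtree can be sketched as a graph walk. Versions are ignored here for brevity, and the `requires` and `incompatible` structures are illustrative inputs rather than any tool's real data model.

```python
def subtree(root, requires):
    """`root` plus everything it requires, directly or transitively."""
    seen, stack = {root}, [root]
    while stack:
        for dep in requires.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def codependent_warnings(roots, requires, incompatible):
    """Report (root, pair) for each incompatible pair found inside one root's subtree."""
    found = []
    for root in sorted(roots):
        tree = subtree(root, requires)
        for pair in sorted(tuple(sorted(p)) for p in incompatible):
            if set(pair) <= tree:
                found.append((root, pair))
    return found


requires = {"C": ["A", "B"], "D": ["A"], "E": ["B"]}
incompatible = {frozenset({"A", "B"})}
print(codependent_warnings({"C", "D", "E"}, requires, incompatible))  # [('C', ('A', 'B'))]
print(codependent_warnings({"D", "E"}, requires, incompatible))       # []
```

Here A and B are both installed in either case, but only C actually pulls them into the same subtree, so only C triggers a warning.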

PJ Eby writes:
+1 to "describing". A metadata format should not specify tool behavior, and should use behavior-neutral nomenclature. Rather, use cases that seem probable or perhaps wrong-headed should inform the design. Nevertheless, actual decisions about behavior should be left to the tool authors.
I agree with the meaning of the above paragraph, but would like to dissociate myself from the comparison implied by the expression "clear thinking". AFAICS, it's different assumptions about use cases that drives the difference in prescriptions here.

On Sun, Dec 9, 2012 at 8:48 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
What comparison is that? By "clear", I mean "free of prior assumptions". The assumptions that made the discussion difficult weren't just about the use cases themselves, but about the environments, tools, organizations, concepts, etc. surrounding those use cases. Indeed, even the assumption of what should *qualify* as a "use case" was a stumbling block on occasion. ;-) And by "thinking", I mean, "considering alternatives and consequences", as distinct from debating the merits of a specific position. Put together, the phrase "clear thinking on concrete use cases" means (at least to me), "dropping all preconceptions of the existing design and starting over from square one, to ask how best the problem may be solved, using specific examples as a guide rather than using generalities." Generalities not rooted in concrete examples have a way of leading to non-terminating discussions. ;-) Starting over a discussion in this fashion isn't easy, but the results are usually worth it. I appreciate Nick and Daniel's patience in particular.

PJ Eby writes:
By "clear", I mean "free of prior assumptions".
Ah, well, I guess I've just run into a personal limitation. I can't imagine thinking that is "free of prior assumptions". Not my own<wink/>, and not by others, either. So, unfortunately, I was left with the conventional opposition in thinking: "clear" vs. "muddy". That impression was only strengthened by the phrase "vs. simply copying fields from distro tools."
Sure, but ISTM that's the opposite of what you've actually been doing, at least in terms of contributing to my understanding. One obstacle to discussion you have contributed to overcoming in my thinking is the big generality that the packager (ie, the person writing the metadata) is in a position to recommend "good behavior" to the installation tool, vs. being in a position to point out "relevant considerations" for users and tools installing the packager's product. Until that generality is formulated and expressed, it's very difficult to see why the examples and particular solutions to use cases that various proponents have described fail to address some real problems. It was difficult for me to see, at first, what distinction was actually being made. Specifically, I thought that the question about "Obsoletes" vs. "Obsoleted-By" was about which package should be considered authoritative about obsolescence. That is a reasonable distinction for that particular discussion, but there is a deeper, and general, principle behind that. Namely, "metadata is descriptive, not prescriptive." Of course once one understands that principle, the names of the fields don't matter so much, but it is helpful for "naive" users of the metadata if the field names strongly connote description of the package rather than behavior of the tool.

On Mon, Dec 10, 2012 at 3:27 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I suppose I should have said, "free of *known* prior assumptions", since the trick to suspending assumptions is to find the ones you *have*. The deeper assumptions, alas, can usually only be found by clashing opinions with others... then stepping back and going, "wait... what does he/she believe that's *different* from what I believe, that allows them to have that differing opinion?" And then that's how you find out what it is that *you're* assuming, that you didn't know you were assuming. ;-) (Not to mention what the other person is.)
Right, but I started from a concrete scenario I wanted to avoid, which led me to question the assumption that those fields were actually useful. As soon as I began questioning *that* assumption and asking for use cases (2 years ago, in the last PEP 345 revision discussion), it became apparent to me that there was something seriously wrong with the conflicts and obsoletes fields, as they had almost no real utility as they were defined and understood at that point.
Unfortunately, it's a chicken-and-egg problem: until you know what assumptions are being made, you can't formulate them. It's an iterative process of exposing assumptions, until you succeed in actually communicating. ;-) Heck, even something as simple as my assumptions about what "clear thinking" meant and what I was trying to say has taken some back and forth to clarify. ;-)
Actually, the principle I was clinging to for *both* fields was not giving project authors authority over other people's projects. It's fine for metadata to be prescriptive (e.g. requirements), it's just that it should be prescriptive *only* for that project in isolation. (In the broader sense, it also applies to the distro situation: the project author doesn't really have authority over the distro, either, so it can only be a suggestion there, as well.)

On 12/08/2012 05:06 AM, Nick Coghlan wrote:
Building such systems is *too hard* to delegate to the maintainers of every Python distribution registered on the Cheeseshop: there is too much policy involved for the ha'penn'orth of mechanism we are discussing here (decentralized inter-project metadata) to support. Such metadata *cannot* be useful in the general sense, but only in the context of a "curated" collection of packages, where the *curator* (not the upstream package authors) makes the choices. Tres. -- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com

On Sun, Dec 9, 2012 at 8:41 AM, Tres Seaver <tseaver@palladion.com> wrote:
The authors of major projects are often in a good position to know when they conflict with other high profile projects and thus can't be used reliably in the same system. Now, *most* of the time, if there's a genuine conflict between two Python packages, it's going to be at install time - two projects attempting to install the same file obviously can't coexist on a single system (distribute and setuptools, for example, conflict at this level - they both want to own the "setuptools" and "easy_install" names). However, Python has plenty of other global state too (the codec registry, the import system, monkeypatching), and there is potential for conflict over underlying OS level resources. So let's look at the case of collections of Python packages that *are* curated. Maybe I'm a Linux distro packager, looking to automate the conversion to distro packages. Maybe I'm a toolsmith for a large corporation trying to build a curated set of packages for internal use (clearly indicating to my internal users which ones don't play nicely with each other and thus shouldn't be used together in the same project). Regardless of the reason, I'm the curator for a collection of Python packages. How shall I express the conflicts I have identified? Shall I go invent my own metadata system? Shall I be forced to choose a particular platform-specific dependency management system? How shall upstream authors communicate to *me* the conflicts that they're already aware of? Or, hey, there's this nice shiny cross-platform dependency management system *right here*. Maybe they'll be nice enough to consider handling *my* use case as well, even if it's a use case *they* don't care about. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
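The install-time case described above (two projects claiming the same file) is the one kind of conflict that can be detected mechanically, by comparing the paths each distribution would install. The following is a minimal sketch, assuming each distribution's file list is already known (for example, from a PEP 376 style record); the function name and the abbreviated file lists are hypothetical and for illustration only. It cannot catch the subtler conflicts mentioned (codec registry, import system, monkeypatching), which is exactly where curator-supplied metadata would still be needed.

```python
from itertools import combinations


def find_file_conflicts(dists):
    """Return {(dist_a, dist_b): sorted shared paths} for every pair of
    distributions that would install at least one identical path."""
    conflicts = {}
    for (name_a, files_a), (name_b, files_b) in combinations(sorted(dists.items()), 2):
        shared = set(files_a) & set(files_b)
        if shared:
            conflicts[(name_a, name_b)] = sorted(shared)
    return conflicts


# Hypothetical, abbreviated file lists: both projects want to own the
# "setuptools" and "easy_install" names.
dists = {
    "distribute": ["setuptools/__init__.py", "easy_install.py"],
    "setuptools": ["setuptools/__init__.py", "easy_install.py"],
    "BeagleVote": ["beaglevote/__init__.py"],
}

for pair, paths in find_file_conflicts(dists).items():
    print(pair, paths)
```

A check like this only reports the overlap; deciding what to do about it (refuse, warn, or let one project replace the other) remains a policy choice for the installer or curator.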

On Fri, Dec 7, 2012 at 10:46 PM, PJ Eby <pje@telecommunity.com> wrote:
Skipping over a lot of other replies between you and me, because I think that we disagree on a lot, but that's all moot if we agree here. I have no problem with the Obsoletes, Conflicts, Requires, and Provides types of fields if they are marked informational. In fact, there are many cases right now where packages are overzealous in their use of Requires, which causes distributions to patch the dependency information in the package metadata. -Toshio

On 10 December 2012 16:35, Toshio Kuratomi <a.badger@gmail.com> wrote:
Given the endless debate on these fields, and the fact that it pretty much all seems to be about what happens when tools enforce them, I'm +1 on this. Particularly as these fields were not the focus of this change to the spec in any case. Paul.

There you go.

Obsoleted-By (optional)
:::::::::::::::::::::::

Indicates that this project is no longer being developed. The named project provides a substitute or replacement.

A version declaration may be supplied and must follow the rules described in `Version Specifiers`_.

The most common use of this field will be in case a project name changes.

Examples::

    Name: BadName
    Obsoleted-By: AcceptableName

    Obsoleted-By: AcceptableName (>=4.0.0)
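Since the draft describes the metadata as parseable with the standard library's ``email`` module, a consumer could read the proposed field roughly as follows. This is a minimal sketch: the function name is hypothetical, and the regular expression is a simplification assumed here (a bare project name optionally followed by a parenthesized specifier), not the full grammar of the Version Specifiers section.

```python
import re
from email.parser import Parser

# Assumed simplification: "Name" or "Name (specifier)". This is not the
# complete Version Specifiers grammar from the PEP.
_FIELD_RE = re.compile(r"^\s*(?P<name>[A-Za-z0-9._-]+)\s*(?:\((?P<spec>[^)]*)\))?\s*$")


def parse_obsoleted_by(metadata):
    """Return a list of (project_name, version_specifier_or_None) tuples."""
    msg = Parser().parsestr(metadata)
    results = []
    # The field is optional, so it may be absent; get_all() then returns [].
    for value in msg.get_all("Obsoleted-By", []):
        match = _FIELD_RE.match(value)
        if match is None:
            raise ValueError(f"malformed Obsoleted-By value: {value!r}")
        spec = match.group("spec")
        results.append((match.group("name"), spec.strip() if spec else None))
    return results


metadata = """\
Metadata-Version: 1.3
Name: BadName
Version: 1.0
Summary: A project that has been renamed.
Obsoleted-By: AcceptableName (>=4.0.0)
"""

print(parse_obsoleted_by(metadata))  # [('AcceptableName', '>=4.0.0')]
```

Header lookups through ``email.message.Message`` are case-insensitive, which matches the format's case-insensitive keys.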

On Wed, Nov 21, 2012 at 2:04 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk>wrote:
Yes, I thought Daniel's rewording looked pretty reasonable on that front. However, the details of how an installer uses this information are really up to the installer developers and what their users expect/demand. It certainly isn't *practical* to do a full dependency analysis when PyPI doesn't provide the same kind of precalculated metadata that a yum repo does, but that's not something that should be spelled out in the distribution metadata PEP, any more than it is spelled out in the RPM format spec. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Nov 19, 2012 at 07:49:41PM -0500, Donald Stufft wrote:
I'm not sure this assertion about OS package managers is correct. I've only just read http://www.python.org/dev/peps/pep-0426/#provides-dist-multiple-use, but the rough rpm analogue seems to be the Provides: tag. Provides is given a string which is parsed into a name, or a name and version, like this: "Provides: python" or "Provides: python = 3.1.0". rpm has no way at package build time to tell that a particular name given in a Provides in one package is the actual name of another package. At install time, rpm keeps package names and Provides names separately, but in dependency comparisons either one can be used to satisfy a requirement. What that means is that when asking for information about a package with the name "python", you'll get information about the python package with that name and not about anything else that Provides: "python". But if you are installing something that has a requirement on "python", either the package with the name python or any package that Provides: python can satisfy the requirement. Package managers with built-in dep solvers can be built on top of rpm. The one that I am familiar with is yum. Since yum is downloading the packages that are being fed into rpm, yum could choose to prefer the package name over things in Provides when it downloads. It doesn't, though. Just like the underlying rpm, it treats package names and names specified through Provides: as equivalent. -Toshio
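The distinction drawn here, between querying a package by name and satisfying a requirement, can be modelled in a few lines. This is a toy sketch of the behaviour as described, not rpm's or yum's actual code: it ignores versions entirely, and the ``Package`` class, both function names, and the package ``python3-alt`` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    provides: list[str] = field(default_factory=list)


def query(installed, name):
    """Asking about a package by name: only the real package name matches."""
    return [pkg for pkg in installed if pkg.name == name]


def satisfiers(installed, requirement):
    """Resolving a requirement: a real name or any Provides entry counts."""
    return [pkg for pkg in installed
            if pkg.name == requirement or requirement in pkg.provides]


installed = [
    Package("python"),
    Package("python3-alt", provides=["python"]),  # hypothetical package
]

print([p.name for p in query(installed, "python")])       # ['python']
print([p.name for p in satisfiers(installed, "python")])  # ['python', 'python3-alt']
```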

On Monday, November 19, 2012 at 8:35 PM, Toshio Kuratomi wrote:
Are you saying the RPM documentation is wrong? http://www.rpm.org/max-rpm/s1-rpm-inside-tags.html: "The provides tag is used to specify a *virtual package* that the packaged software makes available when it is installed. Normally, this tag would be used when different packages provide equivalent services. For example, any package that allows a user to read mail might provide the mail-reader virtual package. Another package that depends on a mail reader of some sort, could require the mail-reader virtual package. It would then install without dependency problems, if any one of several mail programs were installed." It pretty clearly states that it is not to be used for masquerading as a different package, which was my point. I wasn't making any claims about whether it was technically possible to do so or not, just what its intended purpose was.

Look more closely at the docs for "Obsoletes" in RPM, not just those for "Provides". Being able to transparently replace an existing package with a renamed one that installs files with the same names is certainly part of the purpose/capabilities of the RPM dependency machinery (i.e. precisely the distribute vs setuptools situation). We may want to clarify the wording to ensure it is clear that the provision of the dist name (as posted on PyPI) is implied, though. Cheers, Nick. -- Sent from my phone, thus the relative brevity :) On Nov 20, 2012 11:45 AM, "Donald Stufft" <donald.stufft@gmail.com> wrote:
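The rename scenario described here (a new project transparently replacing an old one, with provision of the old name implied) could be handled by an installer along the following lines. This is a hypothetical policy sketch, loosely modelled on the RPM Obsoletes behaviour as described in this message; the function names and data layout are assumptions, and it is not the behaviour of any existing Python installer.

```python
def install(installed, new_name, obsoletes=()):
    """Return (new installed mapping, removed names).

    ``installed`` maps each installed project name to the set of names it
    provides (its own name plus any implied provisions). Installing a
    project that obsoletes an installed one removes the old project, and
    the new project is treated as providing the obsoleted name, so
    requirements on the old name stay satisfied.
    """
    removed = sorted(name for name in installed if name in obsoletes)
    result = {name: provided for name, provided in installed.items()
              if name not in obsoletes}
    result[new_name] = {new_name, *obsoletes}
    return result, removed


def is_satisfied(installed, requirement):
    """True if any installed project provides the required name."""
    return any(requirement in provided for provided in installed.values())


before = {"setuptools": {"setuptools"}}
after, removed = install(before, "distribute", obsoletes=("setuptools",))
print(removed, is_satisfied(after, "setuptools"))  # ['setuptools'] True
```

Treating the obsoleted name as implicitly provided is the clarification suggested above; without it, removing the old project would leave requirements on its name unsatisfied.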