PEP-459 feedback from openSUSE packaging

Hi guys,

a colleague of mine pointed me to Nick's plea for feedback on various PEPs from other distros' perspectives. I will provide some general remarks first and later comment on at least PEP-459.

So for openSUSE (and SLES), we automate Python package generation as much as possible. For that we parse the metadata as found on PyPI and maul the source distribution (yeah, the tarball) for every usable bit. The latter is necessary to properly install files such as README, AUTHORS, LICENSE(.txt,...). We also use it to grep for '*test*' files to decide if it's worth generating an RPM %check section.

Some modules do add 'package_data'. That's usually helpful since it goes straight into %python_sitelib. For the files mentioned above, we have 'data_files'. Not only is it seldom used, it's almost universally done wrong, simply because every distro differs on where to put these files.

A different cause of woe is install_requires / requires vs. setup_requires vs. tests_require. Some people use 'requires', which is mostly documentation, and lots of people put _everything_ in install_requires. From a distribution viewpoint, you have different sets of requirements:

- build-time
  + optionally doc-requires
  + optionally test-requires
- run-time

So setup_requires / tests_require can be used to generate semi-accurate 'BuildRequires: $bla' RPM spec tags. But as said, few people use them and fewer do it correctly. Maybe because 'setup_requires' doesn't specify build-time reqs but 'setuptools-invocation-time' reqs (which is something different). Also, we simply use 'install_requires' as both 'Requires:' (runtime) and 'BuildRequires:' (build-time). But that's a kludge. For example, projects include 'Sphinx' in install_requires. What they meant is "if you want to build docs, use Sphinx". What they specified is "you always need it". Thankfully, the advent of pep allows us to check requirements.txt and test-requirements.txt. The latter are usually build-time (for the RPM %check section). I guess I have to dig into the other PEPs first to see if this really changed before being able to comment on that any further.

In general, the other metadata is good enough (except 'license', see below). 'name', 'version', 'upstream_url' and 'description' are used for their respective RPM spec counterparts. 'long_description' is used for '%description $PKG_NAME'. The tarball download URL is used as 'Source0:'. All other metadata tags are ignored because we don't need them to build an RPM.

The 'license' metadata tag is causing the most issues for us. A perceived 50% just put "GPL" in there. Which GPL version? GPL-only or actually LGPL? We have a legal crawler that tries to match the version from the source code, but often it becomes a manual task or needs a check with upstream. This tag is probably the least interesting for an upstream developer but the most important one for any distro that has a corporate legal entity somewhere behind it (I should say, sue-able entity :-).

So with regards to PEP-459 specifically, I have specific recommendations for the license tag. Instead of

"This field SHOULD contain fewer than 512 characters and MUST contain fewer than 2048. This field SHOULD NOT contain any line breaks."

I would propose:

"This field SHOULD contain a standardized license identifier as published by spdx.org."

SPDX-style license identifiers are short (fewer than 20 chars) and can be parsed automatically. They are meant to be unambiguous and cross-distro. SPDX.org license tags are used extensively inside openSUSE and SLES and (to my knowledge) in Fedora and Debian too. That would be the single most interesting change I'd be interested in.

I hope this was helpful, feel free to bug me if you need further detail.
--
Kind regards,
Sascha Peilicke
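To make the build-time vs. run-time split above concrete, here is a minimal sketch of how the setup() keyword lists could be turned into RPM spec dependency tags. It is only an illustration under stated assumptions: the function names, the 'python-' package prefix and the way version specifiers are dropped are hypothetical and do not describe the actual openSUSE tooling.

```python
import re


def rpm_name(requirement):
    """Map a requirement string to a (hypothetical) RPM package name."""
    # Keep only the project name; version specifiers, extras and markers are dropped.
    name = re.split(r"[<>=!~;\[\s]", requirement.strip(), maxsplit=1)[0]
    # Assumption: distro packages are named 'python-<project>'.
    return "python-" + name


def spec_dependency_tags(install_requires, setup_requires=(), tests_require=()):
    """Return spec lines: everything is build-time, install_requires is also run-time."""
    build = []
    for req in list(setup_requires) + list(tests_require) + list(install_requires):
        name = rpm_name(req)
        if name not in build:
            build.append(name)
    runtime = []
    for req in install_requires:
        name = rpm_name(req)
        if name not in runtime:
            runtime.append(name)
    lines = ["BuildRequires:  " + name for name in build]
    lines += ["Requires:       " + name for name in runtime]
    return lines


if __name__ == "__main__":
    for line in spec_dependency_tags(["requests>=2.0"], ["setuptools"], ["nose"]):
        print(line)
```

Note that a project listing 'Sphinx' in install_requires would still end up with a run-time 'Requires:' line here, which is exactly the kludge described above.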

On Tue, Feb 04, 2014 at 12:43:40PM +0100, Sascha Peilicke wrote:
The 'license' metadata tag is causing the most issues for us. A perceived 50% just put "GPL" in there. Which GPL version? GPL-only or actually LGPL?
FWIW you can sometimes get more detailed information from the classifiers: https://pypi.python.org/pypi?%3Aaction=list_classifiers
E.g.
    setup(...
          license='GPL',
          classifiers=[
              ...,
              'License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)',
              ...
          ])
Marius Gedminas
--
Unix gives you enough rope to shoot yourself in the foot.
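As a rough illustration of this suggestion, a packaging script could pull the license-related entries out of a project's classifier list. This is a minimal sketch; the helper name is hypothetical and the input is assumed to be a plain list of classifier strings as passed to setup().

```python
def license_classifiers(classifiers):
    """Return the last component of every 'License ::' trove classifier."""
    found = []
    for classifier in classifiers:
        parts = [part.strip() for part in classifier.split("::")]
        if parts and parts[0] == "License":
            found.append(parts[-1])
    return found


if __name__ == "__main__":
    example = [
        "Programming Language :: Python",
        "License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)",
    ]
    print(license_classifiers(example))
```

An empty result means the classifiers say nothing about the license, so the ambiguous 'license' field would remain the only hint.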

On Tuesday 04 February 2014 15:42:45 Marius Gedminas wrote:
On Tue, Feb 04, 2014 at 12:43:40PM +0100, Sascha Peilicke wrote:
The 'license' metadata tag is causing the most issues for us. A perceived 50% just put "GPL" in there. Which GPL version? GPL-only or actually LGPL?
FWIW you can sometimes get more detailed information from the classifiers: https://pypi.python.org/pypi?%3Aaction=list_classifiers
E.g.
    setup(...
          license='GPL',
          classifiers=[
              ...,
              'License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)',
              ...
          ])
True, that may help in manual review. But often classifiers aren't enough. Please see my more detailed reply elsewhere in this thread.
--
Kind regards,
Sascha Peilicke

On 4 February 2014 21:43, Sascha Peilicke <saschpe@gmx.de> wrote:
Hi guys,
a colleague of mine pointed me to Nick's plea for feedback on various PEPs from other distros' perspectives.
Thanks! Happy to see my linux.conf.au presentation bearing fruit :)
I will provide some general remarks first and later comment on at least PEP-459. So for openSUSE (and SLES), we automate Python package generation as much as possible. For that we parse the metadata as found on PyPI and maul the source distribution (yeah, the tarball) for every usable bit. The latter is necessary to properly install files such as README, AUTHORS, LICENSE(.txt,...). We also use it to grep for '*test*' files to decide if it's worth generating an RPM %check section.
Some modules do add 'package_data'. That's usually helpful since it goes straight into %python_sitelib. For the files mentioned above, we have 'data_files'. Not only is it seldom used, it's almost universally done wrong, simply because every distro differs on where to put these files.
PEP 426/459 will hopefully help with some of these, but I suspect only up to a point - pypi.python.org is always going to maintain a lower barrier to entry than the Linux distros that try to hammer their packages into a more integrated whole.
A different cause of woe is install_requires / requires vs. setup_requires vs. tests_require. Some people use 'requires', which is mostly documentation, and lots of people put _everything_ in install_requires. From a distribution viewpoint, you have different sets of requirements:
- build-time
  + optionally doc-requires
  + optionally test-requires
- run-time
So setup_requires / tests_require can be used to generate semi-accurate 'BuildRequires: $bla' RPM spec tags. But as said, few people use them and fewer do it correctly. Maybe because 'setup_requires' doesn't specify build-time reqs but 'setuptools-invocation-time' reqs (which is something different). Also, we simply use 'install_requires' as both 'Requires:' (runtime) and 'BuildRequires:' (build-time). But that's a kludge. For example, projects include 'Sphinx' in install_requires. What they meant is "if you want to build docs, use Sphinx". What they specified is "you always need it". Thankfully, the advent of pep allows us to check requirements.txt and test-requirements.txt. The latter are usually build-time (for the RPM %check section). I guess I have to dig into the other PEPs first to see if this really changed before being able to comment on that any further.
Yes, this aspect of the current system is a bit of a mess. One of the things we're aiming for with the wheel format is to clarify that even in the existing metadata, "install_requires" should refer to things needed to create a wheel, while "requires" should refer to things needed to actually run the software after unpacking the wheel.
The proposed PEP 426 dependency tags are probably best summarised in this section: http://www.python.org/dev/peps/pep-0426/#mapping-dependencies-to-development...
Where a documentation dependency like Sphinx ends up would depend on the project. If the project has no bundled documentation (e.g. online docs only), then Sphinx would just be a "dev_requires" dependency. However, if it *did* ship with generated documentation that needed to be installed on the target system (e.g. man pages), then Sphinx would instead be a "build_requires" dependency.
I expect many upstream projects will still need help from the distros to get this right, but the ultimate aim for metadata 2.0 is to make it easier for distro repackagers to submit such patches upstream and *get them accepted as non-controversial changes*.
In general, the other metadata is good enough (except 'license', see below). 'name', 'version', 'upstream_url' and 'description' are used for their respective RPM spec counterparts. 'long_description' is used for '%description $PKG_NAME'. The tarball download URL is used as 'Source0:'. All other metadata tags are ignored because we don't need them to build an RPM.
From a name and version point of view, I'm not at all familiar with the openSUSE policies, so I'd be interested in knowing if the restrictions in http://www.python.org/dev/peps/pep-0426/#name would also meet openSUSE naming guidelines.
The version numbering PEP (http://www.python.org/dev/peps/pep-0440/) is annoyingly complicated, but it unfortunately needs to be in order to tolerate the diversity of versioning schemes already in use on PyPI :(
The 'license' metadata tag is causing the most issues for us. A perceived 50% just put "GPL" in there. Which GPL version? GPL-only or actually LGPL? We have a legal crawler that tries to match the version from the source code, but often it becomes a manual task or needs a check with upstream. This tag is probably the least interesting for an upstream developer but the most important one for any distro that has a corporate legal entity somewhere behind it (I should say, sue-able entity :-). So with regards to PEP-459 specifically, I have specific recommendations for the license tag. Instead of
"This field SHOULD contain fewer than 512 characters and MUST contain fewer than 2048.
This field SHOULD NOT contain any line breaks."
I would propose:
"This field SHOULD contain a standardized license identifier as published by spdx.org."
SPDX-style license identifiers are short (fewer than 20 chars) and can be parsed automatically. They are meant to be unambiguous and cross-distro. SPDX.org license tags are used extensively inside openSUSE and SLES and (to my knowledge) in Fedora and Debian too. That would be the single most interesting change I'd be interested in.
Oh, I hadn't seen SPDX before - very interesting. I'm wondering if it may be a better fit for the PyPI Trove classifiers though - then it wouldn't even need to wait for metadata 2.0, we could just add them to the list of supported classifiers (https://pypi.python.org/pypi?%3Aaction=list_classifiers) and projects could start listing them in their current metadata.
Something like:
License :: SPDX :: <tag>
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
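A minimal sketch of what the classifier idea could look like on the tooling side, assuming the proposed 'License :: SPDX :: <tag>' form. Note that this classifier family is only a proposal in this thread, the helper name is hypothetical, and the set of identifiers below is a tiny illustrative subset rather than the SPDX license list.

```python
# Illustrative subset only; a real tool would load the full SPDX license list.
KNOWN_SPDX_IDS = {
    "MIT",
    "Apache-2.0",
    "BSD-3-Clause",
    "GPL-2.0-or-later",
    "LGPL-2.1-only",
}


def spdx_classifier(license_field):
    """Return the proposed classifier for an exact SPDX identifier, else None."""
    tag = license_field.strip()
    if tag in KNOWN_SPDX_IDS:
        return "License :: SPDX :: " + tag
    # Ambiguous values such as "GPL" cannot be mapped automatically.
    return None


if __name__ == "__main__":
    print(spdx_classifier("MIT"))
    print(spdx_classifier("GPL"))
```

Because the identifier either matches the list exactly or is rejected, a bare "GPL" is flagged for manual review instead of being guessed at.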

On Tue, Feb 4, 2014 at 9:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Oh, I hadn't seen SPDX before - very interesting. I'm wondering if it may be a better fit for the PyPI Trove classifiers though - then it wouldn't even need to wait for metadata 2.0, we could just add them to the list of supported classifiers (https://pypi.python.org/pypi?%3Aaction=list_classifiers) and projects could start listing them in their current metadata.
Something like:
License :: SPDX :: <tag>
Cheers, Nick.
Those SPDX are great. They could certainly go into either license or into a trove classifier, the difference being the trove classifiers are checked against a static list.

On 5 February 2014 00:05, Daniel Holth <dholth@gmail.com> wrote:
On Tue, Feb 4, 2014 at 9:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Oh, I hadn't seen SPDX before - very interesting. I'm wondering if it may be a better fit for the PyPI Trove classifiers though - then it wouldn't even need to wait for metadata 2.0, we could just add them to the list of supported classifiers (https://pypi.python.org/pypi?%3Aaction=list_classifiers) and projects could start listing them in their current metadata.
Something like:
License :: SPDX :: <tag>
Cheers, Nick.
Those SPDX are great. They could certainly go into either license or into a trove classifier, the difference being the trove classifiers are checked against a static list.
The main advantage I can see to going the classifier route is that it means not having to wait for metadata 2.0 to promote them - folks can start using them as soon as they're registered on PyPI.
It also avoids compatibility issues when attempting to convert the many current projects with unclear license terms to metadata 2.0, while still making it easy for distro repackagers to offer upstream patches or bug reports to request license clarifications.
However, I do like the idea of having metadata 2.0 encourage the use of OSI approved SPDX tags in the license field. I just don't think we can upgrade that from a SHOULD to a MUST without breaking too many packages :(
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
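The SHOULD vs. MUST distinction can be sketched as a check that only rejects a license field for the MUST rule and merely warns for the SHOULD rules. This is a hypothetical illustration: the function name and return shape are assumptions, the length and line-break rules come from the PEP-459 wording quoted earlier in the thread, and the SPDX rule is the proposed wording, not accepted text.

```python
def check_license_field(value, known_ids=frozenset()):
    """Return (errors, warnings) for a license field value."""
    errors = []
    warnings = []
    if len(value) >= 2048:
        # MUST rule: violating it makes the metadata invalid.
        errors.append("MUST contain fewer than 2048 characters")
    elif len(value) >= 512:
        warnings.append("SHOULD contain fewer than 512 characters")
    if "\n" in value or "\r" in value:
        warnings.append("SHOULD NOT contain any line breaks")
    if value.strip() not in known_ids:
        # Proposed SHOULD rule: existing packages keep working, but get a hint.
        warnings.append("SHOULD be a standardized SPDX license identifier")
    return errors, warnings


if __name__ == "__main__":
    print(check_license_field("GPL", known_ids={"MIT", "GPL-2.0-or-later"}))
```

With this shape, a package that just says "GPL" still passes, and a repackager gets a warning to forward upstream.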

On Wednesday 05 February 2014 00:18:41 Nick Coghlan wrote:
On 5 February 2014 00:05, Daniel Holth <dholth@gmail.com> wrote:
On Tue, Feb 4, 2014 at 9:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Oh, I hadn't seen SPDX before - very interesting. I'm wondering if it may be a better fit for the PyPI Trove classifiers though - then it wouldn't even need to wait for metadata 2.0, we could just add them to the list of supported classifiers (https://pypi.python.org/pypi?%3Aaction=list_classifiers) and projects could start listing them in their current metadata.
Something like: License :: SPDX :: <tag>
Cheers, Nick.
Those SPDX are great. They could certainly go into either license or into a trove classifier, the difference being the trove classifiers are checked against a static list.
The main advantage I can see to going the classifier route is that it means not having to wait for metadata 2.0 to promote them - folks can start using them as soon as they're registered on PyPI.
It also avoids compatibility issues when attempting to convert the many current projects with unclear license terms to metadata 2.0, while still making it easy for distro repackagers to offer upstream patches or bug reports to request license clarifications.
However, I do like the idea of having metadata 2.0 encourage the use of OSI approved SPDX tags in the license field. I just don't think we can upgrade that from a SHOULD to a MUST without breaking too many packages :(
I agree, it has to be a SHOULD of some sort.
--
Kind regards,
Sascha Peilicke
participants (4): Daniel Holth, Marius Gedminas, Nick Coghlan, Sascha Peilicke