Mailman 3 Re: [Distutils] PEP-459 feedback from openSUSE packaging - Distutils-SIG

4 Feb 2014

      On Wednesday 05 February 2014 00:02:08 Nick Coghlan wrote:
...
On 4 February 2014 21:43, Sascha Peilicke <saschpe@gmx.de> wrote:
...
...
Hi guys,
a colleague of mine pointed me to Nick's plea for feedback on various PEPs
from other distros' perspectives.
...
Thanks! Happy to see my linux.conf.au presentation bearing fruit

...
...
I will provide some general remarks first and later comment on at least
PEP-459. So for openSUSE (and SLES), we automate Python package
generation as much as possible. For that we parse the metadata as found
on PyPI and maul the source distribution (yeah, the tarball) for every
usable bit. The latter is necessary to properly install files such as
README, AUTHORS, LICENSE(.txt,...). We also use it to grep for '*test*'
files to decide if it's worth generating an RPM %check section.
Some modules do add 'package_data'. That's usually helpful since it goes
straight into %python_sitelib. For the files mentioned above, we have
'data_files'. Not only is it seldom used, it's almost universally done
wrong, simply because every distro differs on where to put these files.
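
The tarball scan described above could look roughly like the following. This is a minimal sketch, not the actual openSUSE tooling: the function name scan_sdist and the DOC_PATTERNS list are hypothetical, and a real generator would likely handle more file names and archive formats.

```python
import fnmatch
import io
import tarfile

# Hypothetical pattern list; the thread only names README, AUTHORS and LICENSE.
DOC_PATTERNS = ("README*", "AUTHORS*", "LICENSE*")


def scan_sdist(fileobj):
    """Return (doc_files, has_tests) for a source tarball given as a file object."""
    doc_files = []
    has_tests = False
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            basename = member.name.rsplit("/", 1)[-1]
            # Documentation files are matched on the file name only.
            if any(fnmatch.fnmatchcase(basename, p) for p in DOC_PATTERNS):
                doc_files.append(member.name)
            # Mirrors the "grep for '*test*'" heuristic on the full path.
            if fnmatch.fnmatchcase(member.name, "*test*"):
                has_tests = True
    return sorted(doc_files), has_tests
```

The has_tests flag would then decide whether a %check section is worth generating, and doc_files would feed the list of files to install as documentation.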
...
PEP 426/459 will hopefully help with some of these, but I suspect only
up to a point - pypi.python.org is always going to maintain a lower
barrier to entry than the Linux distros that try to hammer their
packages into a more integrated whole.
I agree and there's much value in keeping that barrier as low as possible. 
Most of the time, setting "--install-data" to something meaningful (while 
issuing setup.py install) is sufficient.
...

...
...
Another cause of woe is install_requires / requires vs. setup_requires
vs. tests_require. Some people use 'requires', which is mostly documentation,
and lots of people put _everything_ in install_requires. From a distribution
viewpoint, you have different sets of requirements:
 - build-time
   + optionally doc-requires
   + optionally test-requires
 - run-time
So setup_requires / tests_require can be used to generate semi-accurate
BuildRequires: $bla RPM spec tags. But as said, few people use them and
fewer do it correctly. Maybe because 'setup_requires' doesn't specify
build-time reqs but 'setuptools-invocation-time' reqs (which is something
different). Also, we simply use 'install_requires' as both 'Requires:'
(runtime) and 'BuildRequires:' (build-time). But that's a kludge. For example,
projects include 'Sphinx' in install_requires. What they meant is "if you want
to build docs, use Sphinx". What they specified is "you always need it".
Thankfully, the advent of pep allows us to check requirements.txt and
test-requirements.txt. The latter are usually build-time (for the RPM %check
section). I guess I have to dig into the other PEPs first to see if this
really changed before being able to comment on that any further.
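
The mapping described above can be sketched as follows. This is only an illustration under stated assumptions: spec_requirement_lines is a hypothetical helper, requirement strings are assumed to be bare project names without version specifiers, and the "python-" prefix follows the openSUSE naming policy mentioned in this thread.

```python
def spec_requirement_lines(install_requires=(), setup_requires=(),
                           tests_require=(), prefix="python-"):
    """Turn setup() requirement lists into RPM spec dependency lines."""
    # Build-time set: setup and test requirements, plus the kludge of
    # treating every install requirement as a build requirement too.
    build_time = []
    for name in list(setup_requires) + list(tests_require) + list(install_requires):
        if name not in build_time:
            build_time.append(name)
    lines = ["BuildRequires: %s%s" % (prefix, name) for name in build_time]
    # Run-time set: install requirements only.
    lines += ["Requires: %s%s" % (prefix, name) for name in install_requires]
    return lines
```

With this approach a project that lists Sphinx under install requirements ends up with both a BuildRequires and a Requires line for it, which is exactly the over-specification complained about above.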
...
Yes, this aspect of the current system is a bit of a mess. One of the
things we're aiming for with the wheel format is to clarify that even
in the existing metadata, "install_requires" should refer to things
needed to create a wheel, while "requires" should refer to things
needed to actually run the software after unpacking the wheel.
The proposed PEP 426 dependency tags are probably best summarised in
this section:
http://www.python.org/dev/peps/pep-0426/#mapping-dependencies-to-development-and-distribution-activities
I meanwhile read that and it looks promising so far. I have a bit of a fear 
that people may be overwhelmed by the options they have.
...
Where a documentation dependency like Sphinx ends up would depend on
the project. If the project has no bundled documentation (e.g. online
docs only), then Sphinx would just be a "dev_requires" dependency.
However, if it *did* ship with generated documentation that needed to
be installed on the target system (e.g. man pages), then Sphinx would
instead be a "build_requires" dependency.
I expect many upstream projects will still need help from the distros
to get this right, but the ultimate aim for metadata 2.0 is to make it
easier for distro repackagers to submit such patches upstream and *get
them accepted as non-controversial changes*.
The good thing is that you can send pull requests to them with "do it that 
way" in the future rather than having to 'sed' around in setup.py files for 
the next 10 years.
...

...
...
In general, the other metadata is good enough (except 'license', see
below), 'name', 'version', 'upstream_url' and 'description' are used for
their respective RPM spec counterparts. 'long_description' is used for
'%description $PKG_NAME'. The tarball download URL is used as 'Source0:'.
All other metadata tags are ignored because we don't need them to build a
RPM.
...
From a name and version point of view, I'm not at all familiar with
the openSUSE policies, so I'd be interested in knowing if the
restrictions in http://www.python.org/dev/peps/pep-0426/#name would
also meet openSUSE naming guidelines.
That certainly helps. The openSUSE naming policy is:

        python-$PYPI_UPSTREAM_NAME or
        python3-$PYPI_UPSTREAM_NAME

so we have packages like "python-zope.interface" or "python-Shed_Skin". I 
don't remember a module which had special characters (that RPM might frown 
upon) which we had to replace.
...
The version numbering PEP (http://www.python.org/dev/peps/pep-0440/)
is annoyingly complicated, but it unfortunately needs to be in order
to tolerate the diversity of versioning schemes already in use on PyPI.
Versioning is an arcane art. Of course we love semantic versioning (especially 
because RPM builds a lot of logic around it), but if we discover something 
crazy, we use this:

Version: 0.0.0+$WHATEVERNONSENSE

If $WHATEVERNONSENSE doesn't even increase alphabetically (needed for rpm -U to 
actually work), we put a date stamp in front, something like:

Version: 0.0.0+20140201.$WHATEVERNONSENSE

or just

Version: 20140201.$WHATEVERNONSENSE

As long as the upstream version increases meaningfully, we're already content.
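
A rough sketch of that fallback, assuming a simplified rule that is not the real openSUSE policy: versions made only of digits and dots pass through unchanged, and everything else gets the date-stamped "0.0.0+" form. The name rpm_version is hypothetical, and the sketch does not sanitize characters that RPM may reject.

```python
import datetime
import re


def rpm_version(upstream, today=None):
    """Return an RPM Version value for an upstream version string."""
    # Plain dotted numeric versions are assumed to sort sensibly as-is.
    if re.fullmatch(r"\d+(\.\d+)*", upstream):
        return upstream
    # Otherwise prefix a date stamp so newer builds compare as newer.
    stamp = (today or datetime.date.today()).strftime("%Y%m%d")
    return "0.0.0+%s.%s" % (stamp, upstream)
```

The date stamp is what makes the result increase over time, since the upstream part alone gives no ordering guarantee.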
...
...
The 'license' metadata tag is causing the most issues for us. A perceived
50% just put "GPL" in there. Which GPL version? GPL-only or actually
LGPL? We have a legal crawler that tries to match the version from the
source code, but often it becomes a manual task or needs a check with
upstream. This tag is probably the least interesting for an upstream
developer but the most important one for any distro that has a corporate
legal entity somewhere behind it (I should say, sue-able entity :-). So
with regards to PEP-459 specifically, I have specific recommendations for
the license tag. Instead of:
        "This field SHOULD contain fewer than 512 characters and MUST
        contain fewer than 2048.
        This field SHOULD NOT contain any line breaks."
I would propose:
        "This field SHOULD contain a standardized license identifier as
        published by spdx.org."
SPDX-style license identifiers are short (less than 20 chars) and can be
parsed automatically. They are meant to be unambiguous and cross-distro.
SPDX.org license tags are used extensively inside openSUSE and SLES and
(to my knowledge) in Fedora and Debian too. That is the single change
I'd be most interested in.
...
Oh, I hadn't seen SPDX before - very interesting. I'm wondering if it
may be a better fit for the PyPI Trove classifiers though - then it
wouldn't even need to wait for metadata 2.0, we could just add them to
the list of supported classifiers
(https://pypi.python.org/pypi?%3Aaction=list_classifiers) and projects
could start listing them in their current metadata.
Something like:
License :: SPDX :: <tag>
That is actually a brilliant idea to add it to trove classifiers. Thing is, 
spdx provides both a long version and a short identifier. Since I consider 
trove classifiers to be human-readable, I'm thinking of using the long version 
for trove and the short one as a recommendation for the License: tag. There's 
a rationale behind this. Some projects really have complex licensing issues 
(regardless of what they claim). Just to give an example, the license for 
python-psycopg2 we have is:

License: LGPL-3.0-with-openssl-exception and (LGPL-3.0-with-openssl-exception or ZPL-2.0)

BTW, this is not what upstream thinks they have; this is what our license 
crawler found out (and what was approved by our legal guys). Part of spdx are 
the boolean operators you see, which let you specify _either_ license X _or_ 
license Y _and_ license Z. This happens more often than you can imagine. 
Therefore, the short spdx identifiers are much more concise than trove 
classifiers. Currently, you simply can't express 'and' / 'or' with trove 
classifiers.
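
To illustrate why such expressions are easy to process automatically, here is a minimal sketch of a parser for the 'and' / 'or' / parentheses subset used in the example above. It is not a full SPDX implementation: the function names are hypothetical, and it assumes 'and' binds tighter than 'or'.

```python
import re


def parse_license(expr):
    """Parse an expression into nested ('and'|'or', left, right) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expr)
    pos = 0

    def parse_or():
        nonlocal pos
        node = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            node = ("or", node, parse_and())
        return node

    def parse_and():
        nonlocal pos
        node = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok == ")" or tok.lower() in ("and", "or"):
            raise ValueError("unexpected token %r" % tok)
        return tok  # a license identifier

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def license_ids(tree):
    """Collect the set of license identifiers used in a parsed expression."""
    if isinstance(tree, str):
        return {tree}
    _, left, right = tree
    return license_ids(left) | license_ids(right)
```

A flat list of trove classifiers could carry the identifiers that license_ids returns, but not the tree structure that says how they combine.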

-- 
Mit freundlichen Grüßen,
Sascha Peilicke
