[Catalog-sig] trove - LGPL v3 not recognised?

Toshio Kuratomi a.badger at gmail.com
Tue Nov 15 23:28:50 CET 2011

On Tue, Nov 15, 2011 at 07:46:54AM +0100, "Martin v. Löwis" wrote:
> > 1) The other licenses which have versions attached to them do not place the
> >    version into a fourth level
> That's probably because nobody thought of it.
<nod>  But consistency in the classifiers is a nice thing.  If your goal as
a consumer of the data is really trying to isolate the version from the rest
of the license name, for instance, you already have to parse the version off
the end of some strings.  Making new classifiers that use another level for
a version means you have to maintain parsing of the version as a separate
field as an additional case.

Note that one of those other licenses is a GNU license:

License :: OSI Approved :: GNU Affero General Public License v3
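
To make the parsing point concrete, here is a minimal sketch of what a consumer has to do today to pull the version off the end of a classifier.  The function name and the regular expression are my own assumptions for illustration, not part of any PyPI tooling:

```python
import re

# Hypothetical pattern: a license name followed by a trailing " v<number>".
_VERSION_SUFFIX = re.compile(r"^(?P<name>.*?)\s+v(?P<version>\d+(?:\.\d+)*)$")


def split_license_version(classifier):
    """Split a trove license classifier into (name, version or None)."""
    # The license name is the last "::"-separated segment.
    leaf = classifier.split("::")[-1].strip()
    match = _VERSION_SUFFIX.match(leaf)
    if match is None:
        # No trailing version, e.g. "MIT License".
        return leaf, None
    return match.group("name"), match.group("version")
```

A classifier that carried the version in a fourth level would not match this pattern, so the consumer would need a second branch next to this one.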

> > 2) The utility of searching like that is limited.
> Why do you say that? You have full search capabilities either way. In
> fact, sub-classifiers improve the search capabilities.
I say this because in this case the different versions are simply ways of
naming different licenses.  The licenses have different terms and conditions
and in and of themselves are not even compatible.  The value of categorizing
the GPLv2 and GPLv3 license together is rather slim.  The value of
categorizing the MIT and new BSD licenses together would be higher than the
value of categorizing the GPLv2 and GPLv3 licenses together.  Wishing to
categorize the GPLv2 and GPLv3 together is only a superficial goal based on
the fact that they share the common substring "GNU General Public License"
in their name.

> >    If I'm searching for
> >    particular licenses, it's typically because I need to know whether the
> >    license is compatible with some other license.
> I'm not really sure what the common case for searching for a license is.
> One reason might also be that people want to know what the most popular
> licenses are, and would there want to aggregate GPL (any version).
Commonly people would not want to aggregate the GPL licenses here because
the GPL licenses are incompatible with each other and have different terms
and conditions.  If you want to know about popular licenses, you would want
to keep the GPLv2 and GPLv3 licenses separate in your count.  Or put another
way, if you are counting the GPLv2 and GPLv3 together, you likely aren't
really looking for a count of popular licenses.  You're likely looking for
a count of types of licenses.  These are some of the ways that people might
like to categorize licenses:

* Copyleft, non-copyleft, proprietary
* GPLv2-compatible, GPLv3-compatible, non-GPL compatible open source, proprietary
* Backed by a legal team, analyzed by a legal team, not rigorously analyzed

Separating the version helps with these somewhat, but each still has
problems because splitting the version off the license name is only an
imperfect match for what you really want.  In the end, you have to maintain
lists of licenses that meet your categorizing criteria.  Having the version
as a separate field doesn't help, as the textual portion of the license name
and the version portion have to be considered together to determine which
terms and conditions need to be evaluated in your context.  The only thing
that the raw name without version really brings to the table is brand
identification.
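
As a rough illustration of what maintaining such lists looks like, here is a sketch keyed on the full license name, version included.  The table, the category labels, and the helper name are assumptions made up for this example; a real list would need to be reviewed against your own criteria:

```python
# Illustrative, hand-maintained table.  Keys are full license names
# (text plus version); the category labels are this sketch's assumptions.
LICENSE_CATEGORIES = {
    "GNU General Public License v2": {"copyleft", "gplv2-compatible"},
    "GNU General Public License v3": {"copyleft", "gplv3-compatible"},
    "MIT License": {"non-copyleft", "gplv2-compatible", "gplv3-compatible"},
}


def licenses_in(category):
    """Return the sorted license names tagged with the given category."""
    return sorted(
        name
        for name, categories in LICENSE_CATEGORIES.items()
        if category in categories
    )
```

Note that the lookup never splits the name from the version: the two GPL entries land in different compatibility lists even though they share a substring.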

Here's a different example of this -- Let's say we were designing categories
for people who might want to classify software by the programming language
it was written in.  Would we want to group perl, python, and php together
because someone might want to search out popularity of languages according
to which begin with "p"?  Do we want to group Visual Basic and C#
together because both originated with Microsoft?  These groupings may fit
with what someone wants to accomplish but they're superficial groupings
based on things that have nothing to do with what the subject matter is
actually about.

Similarly, grouping the licenses based on the existence of a common
substring in the name is basing it on a superficial characteristic of the
license.  Instead, determine which attributes of the license are important
to you and optimize for those.

> But let's assume that people actually search for licenses to find
> software that is compatible with their needs (whether that is their
> license, their company policies, or their personal preferences).
> >    The GPL v2 and version 3 licenses are not compatible with each other.
> Then you search for either one by subclassifier (as you would for
> a flat classification). However, there are also cases where the
> license in question is compatible with both GPL versions, so you would
> want to search for GPL "any version".
This is lawyerly-debatable.  Since the GPLv2 and GPLv3 are incompatible and
since they are strong copylefts, the FSF's position has been that you cannot
license code in such a way that it can use both GPLv2 and GPLv3 licensed
code.  This has not been tested in court yet so lawyers continue to debate
the validity of this and the applicability in different situations (for
instance, is dynamic linkage different from shared?  Are scripting languages
different than compiled?) but anyone who doesn't want to risk having to go
to court over this needs to keep it in mind.

> >    With this in mind, it seemed like code which used the trove license
> >    categories would need to operate on each license+version independently,
> >    even if we grouped them that way in the categorization scheme.
> In the use cases you cited. I think there are also use cases where you
> would want the entire supercategory.
I could see *other* supercategories which could be of benefit but I don't see
that there is a case to be made for this particular separation.  A license's
version is a part of its name as the terms and conditions between versions
can change dramatically (and in the case of GPLv2 and GPLv3, and of LGPLv2
and LGPLv3, they did).

Here are some examples of other supercategories that could be considered:

:: FSF :: GNU General Public License v3
:: Apache :: Apache Software License v2

This scheme would highlight the body that created a license.  Not all
licenses have a traceable or well known originator, though.  But for the
ones that do, this helps to answer the question of which legal team created
it, stands behind it, or can explain the intent when it was drafted.

:: GNU Lesser General Public License v2 (LGPLv2) :: 2.0
:: GNU Lesser General Public License v2 (LGPLv2) :: 2.1
:: GNU Lesser General Public License v3 (LGPLv3) :: 3.0

This scheme would highlight when minor changes are made vs incompatible
changes.  This particular example is problematic, however, because the
change incorporated between v2.0 and v2.1 was a rename.  The 2.0
version is actually the GNU Library General Public License.  The concept
itself is also problematic, as deciding what's an incompatible change and
what isn't is debatable.  In the GPL context, version 2 and version 3 are
incompatible with each other so it's clear that they are different licenses.
On the other hand, you have licenses like Old BSD (aka BSD with advertising)
versus the New BSD license.  The two are compatible with each other and
share a common name, but externally the new BSD license is compatible
with the GPL while the old one is not, which is a major difference.

Talking about supercategories for licenses, though, points me in the
direction that really we're talking about wanting to add attributes to
describe the license itself, not attributes to describe the software.  That
seems out of scope for the trove categories being used on pypi.  (pypi
already sorta does this with the OSI Approved supercategory marking the
licenses as open source... but I'm wondering if that wasn't a mistake.
After all, the FSF maintains a similar list of licenses and the two lists
are not super/subsets of each other.) Limiting it to the license name (of
which the version is a part as the version + textual portion of the name
together reference the set of terms and conditions which make up the
license) might be better.
