Upcoming changes to PEP 426/440 - Distutils-SIG

Nick Coghlan

June 29, 2013

6:14 p.m.

Donald has been continuing his data modelling work for Warehouse (aka PyPI 2.0: https://github.com/dstufft/warehouse) and found that the *_requires/*_may_require split for dependencies was significantly more painful to work with than I had expected.

Accordingly, I'm making some adjustments to the way dependencies are defined to bring them more into line with the way the contributors field and contact metadata works:

1. The "*_may_require" fields are all going away (leaving only the "*_requires" fields)
2. The "*_requires" fields are becoming lists of "dependency specifier" mappings rather than strings
3. A dependency specifier is now a mapping with the following fields:
   * "install": the installation specifier for the dependency
   * "extra": as per the current PEP (for conditional dependencies)
   * "environment": as per the current PEP (for conditional dependencies)
4. The "install" subfield is compulsory, the other two are optional (as now, using either of the latter creates a "conditional dependency", while dependency declarations with only the "install" subfield are unconditional)
5. An installation specifier is what PEP 426 currently calls a dependency specifier: the "name [extras] (constraints)" format. They will get their own top level section (similar to the existing Extras and Environment markers sections)

In addition to those changes, I'll also be making a change to the recommended handling of virtual distributions (again based on Donald's feedback): integration tools should NEVER look at the "Provides" listings on a public index server to satisfy unmet dependencies. Instead, virtual distributions should be defined in such a way that whoever first invents the specific virtual distribution name registers it on PyPI, using the dependency metadata to pull in a default implementation. That's the only way to manage virtual distributions on a public index server that isn't vulnerable to later hijacking simply by registering a distribution with that name.

As part of documenting that, I'll probably give the notion of "Virtual distributions" their own top level section (these are distributions that don't have any code of their own - they just declare dependencies on other projects to make them easy to install as a group, or to define the default provider for a dependency that may be satisfied by any one of multiple distributions).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
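For readers following along, the merged format described in points 1 to 5 might look roughly like the sketch below. The project names are made up, and treating any subfield other than "install", "extra" and "environment" as an error is an assumption for illustration, not something the message states.

```python
# Illustrative metadata fragment; the project names are invented.
metadata = {
    "run_requires": [
        {"install": "requests (>= 1.0)"},  # unconditional
        {"install": "pywin32", "environment": "sys_platform == 'win32'"},
        {"install": "flask [async]", "extra": "localweb"},
    ],
}

# Assumption: no other subfields are permitted.
ALLOWED_SUBFIELDS = {"install", "extra", "environment"}


def is_conditional(spec):
    """A specifier is conditional if it names an extra or an environment."""
    return "extra" in spec or "environment" in spec


def validate_specifier(spec):
    """'install' is compulsory; 'extra' and 'environment' are optional."""
    if "install" not in spec:
        raise ValueError("dependency specifier needs an 'install' subfield")
    unknown = set(spec) - ALLOWED_SUBFIELDS
    if unknown:
        raise ValueError("unknown subfields: %s" % sorted(unknown))
    return spec


for spec in metadata["run_requires"]:
    validate_specifier(spec)
```

Note that the unconditional entry and the conditional ones pass through the same validation function, which is the point of merging the two field families.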


Vinay Sajip

June 2013

2:44 a.m.

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

Donald has been continuing his data modelling work for Warehouse (aka PyPI 2.0: https://github.com/dstufft/warehouse) and found that the *_requires/*_may_require split for dependencies was significantly more painful to work with than I had expected.

Has there been any public discussion about this? I'm just curious about what the difficulties were.

...

1. The "*_may_require" fields are all going away (leaving only the "*_requires" fields)

2. The "*_requires" fields are becoming lists of "dependency specifier" mappings rather than strings

I'm wondering if this area could be simplified further. For example, can't we lose test_requires, meta_requires, build_requires and dev_requires just by stating that "test", "meta", "build" and "dev" are reserved extra names which don't need to be explicitly defined in "extras"? Then you get just one list of dependency specifiers, which can be readily filtered to provide what is currently provided by {test,meta,build,dev}_requires. It seems to lead to a very simple data model, as well as making the JSON schema more concise.

Regards,

Vinay Sajip

Nick Coghlan

3 a.m.

On 30 June 2013 17:44, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
Donald has been continuing his data modelling work for Warehouse (aka PyPI 2.0: https://github.com/dstufft/warehouse) and found that the *_requires/*_may_require split for dependencies was significantly more painful to work with than I had expected.

Has there been any public discussion about this? I'm just curious about what the difficulties were.

...
1. The "*_may_require" fields are all going away (leaving only the "*_requires" fields)

2. The "*_requires" fields are becoming lists of "dependency specifier" mappings rather than strings

I'm wondering if this area could be simplified further. For example, can't we lose test_requires, meta_requires, build_requires and dev_requires just by stating that "test", "meta", "build" and "dev" are reserved extra names which don't need to be explicitly defined in "extras"? Then you get just one list of dependency specifiers, which can be readily filtered to provide what is currently provided by {test,meta,build,dev}_requires. It seems to lead to a very simple data model, as well as making the JSON schema more concise.

No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
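A minimal sketch of the Cartesian product described above, assuming the merged "*_requires" format from the start of the thread. The `select` helper and the project names are hypothetical, and environment markers are ignored to keep the example short.

```python
# Invented example data: a "localweb" extra with its own test dependencies.
metadata = {
    "run_requires": [
        {"install": "core-lib"},
        {"install": "web-server", "extra": "localweb"},
    ],
    "test_requires": [
        {"install": "test-runner"},
        {"install": "web-test-client", "extra": "localweb"},
    ],
}


def select(metadata, kinds, extras=()):
    """Return install specifiers for the requested kinds, limited to
    unconditional entries plus entries for the requested extras."""
    chosen = []
    for kind in kinds:  # which sets of dependencies are wanted
        for spec in metadata.get(kind + "_requires", []):
            extra = spec.get("extra")  # optional subset within that set
            if extra is None or extra in extras:
                chosen.append(spec["install"])
    return chosen


runtime_only = select(metadata, ["run"])
web_with_tests = select(metadata, ["run", "test"], extras=["localweb"])
```

Here the kind picks the list and the extra picks a subset within it, so "localweb" can contribute entries to both the run and the test sets.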

Vinay Sajip

3:53 a.m.

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Isn't that the same as having an additional field in the dependency mapping? It seems like that's how one would organise it at the RDBMS level, anyway.

{ "install": "localweb-test-util [win] (>= 1.0)", "extra": "localweb", "environment": "sys_platform == 'win32'", "kind": ":test:" }

Sorry if I'm being dense :-(

Regards,

Vinay Sajip

Nick Coghlan

6:21 a.m.

On 30 June 2013 18:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Isn't that the same as having an additional field in the dependency mapping? It seems like that's how one would organise it at the RDBMS level, anyway.

{ "install": "localweb-test-util [win] (>= 1.0)", "extra": "localweb", "environment": "sys_platform == 'win32'", "kind": ":test:" }

You certainly *could* define it like that, but no existing dependency system I'm aware of does it that way. If they allow for anything other than runtime dependencies in the first place, they define a different top level field:

* setuptools has requires and install_requires
* PEP 345 has Requires-Dist and Setup-Requires-Dist
* RPM has Requires and BuildRequires
* npm has dependencies and devDependencies

The different kinds of semantic dependency make fundamentally different statements about the distributions being referenced. In particular, automated tools (especially PyPI) may place different constraints on the kind of specifier they allow in each field. PEP 426 explicitly *requires* that runtime dependencies be split between :meta: (which requires exact version specifiers) and :run: (which disallows them) so tools can issue appropriate warnings or errors when someone pins a specific version without explicitly stating their intent to monitor that dependency responsibly. That kind of difference in what's appropriate indicates they're only the "same thing" at a superficial syntactic level.

I decided to merge the *_requires and *_may_require fields because they had syntactic differences that made it hard to do unified processing on them. That's what Donald noticed and pointed out to me off-list - the need for two distinct code paths just to deal with the fact that *_requires used strings while *_may_require used a mapping with a nested list, even when the natural model in the database for an unconditional dependency is the same as a conditional dependency with a NULL extra and environment field. Having them split as they currently are in the PEP is also directly responsible for the odd "must define extra, environment or both" constraint on the "*_may_require" field entries.

Merging the fields eliminates all that complexity - the only required subfield is now "install", with the two conditional fields both optional. This means unconditional dependencies can be read and written using the same code path as conditional dependencies, since the conditional dependency code already needed to cope with the fact the "extra" or "environment" subfields might be missing.

We don't get the same level of payoff by switching to a "kind" subfield, because all five dependency fields already use the same internal syntax. Whether you're keying off the top level field name, or keying off a "kind" subfield, the processing code will still be identical across all five kinds of dependency.

As far as data modelling goes, Warehouse actually splits the different kinds of dependency as distinct ORM classes. This allows various useful details (like the descriptive names for each kind of dependency) to be handled directly in the data model, rather than needing to be tracked separately.

While the two forms are functionally equivalent, I still prefer multiple top level fields, as I consider it both easier to document and more consistent with the approach used by other packaging systems.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
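To make the "two code paths become one" argument concrete, here is a rough sketch that folds the old split format into the merged one and then maps each entry to a database-style row. The "dependencies" key for the nested list in "*_may_require" entries is an assumption (the message only says "a mapping with a nested list"), and both helper names are hypothetical.

```python
def merge_requirements(requires, may_require):
    """Fold old-style '*_requires' strings and '*_may_require' mappings
    into one list of {'install', 'extra', 'environment'} mappings."""
    merged = [{"install": text} for text in requires]
    for entry in may_require:
        for text in entry["dependencies"]:  # assumed key for the nested list
            spec = {"install": text}
            for key in ("extra", "environment"):
                if key in entry:
                    spec[key] = entry[key]
            merged.append(spec)
    return merged


def as_row(spec):
    """Database-style row: a missing condition is simply None (NULL)."""
    return (spec["install"], spec.get("extra"), spec.get("environment"))


old_run_requires = ["requests (>= 1.0)"]
old_run_may_require = [
    {"dependencies": ["pywin32"], "environment": "sys_platform == 'win32'"},
]

merged = merge_requirements(old_run_requires, old_run_may_require)
rows = [as_row(spec) for spec in merged]
```

Once merged, `as_row` handles unconditional and conditional entries identically, with the unconditional case showing up as a row whose extra and environment are both NULL.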

Donald Stufft

11:47 a.m.

On Jun 30, 2013, at 7:21 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On 30 June 2013 18:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...
Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Isn't that the same as having an additional field in the dependency mapping? It seems like that's how one would organise it at the RDBMS level, anyway.

{ "install": "localweb-test-util [win] (>= 1.0)", "extra": "localweb", "environment": "sys_platform == 'win32'", "kind": ":test:" }

We don't get the same level of payoff by switching to a "kind" subfield, because all five dependency fields already use the same internal syntax. Whether you're keying off the top level field name, or keying off a "kind" subfield, the processing code will still be identical across all five kinds of dependency.

As far as data modelling goes, Warehouse actually splits the different kinds of dependency as distinct ORM classes. This allows various useful details (like the descriptive names for each kind of dependency) to be handled directly in the data model, rather than needing to be tracked separately.

While the two forms are functionally equivalent, I still prefer multiple top level fields, as I consider it both easier to document and more consistent with the approach used by other packaging systems.

Using a kind subfield actually is harder and requires more work. There's no way around needing to process each conditional dependency and check if they match, however each "kind" is always going to be an all or nothing kind of deal. You either want to process all of the dependencies of a certain kind, or none of them. As it stands right now you can just unconditionally loop over each dependency in a run_requires. However, if all there was was a single "requires" list, you'd need to look over the entire list and check the kind subfield. Even worse if the order you install a "kind" matters (e.g. need to install build_requires prior to install run_requires) you'll need to loop over the list multiple times.

I pushed for this change for the same basic reason I'm against the change you're mentioning. I think in order to make things easier for processing the first thing a tool would do given a single unified list with a subfield of "kind", is split them into several variables (whether variables in their own right, or as keys in a dictionary). If the natural inclination is to split them, we should just split them up front and make things simpler. Similarly I felt that it was more natural for a tool to want to condense the *_requires and *_may_requires so that they could easily run it through a single codepath without needing conditionals scattered all over.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
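The difference being argued here can be sketched as follows. With a single "kind"-tagged list, a tool first groups entries by kind before it can process them in phase order; with separate top level fields the grouping is already done. The phase order, helper names and project names are all hypothetical; the thread only says that build dependencies may need installing before run dependencies.

```python
from collections import defaultdict

# Hypothetical phase order for illustration.
PHASES = ["build", "run"]


def split_by_kind(dependencies):
    """Group a single 'kind'-tagged list into one list per kind."""
    by_kind = defaultdict(list)
    for spec in dependencies:
        by_kind[spec["kind"]].append(spec["install"])
    return by_kind


def install_order_unified(dependencies):
    """Single list: split first, then walk the phases in order."""
    by_kind = split_by_kind(dependencies)
    return [name for phase in PHASES for name in by_kind[":%s:" % phase]]


def install_order_split(metadata):
    """Separate fields: loop over each '*_requires' list directly."""
    return [spec["install"]
            for phase in PHASES
            for spec in metadata.get(phase + "_requires", [])]


unified = [
    {"install": "app-lib", "kind": ":run:"},
    {"install": "compiler-helper", "kind": ":build:"},
]
split = {
    "build_requires": [{"install": "compiler-helper"}],
    "run_requires": [{"install": "app-lib"}],
}
```

Both functions produce the same install order; the unified form just needs the extra `split_by_kind` step, which is the step Donald suggests doing once in the format rather than in every tool.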

Daniel Holth

2:11 p.m.

On Sun, Jun 30, 2013 at 12:47 PM, Donald Stufft <donald@stufft.io> wrote:

...

On Jun 30, 2013, at 7:21 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
On 30 June 2013 18:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...
Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Isn't that the same as having an additional field in the dependency mapping? It seems like that's how one would organise it at the RDBMS level, anyway.

{ "install": "localweb-test-util [win] (>= 1.0)", "extra": "localweb", "environment": "sys_platform == 'win32'", "kind": ":test:" }

We don't get the same level of payoff by switching to a "kind" subfield, because all five dependency fields already use the same internal syntax. Whether you're keying off the top level field name, or keying off a "kind" subfield, the processing code will still be identical across all five kinds of dependency.

As far as data modelling goes, Warehouse actually splits the different kinds of dependency as distinct ORM classes. This allows various useful details (like the descriptive names for each kind of dependency) to be handled directly in the data model, rather than needing to be tracked separately.

While the two forms are functionally equivalent, I still prefer multiple top level fields, as I consider it both easier to document and more consistent with the approach used by other packaging systems.

Using a kind subfield actually is harder and requires more work. There's no way around needing to process each conditional dependency and check if they match, however each "kind" is always going to be an all or nothing kind of deal. You either want to process all of the dependencies of a certain kind, or none of them. As it stands right now you can just unconditionally loop over each dependency in a run_requires. However, if all there was was a single "requires" list, you'd need to look over the entire list and check the kind subfield. Even worse if the order you install a "kind" matters (e.g. need to install build_requires prior to install run_requires) you'll need to loop over the list multiple times.

It is the same amount of hard. The dependency resolution system would probably want to build an identical requirements[kind][extra] data structure once no matter what the input looked like by looping over the very short list a single time, or perhaps expand the extra names to all start with ":run:" + extra_name etc. The most important thing would be to avoid having the actual installer code do anything different based on the category of dependency so that pip doesn't wind up with 5 different install commands.
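One way to read the requirements[kind][extra] suggestion is sketched below: either input shape is normalised once into the same nested structure, and everything downstream works on that. The function names are hypothetical, `None` is used as the key for unconditional entries by assumption, and environment markers are left out for brevity.

```python
from collections import defaultdict

KINDS = ("meta", "run", "test", "build", "dev")


def new_index():
    """Empty requirements[kind][extra] -> list of install specifiers."""
    return defaultdict(lambda: defaultdict(list))


def index_from_fields(metadata):
    """Build requirements[kind][extra] from separate '*_requires' fields."""
    requirements = new_index()
    for kind in KINDS:
        for spec in metadata.get(kind + "_requires", []):
            requirements[kind][spec.get("extra")].append(spec["install"])
    return requirements


def index_from_single_list(dependencies):
    """Build the same structure from one list with a 'kind' subfield."""
    requirements = new_index()
    for spec in dependencies:
        kind = spec["kind"].strip(":")  # ":test:" -> "test"
        requirements[kind][spec.get("extra")].append(spec["install"])
    return requirements


from_fields = index_from_fields({
    "run_requires": [{"install": "app-lib"}],
    "test_requires": [{"install": "web-test-client", "extra": "localweb"}],
})
from_list = index_from_single_list([
    {"install": "app-lib", "kind": ":run:"},
    {"install": "web-test-client", "extra": "localweb", "kind": ":test:"},
])
```

Both calls yield equal structures, so installer code written against the index never needs to know which metadata layout it came from.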

...

I pushed for this change for the same basic reason I'm against the change you're mentioning. I think in order to make things easier for processing the first thing a tool would do given a single unified list with a subfield of "kind", is split them into several variables (whether variables in their own right, or as keys in a dictionary). If the natural inclination is to split them, we should just split them up front and make things simpler. Similarly I felt that it was more natural for a tool to want to condense the *_requires and *_may_requires so that they could easily run it through a single codepath without needing conditionals scattered all over.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig

Vinay Sajip

3:26 p.m.

Donald Stufft <donald <at> stufft.io> writes:

...

...
While the two forms are functionally equivalent, I still prefer multiple top level fields, as I consider it both easier to document and more consistent with the approach used by other packaging systems.

...

Using a kind subfield actually is harder and requires more work.

I take Nick's points regarding his preference and the reasons for it, but I'm not convinced by "harder". Having actually implemented the processing logic in distlib, it didn't seem especially hard to me. We already have more "kinds" than the other dependency systems Nick mentioned: one advantage of a single list of the kind I described is that you could in theory add additional "kinds", for example ":doc:", without much impact. Not that I'm arguing for any such addition - I'm quite aware of YAGNI. However, the fact that the single-list offers such possibilities is satisfying (as in, it feels right), and having implemented the multiple-list solution in distlib already, it just struck me that a single list would be more elegant. Leaving ORMs aside, I'm pretty sure if I was just working at the RDBMS layer, this is how I would structure the dependencies - using a single "dependency" table.

It's a matter of detail as to exactly how a tool would process the single list - I'm reasonably confident that a readable/understandable and sufficiently performant implementation is achievable using a single pass over the JSON, and furthermore, worrying about multiple passes at this stage feels like premature optimisation.

Regards,

Vinay Sajip

Donald Stufft

3:37 p.m.

On Jun 30, 2013, at 4:26 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Donald Stufft <donald <at> stufft.io> writes:

...
...
While the two forms are functionally equivalent, I still prefer multiple top level fields, as I consider it both easier to document and more consistent with the approach used by other packaging systems.

...
Using a kind subfield actually is harder and requires more work.

I take Nick's points regarding his preference and the reasons for it, but I'm not convinced by "harder". Having actually implemented the processing logic in distlib, it didn't seem especially hard to me. We already have more "kinds" than the other dependency systems Nick mentioned: one advantage of a single list of the kind I described is that you could in theory add additional "kinds", for example ":doc:", without much impact. Not that I'm arguing for any such addition - I'm quite aware of YAGNI. However, the fact that the single-list offers such possibilities is satisfying (as in, it feels right), and having implemented the multiple-list solution in distlib already, it just struck me that a single list would be more elegant. Leaving ORMs aside, I'm pretty sure if I was just working at the RDBMS layer, this is how I would structure the dependencies - using a single "dependency" table.

It's a matter of detail as to exactly how a tool would process the single list - I'm reasonably confident that a readable/understandable and sufficiently performant implementation is achievable using a single pass over the JSON, and furthermore, worrying about multiple passes at this stage feels like premature optimization.

I'm not worried about the speed or performance, I'm worried about how annoying it is to write a tool that processes it. I optimize for the API it presents. Given your suggestion the first thing I'd do is take the single list and split them into multiple lists for easier processing. It's the (IMO) natural thing to do given a single list like that given the phased nature of installation. If it feels natural to split it, then you might as well split it up front and save the implementers from needing to do so. You might disagree, and that's fine. But it's not a performance based argument.

As to the other question of adding new things, you can add a new top level keyword just as easily too in a completely backwards compatible fashion. If I was nitpicking I'd stick it in "dependencies": {"runtime": […], "build": […]} just for organization's sake. But I'm perfectly happy with them being top level key words too.

...

Regards,

Vinay Sajip


----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Vinay Sajip

3:47 p.m.

Donald Stufft <donald <at> stufft.io> writes:

...

I'm not worried about the speed or performance, I'm worried about how annoying it is to write a tool that processes it.

Quite possibly I misunderstood the thrust of your argument, but the point of distlib is to take care of those kinds of low-level details, rather than having multiple packaging tools reimplement that processing you refer to. Of course people are free to not use distlib, but if I'm wasting my time then it would be nice if people would tell me :-)

Regards,

Vinay Sajip

Donald Stufft

4 p.m.

On Jun 30, 2013, at 4:47 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Donald Stufft <donald <at> stufft.io> writes:

...
I'm not worried about the speed or performance, I'm worried about how annoying it is to write a tool that processes it.

Quite possibly I misunderstood the thrust of your argument, but the point of distlib is to take care of those kinds of low-level details, rather than having multiple packaging tools reimplement that processing you refer to. Of course people are free to not use distlib, but if I'm wasting my time then it would be nice if people would tell me :-)

Regards,

Vinay Sajip


I'm not knocking the work you've done on distlib at all :) I just don't think its existence means we shouldn't worry about the user friendliness of the API if someone doesn't use it. Maybe you think a single list is more user-friendly. That's fine for you to think that, we can agree to disagree on that :)

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Gabriel de Perthuis

5:39 p.m.

On Sun, 30 Jun 2013 21:21:54 +1000, Nick Coghlan wrote:

...

On 30 June 2013 18:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...
Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Isn't that the same as having an additional field in the dependency mapping? It seems like that's how one would organise it at the RDBMS level, anyway.

{ "install": "localweb-test-util [win] (>= 1.0)", "extra": "localweb", "environment": "sys_platform == 'win32'", "kind": ":test:" }

You certainly *could* define it like that, but no existing dependency system I'm aware of does it that way. If they allow for anything other than runtime dependencies in the first place, they define a different top level field:

* setuptools has requires and install_requires
* PEP 345 has Requires-Dist and Setup-Requires-Dist
* RPM has Requires and BuildRequires
* npm has dependencies and devDependencies

At least for Debian, and probably RPM, source dependencies have a different field name because they are carried by a source package rather than a binary one. The nature of the dependencies isn't different, the required packages are binary in both cases.

The cartesian product might be overkill. If someone elects to install development dependencies I don't see a point in picking and choosing. There's enough support noise when people fail to build from source, and while an author is knowledgeable and might conceive more than one way to set things up, publishing them would cause more trouble than it's worth.

So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

Donald Stufft

5:46 p.m.

On Jun 30, 2013, at 6:39 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...

So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

"Well known names" is way more ambiguous than a top level field. It's easy to have minor variances across various packages, "test" vs "tests", "docs", "doc", "documentation". Both top level and "kind" share the fact that there is a limited number of allowed names, which makes it simple to validate that the same name is being used everywhere (because anything outside of those limited numbers are rejected).

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Gabriel de Perthuis

5:51 p.m.

On Sun, 30 Jun 2013 18:46:51 -0400, Donald Stufft wrote:

...

On Jun 30, 2013, at 6:39 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...
So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

"Well known names" is way more ambiguous than a top level field. It's easy to have minor variances across various packages, "test" vs "tests", "docs", "doc", "documentation". Both top level and "kind" share the fact that there is a limited number of allowed names, which makes it simple to validate that the same name is being used everywhere (because anything outside of those limited numbers are rejected).

These well-known names would also have some tool support. Something like `pip install-dev` would be sufficient.

Donald Stufft

5:52 p.m.

On Jun 30, 2013, at 6:51 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...

On Sun, 30 Jun 2013 18:46:51 -0400, Donald Stufft wrote:

...
On Jun 30, 2013, at 6:39 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...
So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

"Well known names" is way more ambiguous than a top level field. It's easy to have minor variances across various packages, "test" vs "tests", "docs", "doc", "documentation". Both top level and "kind" share the fact that there is a limited number of allowed names, which makes it simple to validate that the same name is being used everywhere (because anything outside of those limited numbers are rejected).

These well-known names would also have some tool support. Something like `pip install-dev` would be sufficient.


But when defining them, it's very easy to accidentally use "tests" instead of "test".

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Gabriel de Perthuis

5:58 p.m.

On Sun, 30 Jun 2013 18:52:46 -0400, Donald Stufft wrote:

...

On Jun 30, 2013, at 6:51 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...
On Sun, 30 Jun 2013 18:46:51 -0400, Donald Stufft wrote:

...
On Jun 30, 2013, at 6:39 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...
So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

"Well known names" is way more ambiguous than a top level field. It's easy to have minor variances across various packages, "test" vs "tests", "docs", "doc", "documentation". Both top level and "kind" share the fact that there is a limited number of allowed names, which makes it simple to validate that the same name is being used everywhere (because anything outside of those limited numbers are rejected).

These well-known names would also have some tool support. Something like `pip install-dev` would be sufficient.

But when defining them, it's very easy to accidentally use "tests" instead of "test".

A lint tool can warn about these names, and a PyPI server could even block them for new-style packages.
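A lint check of the kind suggested here could be sketched as below, using the standard library's `difflib` to flag extras that look like misspellings of a reserved name. The reserved list, the similarity cutoff and the `lint_extras` name are assumptions for illustration; the thread only mentions "dev" and "test" as candidate well-known names.

```python
import difflib

# Hypothetical reserved names; only "dev" and "test" come from the thread.
WELL_KNOWN_EXTRAS = ["dev", "test"]


def lint_extras(extras):
    """Warn about extras that look like misspellings of well-known names."""
    warnings = []
    for name in extras:
        if name in WELL_KNOWN_EXTRAS:
            continue  # exact match, nothing to report
        close = difflib.get_close_matches(
            name, WELL_KNOWN_EXTRAS, n=1, cutoff=0.8)
        if close:
            warnings.append("extra %r looks like %r" % (name, close[0]))
    return warnings


report = lint_extras(["tests", "localweb", "test"])
```

A check like this only catches near misses by string similarity, so a name such as "documentation" would still slip through where the intended well-known name was "doc".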

Donald Stufft

6 p.m.

On Jun 30, 2013, at 6:58 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...

On Sun, 30 Jun 2013 18:52:46 -0400, Donald Stufft wrote:

...
On Jun 30, 2013, at 6:51 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...
On Sun, 30 Jun 2013 18:46:51 -0400, Donald Stufft wrote:

...
On Jun 30, 2013, at 6:39 PM, Gabriel de Perthuis <g2p.code@gmail.com> wrote:

...
So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

"Well known names" is way more ambiguous than a top level field. It's easy to have minor variances across various packages, "test" vs "tests", "docs", "doc", "documentation". Both top level and "kind" share the fact that there is a limited number of allowed names, which makes it simple to validate that the same name is being used everywhere (because anything outside of those limited numbers are rejected).

These well-known names would also have some tool support. Something like `pip install-dev` would be sufficient.

But when defining them, it's very easy to accidentally use "tests" instead of "test".

A lint tool can warn about these names, and a PyPI server could even block them for new-style packages.


Or use a separate field (either the name, or the aforementioned "kind" field) and remove all ambiguity from the concept and remove the need to have a lint tool or guess what the person might mean.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Nick Coghlan

6:04 p.m.

On 1 Jul 2013 08:40, "Gabriel de Perthuis" <g2p.code@gmail.com> wrote:

...

On Sun, 30 Jun 2013 21:21:54 +1000, Nick Coghlan wrote:

...
On 30 June 2013 18:53, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...
Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
No, because the semantic dependencies form a Cartesian product with the extras. You can define :meta:, :run:, :test:, :build: and :dev: dependencies for any extra. So if, for example, you have a "localweb" extra, then you can define extra test dependencies for that.

The semantic specifiers determine *which sets of dependencies* you're interested in, while the explicit extras define optional subsets of those.

Isn't that the same as having an additional field in the dependency mapping?

...

...
...
It seems like that's how one would organise it at the RDBMS level, anyway.

{ "install": "localweb-test-util [win] (>= 1.0)", "extra": "localweb", "environment": "sys_platform == 'win32'", "kind": ":test:" }

You certainly *could* define it like that, but no existing dependency system I'm aware of does it that way. If they allow for anything other than runtime dependencies in the first place, they define a different top level field:

* setuptools has requires and install_requires
* PEP 346 has Requires-Dist and Setup-Requires-Dist
* RPM has Requires and BuildRequires
* npm has dependencies and devDependencies

At least for Debian, and probably RPM, source dependencies have a different field name because they are carried by a source package rather than a binary one. The nature of the dependencies isn't different, the required packages are binary in both cases.

The cartesian product might be overkill. If someone elects to install development dependencies I don't see a point in picking and choosing. There's enough support noise when people fail to build from source, and while an author is knowledgeable and might conceive more than one way to set things up, publishing them would cause more trouble than it's worth.

I've had to port stuff to build on s390s - it would have made my life much easier if the dependencies that were only needed for optional x86_64 specific C accelerators had been clearly marked, rather than my having to weed them out through trial and error. What you're talking about is a rationale for sensible defaults and helper commands in tools (and the PEP does go into that), but it's not a good reason to limit the expressiveness of the format itself.

...

So I would prefer that dev and test be extras with well known names, so that dev, test, and any other extras define dependencies with a minimum of ambiguity and without the need for a second level of qualifiers.

How would you express an optional dependency on Cython for optional C accelerators in such a system?

The PEP is as it is because I think the payoff in expressiveness is worth the increase in complexity. Saying "You shouldn't want to describe such situations clearly and succinctly" is not a compelling argument.

Cheers, Nick.

...

Vinay Sajip

July 2013

3:51 a.m.

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

* "install": the installation specifier for the dependency * "extra": as per the current PEP (for conditional dependencies) * "environment": as per the current PEP (for conditional dependencies)

4. The "install" subfield is compulsory, the other two are optional (as now, using either of the latter creates a "conditional dependency", while dependency declarations with only the "install" subfield are unconditional)

5. An installation specifier is what PEP 426 currently calls a dependency specifier: the "name [extras] (constraints)" format. They will get their own top level section (similar to the existing Extras and Environment markers sections)

Is there a particular benefit of the install subfield being a single installation specifier, as opposed to a list of such specifiers? It's perhaps neither here nor there for machine-processed metadata, but I expect this metadata would have human readers too. Not using a list would lead to more verbose metadata.

Regards,

Vinay Sajip

Nick Coghlan

6 a.m.

On 4 Jul 2013 18:52, "Vinay Sajip" <vinay_sajip@yahoo.co.uk> wrote:

...

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
* "install": the installation specifier for the dependency * "extra": as per the current PEP (for conditional dependencies) * "environment": as per the current PEP (for conditional dependencies)

4. The "install" subfield is compulsory, the other two are optional (as now, using either of the latter creates a "conditional dependency", while dependency declarations with only the "install" subfield are unconditional)

5. An installation specifier is what PEP 426 currently calls a dependency specifier: the "name [extras] (constraints)" format. They will get their own top level section (similar to the existing Extras and Environment markers sections)

Is there a particular benefit of the install subfield being a single installation specifier, as opposed to a list of such specifiers? It's perhaps neither here nor there for machine-processed metadata, but I expect this metadata would have human readers too. Not using a list would lead to more verbose metadata.

Hmm, I guess as long as it's consistent, the only difference when processing is list.append vs list.extend.

There's a little extra work when serialising to group like entries together, but I'm OK with that (and that would be a SHOULD rather than a MUST anyway).

If I don't hear a good argument against it, I'll make that field a list.
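Editorial sketch of the append-versus-extend difference mentioned above. The function names and sample entries are hypothetical and are not taken from the PEP or from any implementation discussed in this thread; the only assumption is that each dependency entry is a mapping with an "install" key.

```python
def collect_single(entries):
    """Collect specifiers when "install" holds a single string per entry."""
    specifiers = []
    for entry in entries:
        specifiers.append(entry["install"])
    return specifiers


def collect_list(entries):
    """Collect specifiers when "install" holds a list of strings per entry."""
    specifiers = []
    for entry in entries:
        specifiers.extend(entry["install"])
    return specifiers


# Hypothetical sample data in both candidate layouts.
single_form = [{"install": "a"}, {"install": "b (>= 1.0)"}]
list_form = [{"install": ["a", "b (>= 1.0)"]}]

print(collect_single(single_form))
print(collect_list(list_form))
```

Both functions end up with the same flat list of installation specifiers, which is the sense in which the processing cost of the two layouts is nearly identical.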

...

Regards,

Vinay Sajip

Donald Stufft

6:35 a.m.

On Jul 4, 2013, at 7:00 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On 4 Jul 2013 18:52, "Vinay Sajip" <vinay_sajip@yahoo.co.uk> wrote:

...
Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
* "install": the installation specifier for the dependency * "extra": as per the current PEP (for conditional dependencies) * "environment": as per the current PEP (for conditional dependencies)

4. The "install" subfield is compulsory, the other two are optional (as now, using either of the latter creates a "conditional dependency", while dependency declarations with only the "install" subfield are unconditional)

5. An installation specifier is what PEP 426 currently calls a dependency specifier: the "name [extras] (constraints)" format. They will get their own top level section (similar to the existing Extras and Environment markers sections)

Is there a particular benefit of the install subfield being a single installation specifier, as opposed to a list of such specifiers? It's perhaps neither here nor there for machine-processed metadata, but I expect this metadata would have human readers too. Not using a list would lead to more verbose metadata.

Hmm, I guess as long as it's consistent, the only difference when processing is list.append vs list.extend.

There's a little extra work when serialising to group like entries together, but I'm OK with that (and that would be a SHOULD rather than a MUST anyway).

If I don't hear a good argument against it, I'll make that field a list.

...
Regards,

Vinay Sajip

I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Nick Coghlan

7:26 a.m.

On 4 Jul 2013 21:35, "Donald Stufft" <donald@stufft.io> wrote:

...

On Jul 4, 2013, at 7:00 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
On 4 Jul 2013 18:52, "Vinay Sajip" <vinay_sajip@yahoo.co.uk> wrote:

...
Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
* "install": the installation specifier for the dependency * "extra": as per the current PEP (for conditional dependencies) * "environment": as per the current PEP (for conditional dependencies)

...

...
...
...
4. The "install" subfield is compulsory, the other two are optional (as now, using either of the latter creates a "conditional dependency", while dependency declarations with only the "install" subfield are unconditional)

5. An installation specifier is what PEP 426 currently calls a dependency specifier: the "name [extras] (constraints)" format. They will get their own top level section (similar to the existing Extras and Environment markers sections)

Is there a particular benefit of the install subfield being a single installation specifier, as opposed to a list of such specifiers? It's perhaps neither here nor there for machine-processed metadata, but I expect this metadata would have human readers too. Not using a list would lead to more verbose metadata.

Hmm, I guess as long as it's consistent, the only difference when processing is list.append vs list.extend.

...

...
There's a little extra work when serialising to group like entries together, but I'm OK with that (and that would be a SHOULD rather than a MUST anyway).

...
If I don't hear a good argument against it, I'll make that field a list.

...
Regards,

Vinay Sajip

I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

The main benefit is that all the dependencies for an extra will typically be in one place. However, I briefly forgot the "machine readable" part again, and for that TOOWTDI is to have one entry per dependency. Merging common criteria would then be a UI thing with multiple ways to do it (e.g. whether to group by extra or environment first for conditional dependencies).

If you allow a list instead, then you have the problem of offering two ways to say the same thing (all in one entry or split across multiple entries).

So the install subfield will remain a single string in the data interchange format, even if tools choose to structure it differently in their UI. Note that repeating the key names as well as some subfield values doesn't bother me - that's what streaming compression is for.

This is what happens when I don't write my rationale down, though - I forget why I did things a certain way :)

Cheers, Nick.
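Editorial sketch of the "merging common criteria is a UI thing" idea: the interchange data keeps one entry per dependency, and a tool groups entries only for display. The function name and the sample entries are hypothetical; grouping by (extra, environment) is just one of the several possible orderings mentioned above.

```python
def group_for_display(entries):
    """Group single-string entries by their (extra, environment) condition.

    The input keeps one entry per dependency; the grouping exists only
    for presentation and is never written back to the metadata.
    """
    groups = {}
    for entry in entries:
        key = (entry.get("extra"), entry.get("environment"))
        groups.setdefault(key, []).append(entry["install"])
    return groups


# Hypothetical sample data: one entry per dependency.
entries = [
    {"install": "a"},
    {"install": "b", "extra": "test"},
    {"install": "c", "extra": "test"},
    {"install": "d", "environment": "sys_platform == 'win32'"},
]

for (extra, environment), installs in group_for_display(entries).items():
    print(extra, environment, installs)
```

A tool that prefers to group by environment first could swap the key order without any change to the stored metadata.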

...

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

...

Vinay Sajip

2:38 p.m.

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

The main benefit is that all the dependencies for an extra will typically be in one place. However, I briefly forgot the "machine readable" part again, and for that TOOWTDI is to have one entry per dependency.

One record per dependency is indeed the case at the RDBMS level, but there's no reason why that scheme needs to be slavishly copied over to the JSON. Let me try to illustrate this.

I couldn't find any modelling code in Donald's public repo - I checked both branches and couldn't find it (Donald, please point me to it if it's on GitHub rather than just your local clone). So I knocked up a simple model (using SQLAlchemy, but the model is so simple that just about any ORM should do). The entities are Project, Release and Dependency. I've created a simple script, depmodel.py, along with two JSON files which have the relevant subset of the PEP 426 metadata for setuptools 0.7.7 and Pyramid 1.4.2. These are available at

https://gist.github.com/vsajip/5929707

This code/data uses the older schema (run_requires / run_may_require, etc. and using 'dependencies' rather than 'install' as a key). This is the JSON which is supposed to be problematic, so I wanted to see what the problems might be. I couldn't find any, so I'm linking to the code here so that Donald/Nick can point out any misunderstanding on my part.

The script allows importing the dependencies from JSON to RDBMS (34 lines for the import function) and also exporting from RDBMS to JSON (43 lines for the export function). I've used SQLite for the database.

python depmodel.py -i setuptools-0.7.7.json

will read the dependencies into SQLite, and

python depmodel.py -e setuptools/0.7.7

will print the SQLite records as JSON.

I understand that people might have particular preferences, but I can't see any technical reason why we couldn't have lists in the JSON. The import and export code looks pretty simple to me. What have I missed?

Regards,

Vinay Sajip
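Editorial sketch of the kind of round trip described above. This is not the depmodel.py script from the gist: it uses the standard-library sqlite3 module rather than SQLAlchemy, a hypothetical single-table schema, and the newer "install" key with list values. It only illustrates that list-form JSON can be stored as one row per dependency and regrouped on export.

```python
import sqlite3

# Hypothetical schema: one row per installation specifier.
SCHEMA = """
CREATE TABLE dependency (
    id INTEGER PRIMARY KEY,
    release TEXT NOT NULL,
    install TEXT NOT NULL,
    extra TEXT,
    environment TEXT
)
"""


def import_dependencies(conn, release, entries):
    """Store list-form entries as one row per installation specifier."""
    for entry in entries:
        for spec in entry["install"]:
            conn.execute(
                "INSERT INTO dependency (release, install, extra, environment) "
                "VALUES (?, ?, ?, ?)",
                (release, spec, entry.get("extra"), entry.get("environment")),
            )


def export_dependencies(conn, release):
    """Rebuild list-form entries by grouping rows on (extra, environment)."""
    grouped = {}
    rows = conn.execute(
        "SELECT install, extra, environment FROM dependency "
        "WHERE release = ? ORDER BY id",
        (release,),
    )
    for install, extra, environment in rows:
        grouped.setdefault((extra, environment), []).append(install)
    entries = []
    for (extra, environment), installs in grouped.items():
        entry = {"install": installs}
        if extra is not None:
            entry["extra"] = extra
        if environment is not None:
            entry["environment"] = environment
        entries.append(entry)
    return entries


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
original = [
    {"install": ["a", "b"]},
    {"install": ["c"], "extra": "test"},
    {"install": ["d"], "environment": "sys_platform == 'win32'"},
]
import_dependencies(conn, "example/1.0", original)
print(export_dependencies(conn, "example/1.0"))
```

Note that the export regroups by condition, so input that repeats the same (extra, environment) pair across several entries would come back merged into one entry.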

Daniel Holth

7:09 p.m.

On the plus side if we're arguing about something as banal as this, maybe we are almost done!

On Thu, Jul 4, 2013 at 3:38 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
The main benefit is that all the dependencies for an extra will typically be in one place. However, I briefly forgot the "machine readable" part again, and for that TOOWTDI is to have one entry per dependency.

One record per dependency is indeed the case at the RDBMS level, but there's no reason why that scheme needs to be slavishly copied over to the JSON. Let me try to illustrate this.

I couldn't find any modelling code in Donald's public repo - I checked both branches and couldn't find it (Donald, please point me to it if it's on GitHub rather than just your local clone). So I knocked up a simple model (using SQLAlchemy, but the model is so simple that just about any ORM should do). The entities are Project, Release and Dependency. I've created a simple script, depmodel.py, along with two JSON files which have the relevant subset of the PEP 426 metadata for setuptools 0.7.7 and Pyramid 1.4.2. These are available at

https://gist.github.com/vsajip/5929707

This code/data uses the older schema (run_requires / run_may_require, etc. and using 'dependencies' rather than 'install' as a key). This is the JSON which is supposed to be problematic, so I wanted to see what the problems might be. I couldn't find any, so I'm linking to the code here so that Donald/Nick can point out any misunderstanding on my part.

The script allows importing the dependencies from JSON to RDBMS (34 lines for the import function) and also exporting from RDBMS to JSON (43 lines for the export function). I've used SQLite for the database.

python depmodel.py -i setuptools-0.7.7.json

will read the dependencies into SQLite, and

python depmodel.py -e setuptools/0.7.7

will print the SQLite records as JSON.

I understand that people might have particular preferences, but I can't see any technical reason why we couldn't have lists in the JSON. The import and export code looks pretty simple to me. What have I missed?

Regards,

Vinay Sajip

Vinay Sajip

7:24 p.m.

Daniel Holth <dholth <at> gmail.com> writes:

...

On the plus side if we're arguing about something as banal as this, maybe we are almost done!

I don't exactly see it as an argument - it's just a discussion (although of course we "argue" for our point of view).

I don't think we're done by a long chalk, but as I see it, we might as well polish the various pieces of the puzzle as best we can as we go along. Code is more malleable than the spec, so we should try to get that to be as good as we reasonably can.

Regards,

Vinay Sajip

Daniel Holth

7:29 p.m.

On Thu, Jul 4, 2013 at 8:24 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Daniel Holth <dholth <at> gmail.com> writes:

...
On the plus side if we're arguing about something as banal as this, maybe we are almost done!

I don't exactly see it as an argument - it's just a discussion (although of course we "argue" for our point of view).

I don't think we're done by a long chalk, but as I see it, we might as well polish the various pieces of the puzzle as best we can as we go along. Code is more malleable than the spec, so we should try to get that to be as good as we reasonably can.

... certainly not with the whole project, just with this particular piece of metadata ...

Vinay Sajip

7:31 a.m.

Donald Stufft <donald <at> stufft.io> writes:

...

I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

It's a question of

{ "install": ["a", "b", "c"] }

versus

{ "install": "a" }, { "install": "b" }, { "install": "c" }

and I can't see why you think the latter is in any way better. IMO implementation details (such as "it's easier for the Django ORM to map it") should not take precedence over other considerations of readability/simplicity. In any case, I can't see why there would be any particular modelling problem with the scheme I've suggested.

Is the modelling work you're doing public? I had a quick look at your warehouse repo (github.com/dstufft/warehouse) and I don't see any models beyond User and Email. Is that the correct location? I'd be happy to take a closer look to get a better understanding of what modelling problem you're seeing/foreseeing.

FYI the metadata that I'm maintaining on red-dove.com is stored in a SQL database. While my SQL schema is not yet fully aligned with the PEP (as it's WIP), I don't see any modelling problem between an RDBMS backend and any of the JSON formats which have been published in the various revisions of the PEP. Some more detail would help :-)

Regards,

Vinay Sajip

Donald Stufft

7:37 a.m.

On Jul 4, 2013, at 8:31 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Donald Stufft <donald <at> stufft.io> writes:

...
I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

It's a question of

{ "install": ["a", "b", "c"] }

versus

{ "install": "a" }, { "install": "b" }, { "install": "c" }

and I can't see why you think the latter is in any way better. IMO implementation details (such as "it's easier for the Django ORM to map it") should not take precedence over other considerations of readability/simplicity. In any case, I can't see why there would be any particular modelling problem with the scheme I've suggested.

It's not that it's easier for the Django ORM to map it, it's just a simpler structure altogether. It goes from a single relation to what's essentially a M2M with extra data on the intermediate table. I think it's better because there's less moving parts and this is designed for machines. Human readability is a nice to have but hardly a requirement.

Simpler, not impossible vs impossible ;)

...

Is the modelling work you're doing public? I had a quick look at your warehouse repo (github.com/dstufft/warehouse) and I don't see any models beyond User and Email. Is that the correct location? I'd be happy to take a closer look to get a better understanding of what modelling problem you're seeing/foreseeing.

And yea it's in that repo. Still in a branch though as I haven't finished it.

...

FYI the metadata that I'm maintaining on red-dove.com is stored in a SQL database. While my SQL schema is not yet fully aligned with the PEP (as it's WIP), I don't see any modelling problem between an RDBMS backend and any of the JSON formats which have been published in the various revisions of the PEP. Some more detail would help :-)

Regards,

Vinay Sajip

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Daniel Holth

8:23 a.m.

I also prefer the list install : []

Have you played with Postgresql's JSON support :-)

On Thu, Jul 4, 2013 at 8:37 AM, Donald Stufft <donald@stufft.io> wrote:

...

On Jul 4, 2013, at 8:31 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...
Donald Stufft <donald <at> stufft.io> writes:

...
I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

It's a question of

{ "install": ["a", "b", "c"] }

versus

{ "install": "a" }, { "install": "b" }, { "install": "c" }

and I can't see why you think the latter is in any way better. IMO implementation details (such as "it's easier for the Django ORM to map it") should not take precedence over other considerations of readability/simplicity. In any case, I can't see why there would be any particular modelling problem with the scheme I've suggested.

It's not that it's easier for the Django ORM to map it, it's just a simpler structure altogether. It goes from a single relation to what's essentially a M2M with extra data on the intermediate table. I think it's better because there's less moving parts and this is designed for machines. Human readability is a nice to have but hardly a requirement.

Simpler, not impossible vs impossible ;)

...
Is the modelling work you're doing public? I had a quick look at your warehouse repo (github.com/dstufft/warehouse) and I don't see any models beyond User and Email. Is that the correct location? I'd be happy to take a closer look to get a better understanding of what modelling problem you're seeing/foreseeing.

And yea it's in that repo. Still in a branch though as I haven't finished it.

...
FYI the metadata that I'm maintaining on red-dove.com is stored in a SQL database. While my SQL schema is not yet fully aligned with the PEP (as it's WIP), I don't see any modelling problem between an RDBMS backend and any of the JSON formats which have been published in the various revisions of the PEP. Some more detail would help :-)

Regards,

Vinay Sajip

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft

8:29 a.m.

Yea. It's slow and requires invoking plv8 to do much of anything useful ;)

On Jul 4, 2013, at 9:23 AM, Daniel Holth <dholth@gmail.com> wrote:

...

Have you played with Postgresql's JSON support :-)

Daniel Holth

8:34 a.m.

If you don't waste your time enforcing the uniqueness of (condition, extra) in the list of requirements then you can pretend install: is a single item if you want to...

Wheel converts the flat Metadata 1.3 format to 2.0 draft easily with a defaultdict:

https://bitbucket.org/dholth/wheel/src/fb7a900808f31f440049b89a656089b5f5502...

On Thu, Jul 4, 2013 at 9:29 AM, Donald Stufft <donald@stufft.io> wrote:

...

Yea. It's slow and requires invoking plv8 to do much of anything useful ;)

On Jul 4, 2013, at 9:23 AM, Daniel Holth <dholth@gmail.com> wrote:

...
Have you played with Postgresql's JSON support :-)

Donald Stufft

8:36 a.m.

Yea I just spent significant effort cleaning up the database from a lack of enforced constraints. I will pass on not using them.

On Jul 4, 2013, at 9:34 AM, Daniel Holth <dholth@gmail.com> wrote:

...

If you don't waste your time enforcing the uniqueness of (condition, extra) in the list of requirements then you can pretend install: is a single item if you want to...

Daniel Holth

8:45 a.m.

On Thu, Jul 4, 2013 at 9:36 AM, Donald Stufft <donald@stufft.io> wrote:

...

Yea I just spent significant effort cleaning up the database from a lack of enforced constraints. I will pass on not using them.

On Jul 4, 2013, at 9:34 AM, Daniel Holth <dholth@gmail.com> wrote:

...
If you don't waste your time enforcing the uniqueness of (condition, extra) in the list of requirements then you can pretend install: is a single item if you want to...

[ {'extra': 'foo', 'install': ['thing1']}, {'extra': 'foo', 'install': ['thing2']} ]
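Editorial sketch of "pretending install is a single item": a consumer can iterate list-form entries as if each carried exactly one installation specifier, without caring whether the same extra appears in one entry or several. The helper name is hypothetical and the sample data mirrors the example above.

```python
def iter_single(entries):
    """Yield one record per installation specifier from list-form entries."""
    for entry in entries:
        for spec in entry["install"]:
            # Copy the condition fields and replace the list with one item.
            record = {key: value for key, value in entry.items() if key != "install"}
            record["install"] = spec
            yield record


split_entries = [
    {"extra": "foo", "install": ["thing1"]},
    {"extra": "foo", "install": ["thing2"]},
]
merged_entries = [
    {"extra": "foo", "install": ["thing1", "thing2"]},
]

print(list(iter_single(split_entries)))
print(list(iter_single(merged_entries)))
```

Both inputs produce the same sequence of single-item records, which is why uniqueness of the condition fields does not matter to this kind of consumer.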

Nick Coghlan

8:45 p.m.

On 4 Jul 2013 22:32, "Vinay Sajip" <vinay_sajip@yahoo.co.uk> wrote:

...

Donald Stufft <donald <at> stufft.io> writes:

...
I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

It's a question of

{ "install": ["a", "b", "c"] }

versus

{ "install": "a" }, { "install": "b" }, { "install": "c" }

and I can't see why you think the latter is in any way better.

The basic problem with the list form is that allowing two representations for the same metadata makes for extra complexity we don't really want. It means we have to decide if the decomposed version (3 separate entries with one item in each install list) is still legal.

What I will do is draft PEP text for the list version that explicitly declares the decomposed form non-compliant with the spec. If I think the extra complexity looks tolerable, I'll switch it over.

Cheers, Nick.

...

IMO implementation details (such as "it's easier for the Django ORM to map it") should not take precedence over other considerations of readability/simplicity. In any case, I can't see why there would be any particular modelling problem with the scheme I've suggested.

Is the modelling work you're doing public? I had a quick look at your warehouse repo (github.com/dstufft/warehouse) and I don't see any models beyond User and Email. Is that the correct location? I'd be happy to take a closer look to get a better understanding of what modelling problem you're seeing/foreseeing.

FYI the metadata that I'm maintaining on red-dove.com is stored in a SQL database. While my SQL schema is not yet fully aligned with the PEP (as it's WIP), I don't see any modelling problem between an RDBMS backend and any of the JSON formats which have been published in the various revisions of the PEP. Some more detail would help :-)

Regards,

Vinay Sajip

Donald Stufft

8:47 p.m.

On Jul 4, 2013, at 9:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

What I will do is draft PEP text for the list version that explicitly declares the decomposed form non-compliant with the spec. If I think the extra complexity looks tolerable, I'll switch it over.

What is the "decomposed form"?

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
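Editorial sketch, going only by the description earlier in the thread ("3 separate entries with one item in each install list"): with a list-valued install field, the decomposed form would be several entries that share the same conditions instead of one merged entry. The checker below is hypothetical and is not PEP text; it simply reports conditions that occur in more than one entry.

```python
from collections import Counter


def repeated_conditions(entries):
    """Return the (extra, environment) pairs used by more than one entry."""
    counts = Counter(
        (entry.get("extra"), entry.get("environment")) for entry in entries
    )
    return [key for key, count in counts.items() if count > 1]


# Hypothetical data: the same three dependencies in both forms.
decomposed = [{"install": ["a"]}, {"install": ["b"]}, {"install": ["c"]}]
merged = [{"install": ["a", "b", "c"]}]

print(repeated_conditions(decomposed))
print(repeated_conditions(merged))
```

A spec that declared the decomposed form non-compliant could be enforced with a check along these lines, which would reject any metadata for which the returned list is non-empty.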

Daniel Holth

8:50 p.m.

I don't think you can get around the complexity. Consider:

{extra: 'foo', condition: 'platform == win32', install: []}
{extra: 'foo', condition: 'platform == linux', install: []}

They have to be flattened into a single list of all the 'foo' extras that are installable in the current environment anyway. It's exactly the same work you might try to avoid by worrying about whether install is a list.

On Thu, Jul 4, 2013 at 9:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On 4 Jul 2013 22:32, "Vinay Sajip" <vinay_sajip@yahoo.co.uk> wrote:

...
Donald Stufft <donald <at> stufft.io> writes:

...
I would prefer a single entry. It makes the serialization format map to the modeling simpler, and I think it's simpler for humans too. I don't see much benefit to making it a list except arbitrarily adding another level of nesting.

It's a question of

{ "install": ["a", "b", "c"] }

versus

{ "install": "a" }, { "install": "b" }, { "install": "c" }

and I can't see why you think the latter is in any way better.

The basic problem with the list form is that allowing two representations for the same metadata makes for extra complexity we don't really want. It means we have to decide if the decomposed version (3 separate entries with one item in each install list) is still legal.

What I will do is draft PEP text for the list version that explicitly declares the decomposed form non-compliant with the spec. If I think the extra complexity looks tolerable, I'll switch it over.

Cheers, Nick.

...
IMO implementation details (such as "it's easier for the Django ORM to map it") should not take precedence over other considerations of readability/simplicity. In any case, I can't see why there would be any particular modelling problem with the scheme I've suggested.

Is the modelling work you're doing public? I had a quick look at your warehouse repo (github.com/dstufft/warehouse) and I don't see any models beyond User and Email. Is that the correct location? I'd be happy to take a closer look to get a better understanding of what modelling problem you're seeing/foreseeing.

FYI the metadata that I'm maintaining on red-dove.com is stored in a SQL database. While my SQL schema is not yet fully aligned with the PEP (as it's WIP), I don't see any modelling problem between an RDBMS backend and any of the JSON formats which have been published in the various revisions of the PEP. Some more detail would help :-)

Regards,

Vinay Sajip

Vinay Sajip

3:25 a.m.

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

The basic problem with the list form is that allowing two representations for the same metadata makes for extra complexity we don't really want. It means we have to decide if the decomposed version (3 separate entries with one item in each install list) is still legal.

I'm not sure how prescriptive we need to be. For example, posit metadata like:

{ "install": ["a", "b", "c"], "extra": "foo" }, { "install": ["d", "e", "f"], "extra": "foo" }, { "install": ["g"], "extra": "foo" }

Even though there's no particular rationale for structuring it like this, the intention is clear: "a" .. "g" are dependencies when extra "foo" is specified. As long as the method by which these entries are processed is clear in the PEP, then it's not clear what's to be gained by being overly constraining.

There are numerous ways in which dependency information can be represented which are not worth the effort to canonicalise. For example, the order in which extras or version constraints are declared in a dependency specifier:

dist-name [foo,bar] (>= 1.0, < 2.0)

and

dist-name [bar,foo] (< 2.0, >= 1.0)

are equivalent, but in any simplistic handling this would slip past e.g. database uniqueness constraints. More sophisticated handling (by modelling below the Dependency level) is possible, but whether it's worth it is debatable.

Regards,

Vinay Sajip
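Editorial sketch of what canonicalising the two specifiers above could involve. This is a deliberately simplified, hypothetical normaliser, not a PEP 426/440 parser: it assumes the "name [extras] (constraints)" shape, sorts extras and constraints as plain strings, and strips internal whitespace.

```python
import re

# Simplified pattern for "name [extras] (constraints)"; an assumption for
# illustration, not the grammar defined by the PEP.
_SPEC = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9._-]+)"
    r"\s*(?:\[(?P<extras>[^\]]*)\])?"
    r"\s*(?:\((?P<constraints>[^)]*)\))?\s*$"
)


def _sorted_items(text):
    """Split a comma-separated field, drop whitespace, and sort the items."""
    if not text:
        return []
    items = ("".join(part.split()) for part in text.split(","))
    return sorted(item for item in items if item)


def canonicalise(spec):
    """Return a normalised string form of an installation specifier."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unrecognised specifier: %r" % spec)
    result = match.group("name")
    extras = _sorted_items(match.group("extras"))
    constraints = _sorted_items(match.group("constraints"))
    if extras:
        result += " [" + ",".join(extras) + "]"
    if constraints:
        result += " (" + ",".join(constraints) + ")"
    return result


print(canonicalise("dist-name [foo,bar] (>= 1.0, < 2.0)"))
print(canonicalise("dist-name [bar,foo] (< 2.0, >= 1.0)"))
```

With a normalisation step of this kind applied before storage, a plain uniqueness constraint on the stored string would catch the two spellings as duplicates; whether that effort is worthwhile is the open question raised above.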

Daniel Holth

9:50 a.m.

On Fri, Jul 5, 2013 at 4:25 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

I would really like to see one more level of nesting:

requires : { run : [ ... ], test : [ ... ] }

The parser and the specification will be simplified by putting all of the requirements categories inside a uniform dict instead of having magic _-separated top level key names that have to be mapped to the "run", "meta", "test" category names. That way the top-level parser can just check:

if metadata['requires'].keys() contains only the allowed values:
    parse_requirements(metadata['requires'])

Then parse_requirements() works the same no matter how many requirements categories there are.
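The check described above could look roughly like the following Python sketch. Only the "run", "meta" and "test" category names come from the message; `ALLOWED_CATEGORIES` as a complete set, `parse_metadata()` and the body of `parse_requirements()` are assumptions made for illustration, and the entries reuse the "install" key from the earlier example in this thread.

```python
# Assumed set of categories; only these three names appear in the message.
ALLOWED_CATEGORIES = {"run", "meta", "test"}


def parse_requirements(requires):
    """Return {category: [dependency specifier mapping, ...]}.

    The same loop handles every category, however many there are.
    """
    parsed = {}
    for category, specifiers in requires.items():
        parsed[category] = [dict(spec) for spec in specifiers]
    return parsed


def parse_metadata(metadata):
    """Validate the category names, then hand off to parse_requirements()."""
    requires = metadata.get("requires", {})
    unknown = set(requires) - ALLOWED_CATEGORIES
    if unknown:
        raise ValueError(
            "unknown requirement categories: " + ", ".join(sorted(unknown))
        )
    return parse_requirements(requires)


metadata = {
    "requires": {
        "run": [{"install": "a"}],
        "test": [{"install": "b", "extra": "foo"}],
    }
}
parsed = parse_metadata(metadata)
```

Adding a new category in this layout only means extending the allowed set; the parsing loop itself does not change.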

Vinay Sajip

10:23 a.m.

...

I would really like to see one more level of nesting:

requires : { run : [ ... ], test : [ ... ] }

I've already changed distlib's code several times as the spec has evolved, and would like not to see any more changes so that I can concentrate on some real work ;-)

Seriously, what's currently there now works OK, and the code is fairly simple. I had suggested a variant with even less nesting - one single "requires" list with each entry as it is currently, but having an additional "kind" key with value ":run:", ":test:" etc. This has the merit that you can add additional kinds without major changes, while processing code can filter the list according to its needs at the time. This was shot down by Donald on the basis that it would make things too complicated, or something. Seems a simpler organisation, to me; any argument about additional time to process is unlikely to be a problem in practice, and there are no numbers to point to any performance problems. Currently, with pip, you have to download whole archives while doing dependency resolution, which takes of the order of *seconds* - *minutes* if you're working with Zope/Plone. Doing it in tens/hundreds of milliseconds is sheer luxury :-)

Let's not keep on chopping and changing parts of the JSON schema unless there are actual progress stoppers or missing functional areas, as we recently identified with exports/scripts. It looks as if you and I are the only ones actually implementing this PEP at present, so let's work on interoperability between our implementations so that we can e.g. each build wheels that the other can install, and so on. Interoperability will help confirm that we haven't missed anything.

AFAIK distlib tip is up to date with PEP 426/440 as they are today - someone please tell me if they find a counter-example to this assertion.

Regards,

Vinay Sajip
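For comparison, the single-list variant with a "kind" key might be processed as in this Python sketch. The ":run:" and ":test:" values come from the message; `filter_by_kind()` and the sample entries are hypothetical and do not reflect distlib's actual code.

```python
def filter_by_kind(requires, kind):
    """Return the entries of a single flat list that match one kind."""
    return [entry for entry in requires if entry.get("kind") == kind]


# Hypothetical flat list: every entry carries its own "kind" marker.
requires = [
    {"install": "a", "kind": ":run:"},
    {"install": "b", "kind": ":test:"},
    {"install": "c", "extra": "foo", "kind": ":run:"},
]

run_entries = filter_by_kind(requires, ":run:")
test_entries = filter_by_kind(requires, ":test:")
```

A new kind needs no schema change here, at the cost of one pass over the list per query.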

Daniel Holth

12:18 p.m.

On Fri, Jul 19, 2013 at 11:23 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:

...

Either your proposal or mine would work out to be about the same. The advantage is that it helps people to conceptualize them as four instances of the same thing instead of four different kinds of things and it makes it easier to write a forwards-compatible implementation without looking for keys ending in _requires. It would also make the documentation significantly shorter.
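To make the forwards-compatibility point concrete, this is roughly what a tool has to do with the flat layout if it wants to pick up categories it does not know about yet. It is a Python sketch only; `collect_requirement_fields()` and the sample metadata are made up for illustration.

```python
SUFFIX = "_requires"


def collect_requirement_fields(metadata):
    """Map category name -> entries by scanning for '*_requires' keys."""
    return {
        key[: -len(SUFFIX)]: value
        for key, value in metadata.items()
        if key.endswith(SUFFIX)
    }


# Hypothetical flat metadata with one top level field per category.
metadata = {
    "name": "example",
    "run_requires": [{"install": "a"}],
    "test_requires": [{"install": "b"}],
}

categories = collect_requirement_fields(metadata)
```

The suffix scan is the part that either the nested "requires" dict or the single list with a "kind" key would make unnecessary.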

Nick Coghlan

1:18 a.m.

On 20 July 2013 03:18, Daniel Holth <dholth@gmail.com> wrote:

...

Yeah, I'm mostly interested in being able to *explain* the new metadata easily. Previously, merging the requirements wasn't especially practical, due to the requires/may_require split. Now, though, it's possible to merge them and have the keys match exactly with the names used in ":run:", etc.

However, I don't think it's enough of a win to duplicate the "requires" key at two different levels (inside the dependency specifiers and as a top level field), so I'm happy with sticking to "Flat is better than nested" on this one :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

5 participants

Daniel Holth
Donald Stufft
Gabriel de Perthuis
Nick Coghlan
Vinay Sajip
