Adding jsonschema to the standard library
Disclaimer: I’m not the author of jsonschema (https://github.com/Julian/jsonschema), but as a user I think that users of the standard library (and potentially areas of the standard library itself) could benefit from its addition to the standard library.

I’ve been using jsonschema for the better part of a couple of years now and have found it not only invaluable, but flexible across the variety of applications it has. Personally, I generally use it for HTTP response validation when dealing with RESTful APIs and for system configuration input validation. For those not familiar with the package:

RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
Home: http://json-schema.org/
Proposed addition implementation: https://github.com/Julian/jsonschema

Coles Notes stats:

Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014)
Heavily used by the community: currently sees ~585k downloads per month according to PyPI

I’ve reached out to the author to express my interest in authoring a PEP to have the module included, and to gauge his interest in assisting with maintenance as needed during the integration period (or following). I’d also be personally interested in supporting it as part of the stdlib.

My question is: Is there any reason up front anyone can see that this addition wouldn’t fly, or are others interested in the addition as well?

Thanks, Demian
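For readers who haven't used the package, here is a minimal sketch of its core API as I understand it; the schema and the config values below are purely illustrative:

```python
# Requires the third-party package: pip install jsonschema
from jsonschema import validate, ValidationError

# A draft-04 style schema describing a small service config (illustrative).
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "port": {"type": "integer", "minimum": 1, "maximum": 65535},
    },
    "required": ["name", "port"],
}

# Valid data passes silently (validate() returns None).
validate({"name": "web", "port": 8080}, schema)

# Invalid data raises ValidationError with a descriptive message.
try:
    validate({"name": "web", "port": "8080"}, schema)
except ValidationError as err:
    print("invalid:", err.message)
```

This two-function surface (a `validate` call plus an exception to catch) is what makes it convenient for both HTTP response checking and config validation.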
As an end developer who has used your library for a short time, I've found it a useful tool. We're migrating an Erlang application to Python more quickly thanks to your library, because the legacy application uses JSON Schema.
From my point of view, validating I/O data is a common problem for most developers; however, it also means that a lot of developers have strong opinions about how to validate data ;-)
At least to me, it's a good idea to include this library in Python, even though there are plenty of libraries that do this with several approaches; so far, I haven't found a simpler approach than via JSON schemas.
The bonus is that you can reuse your JSON schemas for migrations and also in your JavaScript source code.
It isn't a silver bullet that resolves all validation corner cases, but it is powerful enough to handle the most routine use cases.
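The reuse point above rests on schemas being plain JSON data, so the same document can be consumed by Python and by a JavaScript client alike. A small stdlib-only sketch (the schema contents here are illustrative):

```python
import json

# A schema is just a JSON document; in practice it would live in a
# shared .json file read by both the Python and JavaScript sides.
schema_text = """
{
    "type": "object",
    "properties": {"id": {"type": "integer"}},
    "required": ["id"]
}
"""

schema = json.loads(schema_text)

# Python sees ordinary dicts and lists...
assert schema["required"] == ["id"]

# ...and the document round-trips losslessly, so any JSON-speaking
# language sees exactly the same structure.
assert json.loads(json.dumps(schema)) == schema
```

Nothing about the schema is Python-specific, which is what makes it usable for cross-language migrations.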
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/
On 21 May 2015 07:29, "Demian Brecht" wrote: [..]
On 2015-05-21 1:29 AM, Demian Brecht wrote: [..]
My question is: Is there any reason up front anyone can see that this addition wouldn’t fly, or are others interested in the addition as well?
I think we should wait at least until json-schema.org releases a final version of the spec. Thanks, Yury
On May 20, 2015, at 10:59 PM, Yury Selivanov
wrote: I think we should wait at least until json-schema.org releases a final version of the spec.
I’d thought about that as well, but here are the arguments that led me to proposing this in the first place:

The latest draft of the RFC expired Jan 31, 2013. I’d have to reach out to the author(s) to confirm, but I’d venture to say there likely isn’t much more effort being put into it.

The library is in heavy use and is useful in practice in its current state. I think that in situations like this the practicality of a module should come first and a finalized spec second.

There are numerous places in the standard library that deviate from specs in the name of practical use. I’m not advocating that this shouldn’t be an exception rather than the rule; I’m just saying that there are multiple things to consider before simply squashing an inclusion because of an RFC’s draft state.
Demian Brecht writes:
The latest draft of the RFC expired Jan 31, 2013.
Actually, expiration is more than half a year fresher: August 4, 2013. But AFAICT none of the schema proposals were RFC track at all, let alone normative. They're just in support of various other JSON-related IETF work. Steve
Yury Selivanov writes:
I think we should wait at least until json-schema.org releases a final version of the spec.
If you mean an RFC, there are all kinds of reasons, some important, some just tedious, why a perfectly good spec never gets released as an RFC. I agree that the fact that none of the IETF, W3C, or ECMA has released a formal spec yet needs discussion.
Demian Brecht writes:
RFC draft: https://tools.ietf.org/html/draft-zyp-json-schema-04
I note that this draft, apparently written in Nov. 2011, expired almost two years ago with no update. OTOH, 4 other RFCs related to JSON (6901, 6902, 7386, 7396) have been published recently. (This kind of thing is common with RFCs; people get fed up with the process and just go off and do something that's "good enough" for them. But it does show they've given up on the process of getting a global standard, at least for now.)

Then in Oct 2012, Andy Newton wrote[1]:

    Schemas. There is no one standardized schema language for JSON, although several are presently in the works (including one by this author). The need for a JSON schema language is controversial -- JSON is regarded by most as simple enough on its own. Indeed, there is no shortage of JSON-based interchange specification making due without schema formalism.

and his independent proposal[2] (confusingly called "content rules") is current, expiring on June 5. (Note that there is no proposal currently being discussed by the IETF APPSAWG. Newton's proposal is independent, pending formation of a new charter for a JSON schema WG.)
My question is: Is there any reason up front anyone can see that this addition wouldn’t fly?
I would say that the evident controversy over which schema language will be standardized is a barrier, unless you can say that Newton's proposals have no support from the community or something like that. It's not a terribly high barrier in one sense (Python doesn't demand that modules be perfect in all ways), but you do have to address the perception of controversy, I think (at least to deny there really is any).

A more substantive issue is that Appendix A of Newton's I-D certainly makes json-schema look "over the top" in verbosity of notation -- XML would be proud.<wink /> If that assessment is correct, the module could be considered un-Pythonic (see Zen #4, and although JSON content rules are not themselves JSON while JSON schema is valid JSON, see Zen #9).

N.B. I'm not against this proposal, just answering your question. I did see that somebody named James Newton-King (aka newtonsoft.com) has an implementation of json-schema for .NET, and json-schema.org seems to be in active development, which are arguments in favor of your proposal.

Footnotes:
[1] http://www.internetsociety.org/articles/using-json-ietf-protocols
[2] https://tools.ietf.org/html/draft-newton-json-content-rules-04
On 21 May 2015 at 06:29, Demian Brecht
Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014) Heavily used by the community: Currently sees ~585k downloads per month according to PyPI
One key question should be addressed as part of any proposal for inclusion into the stdlib: would switching to having feature releases only when a new major Python version is released (with bugfixes at minor releases) be acceptable to the project? From the figures you quote, it sounds like there has been some rapid development, although things seem to have slowed down now, so maybe things are stable enough. Paul
On 21 May 2015 at 17:57, Paul Moore
On 21 May 2015 at 06:29, Demian Brecht
wrote: Has been publicly available for over a year: v0.1 released Jan 1, 2012, currently at 2.4.0 (released Sept 22, 2014) Heavily used by the community: Currently sees ~585k downloads per month according to PyPI
One key question should be addressed as part of any proposal for inclusion into the stdlib: would switching to having feature releases only when a new major Python version is released (with bugfixes at minor releases) be acceptable to the project? From the figures you quote, it sounds like there has been some rapid development, although things seem to have slowed down now, so maybe things are stable enough.
The other question to be answered these days is the value bundling offers over "pip install jsonschema" (or a platform specific equivalent). While it's still possible to meet that condition, it's harder now that we offer pip as a standard feature, especially since getting added to the standard library almost universally makes life more difficult for module maintainers if they're not already core developers.

I'm not necessarily opposed to including JSON schema validation in general or jsonschema in particular (I've used it myself in the past and think it's a decent option if you want a bit more rigor in your data validation), but I'm also not sure how large an overlap there will be between "could benefit from using jsonschema", "has a spectacularly onerous package review process", and "can't already get jsonschema from an approved source".

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day.
On May 21, 2015, at 2:15 AM, Nick Coghlan
wrote: The other question to be answered these days is the value bundling offers over "pip install jsonschema" (or a platform specific equivalent). While it's still possible to meet that condition, it's harder now that we offer pip as a standard feature, especially since getting added to the standard library almost universally makes life more difficult for module maintainers if they're not already core developers.
This is an interesting problem and a question that I’ve had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”. I see two problems with relying on pip and PyPI as an alternative to bundling:

1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.

2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple of years ago.
On Fri, May 22, 2015 at 11:39 AM, Demian Brecht
First off, thanks all for the well thought out responses! Will try to touch on each point when I get a few spare cycles throughout the day.
On May 21, 2015, at 2:15 AM, Nick Coghlan
wrote: The other question to be answered these days is the value bundling offers over "pip install jsonschema" (or a platform specific equivalent). While it's still possible to meet that condition, it's harder now that we offer pip as a standard feature, especially since getting added to the standard library almost universally makes life more difficult for module maintainers if they're not already core developers.
This is an interesting problem and a question that I’ve had at the back of my mind as well. With the addition of pip, there is really no additional value /to those who already know about the package and what problem it solves/. In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”. I see two problems with relying on pip and PyPI as an alternative to bundling:
Counter-point: What library is the de facto standard of doing HTTP in Python? Requests is, of course. Discussion of its inclusion has happened several times and each time the decision is to not include it. The most recent such discussion was at the Language Summit at PyCon 2015 in Montreal. If you want to go by download count, then Requests should still be in the standard library but it just will not happen.
1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
That's not exactly true in every case. The only library that parses and emits YAML is PyYAML. It's unmaintained, incomplete, and full of bugs. That said, it's the de facto standard and the only one of its kind that I know of on PyPI. I would vehemently argue against its inclusion were it ever proposed.
2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.
Counter-point: once you know you want to use JSON Schema, looking for implementations in Python yields Julian's implementation first.

You said (paraphrasing) in your first email that jsonschema should only be excluded from the stdlib if people could bring up reasons against it. The standard library has grown in the past few releases, but that doesn't mean it needs to grow every time. It also doesn't need to grow to include an implementation of every possible /thing/ that exists. Further, leaving it up to others to prove why it shouldn't be included isn't sufficient. You have to prove to the community why it MUST be included. Saying "Ah, let's throw this thing in there anyway because why not" isn't valid. By that logic, I could nominate several libraries that I find useful in day-to-day work, and the barrier to entry would be exactly as much energy as people who care about the standard library are willing to expend to keep the less than stellar candidates out.

In this case, that /thing/ is JSON Schema. Last I checked, JSON Schema was an IETF draft that was never accepted and a specification which has expired. That means that in a couple of years, ostensibly after this was added to the stdlib, it could be made completely irrelevant, and the cost of fixing that would be considerable. That would be far less of an issue if jsonschema were not included at all.

Overall, I'm strongly against its inclusion. Not because the library isn't excellent. It is. I use it. I'm strongly against it for the reasons listed above.
On May 22, 2015, at 3:08 PM, Ian Cordasco
wrote: 1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
That's not exactly true in every case. The only library that parses and emits YAML is PyYAML. It's unmaintained, incomplete, and full of bugs. That said, it's the de facto standard and the only one of its kind that I know of on PyPI. I would vehemently argue against its inclusion were it ever proposed.
2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.
Counter-point, once you know you want to use JSON Schema looking for implementations in python yields Julian's implementation first.
I think a future area of work is going to be on improving the ability for people who don't know what they want to find out that they want something and which thing they want on PyPI. I'm not entirely sure what this is going to look like, but I think it's an important problem.

It's being solved for very specific cases by starting to have the standard documentation explicitly call out these de facto standards of the Python ecosystem where it makes sense. This of course does not scale to every single problem domain or module on PyPI, so we still need a more general solution.

--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft writes:
I think a future area of work is going to be on improving the ability for people who don't know what they want to find out that they want something and which thing they want on PyPI. I'm not entirely sure what this is going to look like
+1
but I think it's an important problem.
+1
It's being solved for very specific cases by starting to have the standard documentation explicitly call out these de facto standards of the Python ecosystem where it makes sense.
Because that's necessarily centralized, it's a solution to a different problem. We need a decentralized approach to deal with the "people who use package X often would benefit from Y too, but don't know where to find Y or which implementation to use." IOW, there needs to be a way for X to recommend implementation Z (or implementations Z1 or Z2) of Y.
This of course does not scale to every single problem domain or module on PyPI so we still need a more general solution.
The only way we know to scale a web is to embed the solution in the nodes. Currently many packages know what they use internally (the install_requires field), but as far as I can see there's no way for a package X to recommend "related" packages Z to implement function Y in applications using X. E.g., the plethora of ORMs available, some of which work better with particular packages than others do.

We could also recommend that package maintainers document such recommendations, preferably in a fairly standard place, in their package documentation. Even something like "I've successfully used Z to do Y in combination with this package" would often help a lot.

If a maintainer (obvious extension: 3rd party recommendations and voting) wants to recommend other packages that work and play well with her package but aren't essential to its function, how about a dictionary mapping Trove classifiers to lists of recommended packages for that implementation?
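The closing suggestion could be prototyped as ordinary package metadata. Everything below is a hypothetical illustration (the `RECOMMENDS` field name, the classifiers chosen, and the ranking are all invented for this sketch, not an existing packaging feature):

```python
# Hypothetical "recommends" metadata a package X might ship: keys are
# Trove classifiers naming a role to fill, values are recommended
# distributions on PyPI, best match first.
RECOMMENDS = {
    "Topic :: Internet :: WWW/HTTP": ["requests", "urllib3"],
    "Topic :: Database :: Front-Ends": ["sqlalchemy", "peewee"],
}

def recommended_for(classifier, limit=3):
    """Return up to `limit` recommended packages filling a given role."""
    return RECOMMENDS.get(classifier, [])[:limit]

print(recommended_for("Topic :: Internet :: WWW/HTTP"))
# → ['requests', 'urllib3']
```

Reusing Trove classifiers as the keys means no new vocabulary needs to be standardized before tooling could aggregate such recommendations.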
On May 22, 2015, at 19:59, Stephen J. Turnbull
Donald Stufft writes:
I think a future area of work is going to be on improving the ability for people who don't know what they want to find out that they want something and which thing they want on PyPI. I'm not entirely sure what this is going to look like
+1
but I think it's an important problem.
+1
It's being solved for very specific cases by starting to have the standard documentation explicitly call out these de facto standards of the Python ecosystem where it makes sense.
Because that's necessarily centralized, it's a solution to a different problem. We need a decentralized approach to deal with the "people who use package X often would benefit from Y too, but don't know where to find Y or which implementation to use." IOW, there needs to be a way for X to recommend implementation Z (or implementations Z1 or Z2) of Y.
This of course does not scale to every single problem domain or module on PyPI so we still need a more general solution.
The only way we know to scale a web is to embed the solution in the nodes. Currently many packages know what they use internally (the install_requires field), but as far as I can see there's no way for a package X to recommend "related" packages Z to implement function Y in applications using X. E.g., the plethora of ORMs available, some of which work better with particular packages than others do.
We could also recommend that package maintainers document such recommendations, preferably in a fairly standard place, in their package documentation. Even something like "I've successfully used Z to do Y in combination with this package" would often help a lot.
If a maintainer (obvious extension: 3rd party recommendations and voting) wants to recommend other packages that work and play well with her package but aren't essential to its function, how about a dictionary mapping Trove classifiers to lists of recommended packages for that implementation?
This is a really cool idea, but it would help to have some specific examples.

For example, BeautifulSoup can only use html5lib or lxml as optional HTML parsers, and lxml as an optional XML parser; nothing else will do any good. But it works well with any HTTP request engine, so any "global" recommendation applies, and it should get the same list (say, requests, urllib3, grequests, pycurl) as any other project that wants to suggest an HTTP request engine. And as for scraper frameworks, those should look at the global recommendations, but restricted to the ones that use, or can use, BeautifulSoup. I'm not sure how to reasonably represent all three of those things in a node.

Of course it's quite possible that I jumped right to a particularly hard example with unique problems that don't need to be solved in general, and really only the first one is necessary, in which case this is a much simpler problem...
Andrew Barnert writes:
If a maintainer (obvious extension: 3rd party recommendations and voting) wants to recommend other packages that work and play well with her package but aren't essential to its function, how about a dictionary mapping Trove classifiers to lists of recommended packages for that implementation?
This is a really cool idea, but it would help to have some specific examples.
For example, BeautifulSoup can only use html5lib or lxml as optional HTML parsers, and lxml as an optional XML parser; nothing else will do any good. But it works well with any HTTP request engine, so any "global" recommendation is a good idea, so it should get the same list (say, requests, urllib3, grequests, pycurl) as any other project that wants to suggest an HTTP request engine. And as for scraper frameworks, that should look at the global recommendations, but restricted to the ones that use, or can use, BeautifulSoup. I'm not sure how to reasonably represent all three of those things in a node.
Well, #2 is easy. You just have a special "global" node that has the same kind of classifier->package map, and link to that.

I don't think #3 can be handled so easily, and probably it's not really worth complexifying things that far at first -- I think you probably need most of SQL to express such constraints. I suspect that I would handle #3 with a special sort of "group" package, that just requires certain classifiers and then recommends implementations of them that work well together. It would be easy for the database to automatically update a group's recommended implementations to point to the group (which would be yet another new attribute for the package).

I'll take a look at the whole shebang and see if I can come up with something a bit more elegant than the crockery of adhoc-ery above, but it will be at least next week before I have anything to say. Steve
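The "global node" and "group package" ideas can be sketched with the same kind of mapping; again, all the role names, package names, and group names below are hypothetical, invented only to make the resolution rule concrete:

```python
# Hypothetical registry. The "global" node holds default recommendations
# per role; a "group" pins a coherent subset known to work well together.
GLOBAL = {
    "http-client": ["requests", "urllib3", "pycurl"],
    "html-parser": ["lxml", "html5lib"],
}

GROUPS = {
    # A scraping "group" restricts the global lists to a compatible set.
    "scraping-stack": {
        "http-client": ["requests"],
        "html-parser": ["lxml", "html5lib"],
    },
}

def resolve(role, group=None):
    """Prefer the group's pinned choices, falling back to the global node."""
    if group is not None and role in GROUPS.get(group, {}):
        return GROUPS[group][role]
    return GLOBAL.get(role, [])

print(resolve("http-client"))                          # global defaults
print(resolve("http-client", group="scraping-stack"))  # group-restricted
```

The group-then-global fallback is the whole mechanism: constraints richer than "this group pins that list" (the SQL-like case mentioned above) are deliberately out of scope for this sketch.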
On 23 May 2015 at 12:59, Stephen J. Turnbull
Donald Stufft writes:
It's being solved for very specific cases by starting to have the standard documentation explicitly call out these de facto standards of the Python ecosystem where it makes sense.
Because that's necessarily centralized, it's a solution to a different problem. We need a decentralized approach to deal with the "people who use package X often would benefit from Y too, but don't know where to find Y or which implementation to use." IOW, there needs to be a way for X to recommend implementation Z (or implementations Z1 or Z2) of Y.
https://www.djangopackages.com/ covers this well for the Django ecosystem (I actually consider it to be one of Django's killer features, and I'm pretty sure I'm not alone in that - like ReadTheDocs, it was a product of DjangoDash 2010).

There was an effort a few years back to set up an instance of that for PyPI in general, as well as similar comparison sites for Pyramid and Plone, but none of them ever hit the same kind of critical mass of useful input as the Django one.

The situation has changed substantially since then, though, as we've been more actively promoting pip, PyPI and third party libraries as part of the recommended Python developer experience, and the main standard library documentation now delegates to packaging.python.org for the details after very brief introductions to installing and publishing packages.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi all,
After reading all the responses, I've changed my mind:

At first look, the advantage of pushing jsonschema into the Python standard library is to standardize and promote an actual good practice.

But yes, you're right, it's too early to include it, because the standard could still be changed and/or abandoned in favor of a new good practice, as happened with SOAP and REST.

It's more future proof to promote PyPI and pip to Python developers.
Regards.
--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/
2015-05-23 16:21 GMT+02:00 Nick Coghlan wrote: [..]
On May 23, 2015, at 7:21 AM, Nick Coghlan
wrote: https://www.djangopackages.com/ covers this well for the Django ecosystem (I actually consider it to be one of Django's killer features, and I'm pretty sure I'm not alone in that - like ReadTheDocs, it was a product of DjangoDash 2010).
Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple of other points that I’ve had in the back of my mind for a while:

With the integration of pip and the focus on non-standard-library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should be), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly as rigorous a review process as standard library adoption before being flagged, although a number of barriers would be reduced.

“Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst...)

This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.
Also, to put the original question in this thread to rest: while I personally think the addition of jsonschema to the standard library would be beneficial (whether as a top-level package, or perhaps by splitting the json module into a package and introducing it there), I think that solving distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
On 27 May 2015 at 19:28, Demian Brecht
This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?
It has been discussed on a number of occasions. The major issue with the idea is that a lot of people use Python in closed corporate environments, where access to the internet from tools such as pip can be restricted. Also, many companies have legal approval processes for software - getting approval for "Python" includes the standard library, but each external package required would need a separate, probably lengthy and possibly prohibitive, approval process before it could be used. So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users. Paul
On May 27, 2015, at 11:46 AM, Paul Moore
wrote: So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users.
I’m just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all “recommended” packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a “recommended” package was that it’s released under PSFL, I’m assuming there wouldn’t be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don’t alienate the user base that can’t rely on pip?
On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht@gmail.com) wrote:
On May 27, 2015, at 11:46 AM, Paul Moore wrote:
So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users.
I’m just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all “recommended” packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a “recommended” package was that it’s released under PSFL, I’m assuming there wouldn’t be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don’t alienate the user base that can’t rely on pip?
I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.

This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime, which will function similarly to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of those preset package versions and install a newer version of one of the bundled packages. Of course they can install non-bundled software as well.

As far as Python is concerned, while I think the above model is better in the general sense, I think that it’s probably too late to switch to that; the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We’re also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It’s also the case (though we’re working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 27/05/2015 20:03, Donald Stufft wrote:
On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht@gmail.com) wrote:
On May 27, 2015, at 11:46 AM, Paul Moore wrote:
So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users.
I’m just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all “recommended” packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a “recommended” package was that it’s released under PSFL, I’m assuming there wouldn’t be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don’t alienate the user base that can’t rely on pip?
I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.
This means that people who want/need a comprehensive standard library can get the Platform edition of the runtime, which will function similarly to the standard library of a language. However, if they run into some critical feature they need or a bug fix, they can selectively choose to step outside of those preset package versions and install a newer version of one of the bundled packages. Of course they can install non-bundled software as well.
As far as Python is concerned, while I think the above model is better in the general sense, I think that it’s probably too late to switch to that, the history of having a big standard library goes back pretty far and a lot of people and processes depend on it. We’re also still trying to heal the rift that 3.x created, and creating a new rift is probably not the most effective use of time. It’s also the case (though we’re working to make it less true) that our packaging tools still can routinely run into problems that would make me uncomfortable using them for this approach.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Could Python 4 tear out the stdlib completely and go to pypi, to what I believe Nick Coghlan called stdlib+, or would this be A PEP Too Far, given the one or two minor issues over the move from Python 2 to Python 3? Yes this is my very dry sense of humour working, but at the same time if it gets somebody thinking, which in turn gets somebody else thinking, then hopefully ideas come up which are practical and everybody benefits. Just my £0.02p worth. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
On May 27, 2015, at 12:03, Donald Stufft
On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht@gmail.com) wrote:
On May 27, 2015, at 11:46 AM, Paul Moore wrote:
So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users.
I’m just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all “recommended” packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a “recommended” package was that it’s released under PSFL, I’m assuming there wouldn’t be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don’t alienate the user base that can’t rely on pip?
I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.
Dependencies are always going to be a problem. The best way to parse XML is lxml (and the best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform requires libxml2? The best way to do numerical computing is with NumPy, and the best way to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform GUIs with desktop integration is PySide; does that mean the Python Platform requires Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; Qt would be much worse.)

You could look at it as something like the core plus distributions model used in OSes. FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX system plus enough to build ports, nothing else), and the practicality-vs.-purity decisions for how to apply that to real-life problems aren't that hard. But Linux took a different approach: it's just a kernel, and everything else--libc, the ports system, etc.--can be swapped out. There is no official distribution; at any given time in history, there are 3-6 competing "major distributions", dozens of others based on them, and some "special-case" distros like ucLinux or Android. And that means different distros can make different decisions on what dependencies are acceptable--include packages that only run on x86, or accept some corporate quasi-open-source license or closed-source blob.

Python seems to have fallen into a place halfway between the two. The stdlib is closer to FreeBSD core than to Linux. On the other hand, while many people start with the official stdlib and use pip to expand on it, there are third-party distributions competing to provide more useful or better-organized batteries than the official version, plus custom distributions that come with some OS distros (e.g., Apple includes PyObjC with theirs), and special things like Kivy.
That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year plan for how the stdlib, core distribution, and third-party ecosystem should be better, how much different would Python be today?
On May 27, 2015 at 5:50:55 PM, Andrew Barnert (abarnert@yahoo.com) wrote:
On May 27, 2015, at 12:03, Donald Stufft wrote:
On May 27, 2015 at 2:57:54 PM, Demian Brecht (demianbrecht@gmail.com) wrote:
On May 27, 2015, at 11:46 AM, Paul Moore wrote:
So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users.
I’m just throwing ideas at the wall here, but would it not be possible to release two versions, one for those who choose to use decentralized packages with out-of-band releases and one with all “recommended” packages bundled (obvious potential for version conflicts and such aside)? If one of the prerequisites of a “recommended” package was that it’s released under PSFL, I’m assuming there wouldn’t be any legal issues with going down such a path? That way, you still get the ability to decentralize the library, but don’t alienate the user base that can’t rely on pip?
I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library. This you call “FooLang Core” or something of the sort. Then you take the most popular or the best examples or whatever criteria you want from the ecosystem around that and you bundle them all together so that the third party packages essentially get preinstalled and you call that “FooLang Platform” or something.
Dependencies are always going to be a problem. The best way to parse XML is lxml (and the best way to parse HTML is BeautifulSoup plus lxml); does that mean that the Python Platform requires libxml2? The best way to do numerical computing is with NumPy, and the best way to build NumPy is with MKL on platforms where it exists, ATLAS on others; does that mean the Python Platform requires MKL and/or ATLAS? The best way to build cross-platform GUIs with desktop integration is PySide; does that mean the Python Platform requires Qt? (One of the biggest portability problems for Python in practice has always been Tcl/Tk; Qt would be much worse.)
You could look at it as something like the core plus distributions model used in OSes. FreeBSD has a core and ports; there's a simple rule for what's in core (a complete POSIX system plus enough to build ports, nothing else), and the practicality-vs.-purity decisions for how to apply that to real-life problems aren't that hard. But Linux took a different approach: it's just a kernel, and everything else--libc, the ports system, etc.--can be swapped out. There is no official distribution; at any given time in history, there are 3-6 competing "major distributions", dozens of others based on them, and some "special-case" distros like ucLinux or Android. And that means different distros can make different decisions on what dependencies are acceptable--include packages that only run on x86, or accept some corporate quasi-open-source license or closed-source blob.
Python seems to have fallen into a place halfway between the two. The stdlib is closer to FreeBSD core than to Linux. On the other hand, while many people start with the official stdlib and use pip to expand on it, there are third-party distributions competing to provide more useful or better-organized batteries than the official version, plus custom distributions that come with some OS distros (e.g., Apple includes PyObjC with theirs), and special things like Kivy.
That doesn't seem to have caused any harm, and may have caused a lot of benefit. While Python may not have found the perfect sweet spot, what it found isn't that bad. And the way it continues to evolve isn't that bad. If you could go back in time to 2010 and come up with a grand five-year plan for how the stdlib, core distribution, and third-party ecosystem should be better, how much different would Python be today?
It certainly doesn’t require you to add something to the “Platform” for every topic either. You can still be conservative in what you include in the “Platform” based on how many people are likely to need/want it and what sort of dependency or building impact it has on actually building out the full Platform. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Wed, May 27, 2015 at 2:03 PM, Donald Stufft
I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library.
While perhaps nice in theory, the process of getting a package into the standard library provides a number of filters (hurdles, if you will) through which a package must pass (or surmount) before it is deemed suitable for broad availability by default to users, and for support by the core development team. Today, that includes documentation, unit tests, broad acceptance by the user community (in many cases), and a commitment by the core development team to maintain the package for the foreseeable future. To the best of my knowledge, none of those filters apply to PyPI-cataloged packages. That is not to say that the current process doesn't have its problems. Some really useful stuff is surely not available in the core. If the core development team was stacked with people who program numeric applications for a living, perhaps numpy or something similar would be in the core today.

The other end of the spectrum is Perl. It has been more than a decade since I did any Perl programming, and even then, not much, but I still remember how confused I was trying to choose a package to manipulate dates and times from CPAN with no guidance. I know PyPI has a weight field. I just went back and reread the footnote describing it, but I really have no idea how it operates. I'm sure someone nefarious could game that system so their security-compromising package drifts toward the top of the list. Try searching for "xml." 2208 packages are returned, with weights ranging from 1 to 9. 107 packages have weights of 8 or 9. If the standard library is to dwindle down to next-to-nothing, a better scheme for package selection/recommendation will have to be developed. Skip
On Thu, May 28, 2015 at 9:34 AM, Skip Montanaro
On Wed, May 27, 2015 at 2:03 PM, Donald Stufft
wrote: I’m of the opinion that, given a brand new language, it makes more sense to have really good packaging tools built in, but not to have a standard library.
While perhaps nice in theory, the process of getting a package into the standard library provides a number of filters (hurdles, if you will) through which a package must pass (or surmount) before it is deemed suitable for broad availability by default to users, and for support by the core development team. Today, that includes documentation, unit tests, broad acceptance by the user community (in many cases), and a commitment by the core development team to maintain the package for the foreseeable future. To the best of my knowledge, none of those filters apply to PyPI-cataloged packages. That is not to say that the current process doesn't have its problems. Some really useful stuff is surely not available in the core. If the core development team was stacked with people who program numeric applications for a living, perhaps numpy or something similar would be in the core today.
The other end of the spectrum is Perl. It has been more than a decade since I did any Perl programming, and even then, not much, but I still remember how confused I was trying to choose a package to manipulate dates and times from CPAN with no guidance. I know PyPI has a weight field. I just went back and reread the footnote describing it, but I really have no idea how it operates. I'm sure someone nefarious could game that system so their security-compromising package drifts toward the top of the list. Try searching for "xml." 2208 packages are returned, with weights ranging from 1 to 9. 107 packages have weights of 8 or 9. If the standard library is to dwindle down to next-to-nothing, a better scheme for package selection/recommendation will have to be developed.
A workflow for building CI-able, vendorable packages with coverage and fuzzing? * xUnit XML test results * http://schema.org/AssessAction * Quality 1 (Use Cases n, m) * Quality 2 (Use cases x, y) * SecurityAssessAction * http://schema.org/ChooseAction * Why am I downloading duplicate functionality? * http://schema.org/LikeAction * Community feedback is always helpful. Or, a workflow for maintaining a *distribution of* **versions of** (C and) Python packages?
Skip _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 28 May 2015 04:46, "Paul Moore"
On 27 May 2015 at 19:28, Demian Brecht
wrote: This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?
It has been discussed on a number of occasions. The major issue with the idea is that a lot of people use Python in closed corporate environments, where access to the internet from tools such as pip can be restricted. Also, many companies have legal approval processes for software - getting approval for "Python" includes the standard library, but each external package required would need a separate, probably lengthy and possibly prohibitive, approval process before it could be used.
So it's unlikely to ever happen, because it would cripple Python for a non-trivial group of its users.
I expect splitting the standard library into a minimal core and a suite of default independently updatable add-ons will happen eventually, we just need to help fix the broken way a lot of organisations currently work as we go: http://community.redhat.com/blog/2015/02/the-quid-pro-quo-of-open-infrastruc... Organisations that don't suitably adapt to the rise of open collaborative models for infrastructure development are going to have a very rough time of it in the coming years. Cheers, Nick. P.S. For a less verbally dense presentation of some of the concepts in that article: http://www.redhat.com/en/explore/infrastructure/na P.P.S. And for a book length exposition of these kinds of concepts: http://www.redhat.com/en/explore/the-open-organization-book
Paul
On Wed, May 27, 2015 at 1:28 PM, Demian Brecht
On May 23, 2015, at 7:21 AM, Nick Coghlan
wrote: https://www.djangopackages.com/ covers this well for the Django ecosystem (I actually consider it to be one of Django's killer features, and I'm pretty sure I'm not alone in that - like ReadTheDocs, it was a product of DjangoDash 2010).
Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I’ve had in the back of my mind for a while:
With the integration of pip and the focus on non-standard-library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly as rigorous a review process as standard library adoption before being flagged, although there would be a number of barriers reduced.
"Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” (https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst...)
This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.
The mirror of this would be asking if Django should rip out its base classes for models, views, etc. I think Python 4 could move towards perhaps deprecating any duplicated modules, but I see no point in ripping the entire standard library out... except maybe for httplib/urllib/etc. (for various reasons beyond my obvious conflict of interest).
Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
On May 27, 2015, at 11:55 AM, Ian Cordasco
wrote: The mirror of this would be asking if Django should rip out its base classes for models, views, etc. I think Python 4 could move towards perhaps deprecating any duplicated modules, but I see no point in ripping the entire standard library out... except maybe for httplib/urllib/etc. (for various reasons beyond my obvious conflict of interest).
I can somewhat see the comparison, but not entirely because Django itself is a package and not the core interpreter and set of builtins. There are also other frameworks that split out modules from the core (I’m not overly familiar with either, but I believe both zope and wheezy follow such models). The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as “recommended” without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I’m somewhat cutting my legs out from under me, but I’m starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)). I’m also aware of the politics of such a change. What does it mean then for core devs who concentrate on the current standard library and don’t contribute to the interpreter core or builtins?
On May 27, 2015, at 12:13 PM, Demian Brecht
wrote: without worrying too much about state of development
I should have elaborated on this more: What I mean is more around feature development, such as introducing HTTP/2.0 to requests. The core feature set would still have to be well proven and have minimal to no changes.
On May 27, 2015, at 12:13, Demian Brecht
The major advantage of going with a fully distributed model would be the out-of-band releases. While nice to have for feature development, it can be crucial for bug fixes, but even more so for security patches. Other than that, I could see it opening the door to adoption of packages as “recommended” without worrying too much about state of development. requests is a perfect example of that. Note that my personal focus on standard library development is the http package so I’m somewhat cutting my legs out from under me, but I’m starting to think that adopting such a distribution mechanism might solve a number of problems (but is probably just as likely to introduce new ones ;)).
One way to do that might be to focus the stdlib on picking the abstract interfaces (whether in the actual code, like dbm allows bsddb to plug in, or just in documentation, like DB-API 2) and providing a bare-bones implementation or none at all. It would be nice if things like lxml.etree didn't take so much work and it weren't so hard to quantify how perfect of a replacement it is. Or if we had a SortedMapping ABC so the half-dozen popular implementations could share a consistent API, so they could compete more cleanly on things that matter like performance or the need for a C extension.

But the example of requests shows how hard, and possibly undesirable, that is. Most people use requests not because of the advanced features it has that urllib doesn't, but because the intermediate-level features that both include have a nicer interface in requests. And, while people have talked about how nice it would be to restructure urllib so that it matches requests' interface wherever possible (while still retaining the existing interface for backward compat), it doesn't seem that likely anyone will actually ever do it. And, even if someone did, and requests became a drop-in replacement for urllib's new-style API and urllib was eventually deprecated, what are the odds competitors like PyCurl would be reworked into a "URL-API 2.0" module?
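To make the SortedMapping idea above concrete, here is a minimal sketch using the stdlib abc machinery (the SortedMapping ABC is hypothetical; nothing like it exists in collections.abc):

```python
from abc import abstractmethod
from collections.abc import MutableMapping

# Hypothetical ABC of the kind described above: the stdlib would own the
# interface, while competing packages (pure Python or C extension) would
# provide implementations conforming to it.
class SortedMapping(MutableMapping):
    @abstractmethod
    def peekitem(self, index=-1):
        """Return the (key, value) pair at the given sorted position."""

# A deliberately naive, bare-bones reference implementation.
class SimpleSortedDict(SortedMapping):
    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        # Re-sorting on every iteration is slow; a real implementation
        # would keep the keys ordered. That performance gap is exactly
        # where third-party packages would compete.
        return iter(sorted(self._data))

    def __len__(self):
        return len(self._data)

    def peekitem(self, index=-1):
        key = sorted(self._data)[index]
        return key, self._data[key]

d = SimpleSortedDict()
d["b"] = 2
d["a"] = 1
print(list(d))       # ['a', 'b']
print(d.peekitem())  # ('b', 2)
```

The ABC would be the stable, documented contract; whether the backing store is a sorted list, a skip list, or a B-tree would become an implementation detail users could swap without code changes.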
On Wed, May 27, 2015 at 1:28 PM, Demian Brecht
On May 23, 2015, at 7:21 AM, Nick Coghlan
wrote: https://www.djangopackages.com/ covers this well for the Django ecosystem (I actually consider it to be one of Django's killer features, and I'm pretty sure I'm not alone in that - like ReadTheDocs, it was a product of DjangoDash 2010).
Thanks again all for the great discussion here. It seems to have taken quite a turn to a couple other points that I’ve had in the back of my mind for a while:
With the integration of pip and the focus on non-standard-library packages, how do we increase discoverability? If the standard library isn’t going to be a mechanism for that (and I’m not putting forward the argument that it should), adopting something like Django Packages might be tremendously beneficial. Perhaps on top of what Django Packages already has, there could be “recommended packages”. Recommended packages could go through nearly as rigorous a review process as standard library adoption before being flagged, although there would be a number of barriers reduced.
So there is a schema.org/SoftwareApplication (or doap:Project, or seon:) Resource, which has * a unique URI (e.g. http://python.org/pypi/readme) * JSON metadata extracted from setup.py into pydist.json (setuptools, wheel) - [ ] create JSON-LD @context - [ ] create mappings to standard schema * [ ] http://schema.org/SoftwareApplication * [ ] http://schema.org/SoftwareSourceCode In terms of schema.org, a Django Packages resource has: * [ ] a unique URI * [ ] typed features (predicates with ranges) * [ ] http://schema.org/review * [ ] http://schema.org/VoteAction * [ ] http://schema.org/LikeAction
"Essentially, the standard library is where a library goes to die. It is appropriate for a module to be included when active development is no longer necessary.” ( https://github.com/kennethreitz/requests/blob/master/docs/dev/philosophy.rst... )
This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules? I would think that if there was a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there was a “recommended packages” framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.
Tox is great for this (in conjunction with whichever build system: BuildBot, TravisCI)
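As a sketch of how such a matrix might be driven (interpreter versions, package names, and the tests/ path below are purely illustrative, not a proposal), a minimal tox.ini could look like:

```ini
[tox]
envlist = py27, py34

[testenv]
deps =
    jsonschema
    requests
commands =
    python -m pytest tests/
```

Each environment installs the recommended set into an isolated virtualenv and runs the interoperability suite, so a buildbot or CI service only has to invoke tox per checkout.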
Also, to put the original question in this thread to rest, while I personally think that the addition of jsonschema to the standard library, whether as a top level package or perhaps splitting the json module into a package and introducing it there would be beneficial, I think that solving the distributed package discoverability is a much more interesting problem and would serve many more packages and users. Aside from that, solving that problem would have the same intended effect as integrating jsonschema into the standard library.
jsonschema // JSON-LD (RDF)
Demian Brecht writes:
This is probably a silly idea, but given the above quote and the new(er) focus on pip and distributed packages, has there been any discussion around perhaps deprecating (and entirely removing from a Python 4 release) non-builtin packages and modules?
Of course there has, including in parallel to your post. It's a dead obvious idea. I'd point to threads, but none of the ones I remember would be of great use; the same ideas and suggestions that were advanced before have been reproduced here. The problems are that the devil is in the details, which are rarely specified, and it would have a huge impact on relationships in the community.

For example, in the context of a relatively short, time-based release cycle, I do recall the debates mentioned by Nick over corporate environments where "Python" (the CPython distribution) is approved as a single package, so stdlib facilities are automatically available to "Python" users, but other packages would need to be approved on a package-by-package basis. There's significant overhead to each such application, so it is efficiency-increasing to have a big stdlib in those environments.

OK, you say, so we automatically bundle the separate stdlib current at a given point in time with the less frequently released Python core distribution. Now, in the Department of Devilish Details, do those "same core + new stdlib" bundles get the core version number, the stdlib version number (which now must be different!), or a separate bundle version number? In the Bureau of Relationship Impacts, if I were a fascist QA/security person, I would surely view that bundle as a new release requiring a new iteration of the security vetting process (relationship impact). Maybe the departments doing such vetting are not as fascist as I would be, but we'd have to find out, wouldn't we? If we just went ahead with this process and discovered later that 80% of the people who were depending on the "Python" package now cannot benefit from the bundling because the tarball labelled "Python-X.Y" no longer is eternal, that would be sad. And although that is the drag on a core/stdlib release cycle split most often cited, I'm sure there are plenty of others.
Is it worth the effort to try to discover and address all/most/some of those? Which ones to address (and we don't know what problems might exist yet!)?
I would think that if there were a system similar to Django Packages that made discoverability/importing of packages as easy as using those in the standard library, having a distributed package model where bug fixes and releases could be done out of band with CPython releases would likely be more beneficial to the end users. If there were a "recommended packages" framework, perhaps there could also be buildbots put to testing interoperability of the recommended package set.
I don't think either "recommended packages" or buildbots scales much beyond Django (and I wonder whether buildbots would even scale to the Django packages ecosystem). But the Python ecosystem includes all of Django already, plus NumPy, SciPy, Pandas, Twisted, Egenix's mx* stuff, a dozen more or less popular ORMs, a similar number of web frameworks more or less directly competing with Django itself, and all the rest of the cast of thousands on PyPI.

At the present time, I think we need to accept that integration of a system, even one that implements a single application, has a shallow learning curve. It takes quite a bit of time to become aware of needs (my initial reaction was "json-schema in the stdlib? YAGNI!!"), and some time and a bit of Google-foo to translate needs to search keywords. After that, the Googling goes rapidly -- that's a solved problem, thank you very much DEC AltaVista. Then you hit the multiple implementations wall, and after recovering consciousness, you start moving forward again slowly, evaluating alternatives and choosing one.

And that doesn't mean you're done, because those integration decisions will not be set in stone. E.g., for Mailman's 3.0 release, Barry decided to swap out two mission-critical modules, the ORM and the REST generator -- after the first beta was released! (Granted, Mailman 3.0 has had an extremely long release process, but the example remains relevant -- such reevaluations occur in .2 or .9 releases all the time.)

Except for Googling, none of these tasks are solved problems: the system integrator has to go through the process all over again with each new system, or in an existing system when the relative strengths of the chosen modules vs. alternatives change dramatically. In this last case, it's true that choosing keywords is probably trivial, and the alternative pruning goes faster, but retrofitting the whole system to the new! improved! alternative!!
module may be pretty painful -- and there's not necessarily a guarantee it will succeed. IMO, fiddling with the Python release and distribution is unlikely to solve any of the above problems, and is likely to be a step backward for some users. Of course at some point we decide the benefits to other users, the developers, and the release engineers outweigh the costs to the users who don't like the change, but it's never a no-brainer.
On May 22, 2015, at 09:39, Demian Brecht
In my mind, the value of bundling anything nowadays really boils down to “this is the suggested de facto standard of solving problem [X] using Python”.
The other way of saying that is to say it explicitly in the stdlib docs, usage docs, and/or tutorial and link to the package. While that used to be pretty rare, that's changed recently. Off the top of my head, there are links to setuptools, requests, nose, py.test, Pillow, PyObjC, py2app, PyWin32, WConio, Console, UniCurses, Urwid, the major alternative GUI frameworks, Twisted, and pexpect. So, if you wrote something to put in the json module docs, the input/output section of the tutorial, or a howto explaining that if you want structured and validated JSON the usual standard is JSON Schema and the jsonschema library can do it for you in Python, that would get most of the same benefits as adding jsonschema to the stdlib without most of the costs.
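To make concrete what "structured and validated JSON" means here: the jsonschema library implements the full draft-04 spec, but the core idea can be sketched with the standard library alone. The toy `validate` function below is a hypothetical illustration handling only `"type"`, `"required"`, and `"properties"`; it is not the real library's API, which raises `ValidationError` rather than returning a list.

```python
import json

# Toy validator illustrating the kind of checks JSON Schema describes.
# The real jsonschema package (https://github.com/Julian/jsonschema)
# implements draft-04 in full; this sketch covers only a tiny subset.

TYPES = {"object": dict, "array": list, "string": str,
         "number": (int, float), "integer": int, "boolean": bool}

def validate(instance, schema):
    """Return a list of error messages (empty if the instance is valid)."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        # Type mismatch: no point checking properties of a non-object.
        errors.append("expected %s, got %r" % (expected, instance))
        return errors
    for key in schema.get("required", []):
        if key not in instance:
            errors.append("missing required property %r" % key)
    for key, subschema in schema.get("properties", {}).items():
        if key in instance:
            errors.extend(validate(instance[key], subschema))
    return errors

# A schema of the sort used for configuration or HTTP response validation.
schema = json.loads("""{
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "port": {"type": "integer"}
    }
}""")

assert validate({"name": "api", "port": 8080}, schema) == []
assert validate({"port": "8080"}, schema) != []
```

With the real library, the equivalent call is roughly `jsonschema.validate({"port": "8080"}, schema)`, which raises an exception describing the first failure instead of collecting messages.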
I see two problems with relying on pip and PyPI as an alternative to bundling:
In general, there's a potentially much bigger reason: some projects can't use arbitrary third-party projects without a costly vetting process, or need to work on machines that don't have Internet access or don't have a way to install user site-packages or virtualenvs, etc. Fortunately, those kinds of problems aren't likely to come up for the kinds of projects that need JSON Schema (e.g., Internet servers, client frameworks that are themselves installed via pip, client apps that are distributed by bundling with cx_Freeze/py2app/etc.).
1. PyPI is filled with multiple solutions to the same problem. This can be difficult to wade through for the experienced developer, never mind the novice.
Usually this is a strength, not a weakness. Until one project really is good enough to become the de facto standard, you wouldn't want to limit the competition, right? The problem traditionally has been that once something _does_ reach that point, there's no way to make that clear--but now that the stdlib docs link to outside projects, there's a solution.
2. You generally won't know about packages that don’t solve problems you’ve solved or are solving. Early on in my adoption of Python, there were a number of times where I just spent time digging through the standard library and was surprised by the offerings that I didn’t even know were a thing. Likewise with jsonschema, I wouldn’t have known it was a thing had a co-worker not introduced me to it a couple years ago.
participants (12)
- Andrew Barnert
- Demian Brecht
- Donald Stufft
- Ian Cordasco
- Ludovic Gasc
- Mark Lawrence
- Nick Coghlan
- Paul Moore
- Skip Montanaro
- Stephen J. Turnbull
- Wes Turner
- Yury Selivanov