The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`)
On 18 September 2014 18:01, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
On Sep 17, 2014, at 23:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
However, now that CPython ships with pip by default, we may want to consider providing more explicit pointers to such "If you want more advanced functionality than the standard library provides" libraries.
I love this idea, but there's one big potential problem, and one smaller one.
Many of the most popular and useful packages require C extensions. In itself, that doesn't have to be a problem; if you provide wheels for the official 3.4+ Win32, Win64, and Mac64 CPython builds, it can still be as simple as `pip install spam` for most users, including the ones with the least ability to figure it out for themselves.
OK, the key thing to look at here is the user experience for someone who has Python installed, and has a job to do, but needs to branch out into external packages because the stdlib doesn't provide enough functionality.
To make this example concrete, I'll focus on a specific use case, which I believe is relatively common, although I can't back this up with hard data.
Assume:
* A user who is comfortable with Python, or with scripting languages in general
* No licensing or connectivity issues to worry about
* An existing manual process that the user wants to automate
In my line of work, this constitutes the vast bulk of Python use - informal, simple automation scripts.
So I'm writing this script, and I discover I need to do something that the stdlib doesn't cover, but I feel like it should be available "out there", and it's sufficiently fiddly that I'd prefer not to write it myself. Examples I've come across in the past:
* A console progress bar
* Scraping some data off a web page
* Writing data into an Excel spreadsheet with formatting
* Querying an Oracle database
Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem.
So I go and ask Google. A quick check on the progress bar case gets me to a StackOverflow article that offers me a lot of "write it yourself" solutions, and pointers to a couple of libraries. Further down there are a few pointers to python-progressbar, which was mentioned in the StackOverflow article, which in turn leads me to the PyPI page for it. The latest version (2.3-dev) is not hosted on PyPI, so I hit all the fun of --allow-external.
Ironically, "pip install tqdm" gives me what I want instantly. But it never came up via Google.
The rest of the cases are similar, lots of Google searching, often combined with evaluating multiple options, followed by more or less pain installing the software.
Things that aren't Python 3 or Windows compatible suck me into the "shall I patch it and submit a PR" minefield. For the last case (an Oracle driver), where I need a C extension and access to external libraries, ironically it's pretty easy. There's no real competition to cx_Oracle, and the PyPI page has what I need, although they ship wininst exes rather than wheels, which means I need to do a download then a wheel convert then a pip install, so it's not ideal, but doable.
From this example, I'd like to see the following improvements to the process:
1. Somewhere I can go to find useful modules, that's better than Google.
2. Someone else choosing the "best option" - I don't want to evaluate 3 different progressbar modules, I just want to write "57% complete" and a few dots!
3. C extensions aren't a huge problem to me on Windows, although I'm looking forward to the day when everyone distributes wheels (wheel convert is good enough for now though). [1]
4. Much more community pressure for projects to host their code on PyPI. Some projects have genuine issues with hosting on PyPI, and there are changes being looked at to support them, but for most projects it seems to just be history and inertia.
[1] A Linux/OS X user might have more issues with C extensions.
Maybe this can't be solved in any meaningful sense, and maybe it's not something the "Python project" should take responsibility for, but without any doubt, it's the single most significant improvement that could be made to my experience with PyPI.
Paul.
PS I should also note that even in its current state, PyPI is streets ahead of the 3rd party module story I've experienced for any other language - C/C++, Lua, Powershell, and Java are all far worse. Perl/CPAN may be as good or better, it's so long since I used Perl that I don't really know these days.
On 09/18/2014 08:26 PM, Paul Moore wrote:
Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem.
So I go and ask Google. A quick check on the progress bar case gets me to a StackOverflow article that offers me a lot of "write it yourself" solutions, and pointers to a couple of libraries. Further down there are a few pointers to python-progressbar, which was mentioned in the StackOverflow article, which in turn leads me to the PyPI page for it. The latest version (2.3-dev) is not hosted on PyPI, so I hit all the fun of --allow-external.
I'd recommend searching PyPI itself. This:
https://pypi.python.org/pypi?%3Aaction=search&term=progress+bar
gives about 20 top results that look highly relevant. A Google search with "site:pypi.python.org progress bar" also looks like it could have given you what you wanted.
Ironically, "pip install tqdm" gives me what I want instantly. But it never came up via Google.
It also doesn't come up in the PyPI search, because its PyPI page isn't exactly very full of information :)
cheers, Georg
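For reference, the two search routes Georg suggests can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the base URL is simply the 2014-era endpoint from the link quoted above, not a statement about how PyPI search works today.

```python
from urllib.parse import urlencode

# Base URL copied from the search link quoted in the message above.
PYPI_BASE = "https://pypi.python.org/pypi"


def pypi_search_url(terms):
    """Build a PyPI search URL for a free-text query (hypothetical helper)."""
    # urlencode percent-encodes ":" as %3A and turns spaces into "+",
    # which reproduces the shape of the quoted link.
    query = urlencode({":action": "search", "term": terms})
    return PYPI_BASE + "?" + query


def google_site_query(terms):
    """Return the site-restricted query string suggested for Google."""
    return "site:pypi.python.org " + terms


print(pypi_search_url("progress bar"))
print(google_site_query("progress bar"))
```

Nothing here fetches anything; it only shows how a vague task description ("progress bar") turns into the query strings mentioned in the message.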
On 19 Sep 2014 04:54, "Georg Brandl" <g.brandl@gmx.net> wrote:
On 09/18/2014 08:26 PM, Paul Moore wrote:
Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem.
So I go and ask Google. A quick check on the progress bar case gets me to a StackOverflow article that offers me a lot of "write it yourself" solutions, and pointers to a couple of libraries. Further down there are a few pointers to python-progressbar, which was mentioned in the StackOverflow article, which in turn leads me to the PyPI page for it. The latest version (2.3-dev) is not hosted on PyPI, so I hit all the fun of --allow-external.
Paul, this could make a good "What problem are we actually trying to fix?" summary on pypa.io.
We have spent a lot of time so far on the "getting people the packages they ask for" side of things, but have barely scratched the surface of "helping people find the packages that can help them". At the moment "word of mouth" is one of our main discovery tools, and that's an issue for newcomers that may not have a big network of fellow developers yet.
This is a problem I think the Django community actually addressed fairly well through https://www.djangopackages.com/
There are similar comparison sites for Pyramid & Plone, after Audrey & Danny broke out the back end of Django Packages to make it independently deployable (see http://opencomparison.readthedocs.org/en/latest/)
I'd recommend searching PyPI itself. This:
https://pypi.python.org/pypi?%3Aaction=search&term=progress+bar
gives about 20 top results that look highly relevant. A Google search with "site:pypi.python.org progress bar" also looks like it could have given you what you wanted.
Ironically, "pip install tqdm" gives me what I want instantly. But it never came up via Google.
It actually occurs to me: I wonder whether anyone at Google might be interested in enhancing its usefulness as a Python packaging search tool by looking specifically at PyPI's own search index data. (And if they could do that for us, they might be willing to do it for CPAN, RubyGems, npm, PEAR, etc)
(on the other hand, being a vector for influencing Google search results would mean being a higher priority target for spam, so we may not actually want that)
It also doesn't come up in the PyPI search, because its PyPI page isn't exactly very full of information :)
"SEO for Python Packages" could be a good advanced topic for packaging.python.org :) Cheers, Nick.
cheers, Georg
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 18 September 2014 22:27, Nick Coghlan <ncoghlan@gmail.com> wrote:
It also doesn't come up in the PyPI search, because its PyPI page isn't exactly very full of information :)
"SEO for Python Packages" could be a good advanced topic for packaging.python.org :)
At a bare minimum, some sort of guidance on what keywords to include in the metadata would be useful... A quick sample of 4-5 packages got none with *any* keywords specified. Paul
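As a rough illustration of the keyword guidance Paul is asking for, here is a sketch of package metadata with search-oriented keywords, plus a check for the "no keywords at all" case he found in his sample. The package name, the metadata values and the helper name are all invented for the example; `keywords` itself is a standard metadata field accepted by `setup()`.

```python
# Hypothetical metadata for an imaginary progress-bar package; every value
# here is made up for illustration.
METADATA = {
    "name": "example-progress",
    "version": "0.1",
    "description": "Console progress bar for loops and long-running scripts",
    # Terms a user might actually type into a search box.
    "keywords": "progress bar console terminal percent complete",
}

# In a real setup.py this dict would be passed on as setuptools.setup(**METADATA).


def missing_search_fields(metadata, fields=("description", "keywords")):
    """Return the names of discovery-related fields that are absent or blank."""
    return [f for f in fields if not str(metadata.get(f) or "").strip()]


print(missing_search_fields(METADATA))             # nothing missing here
print(missing_search_fields({"name": "bare-pkg"}))  # both fields reported
```

A check like this could run as part of a release checklist; which fields matter for search ranking is an assumption here, not something the thread establishes.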
On 18.09.2014 23:27, Nick Coghlan wrote:
It actually occurs to me: I wonder whether anyone at Google might be interested in enhancing its usefulness as a Python packaging search tool by looking specifically at PyPI's own search index data. (And if they could do that for us, they might be willing to do it for CPAN, RubyGems, npm, PEAR, etc)
(on the other hand, being a vector for influencing Google search results would mean being a higher priority target for spam, so we may not actually want that)
I'm not sure, but isn't Google doing this already? At least package updates uploaded to PyPI usually appear in Google searches quite fast (sometimes instantaneously) so I don't think this is only the result of regular crawls.
On Thu, Sep 18, 2014 at 11:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 19 Sep 2014 04:54, "Georg Brandl" <g.brandl@gmx.net> wrote:
On 09/18/2014 08:26 PM, Paul Moore wrote:
Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem.
So I go and ask Google. A quick check on the progress bar case gets me to a StackOverflow article that offers me a lot of "write it yourself" solutions, and pointers to a couple of libraries. Further down there are a few pointers to python-progressbar, which was mentioned in the StackOverflow article, which in turn leads me to the PyPI page for it. The latest version (2.3-dev) is not hosted on PyPI, so I hit all the fun of --allow-external.
Paul, this could make a good "What problem are we actually trying to fix?" summary on pypa.io.
We have spent a lot of time so far on the "getting people the packages they ask for" side of things, but have barely scratched the surface of "helping people find the packages that can help them". At the moment "word of mouth" is one of our main discovery tools, and that's an issue for newcomers that may not have a big network of fellow developers yet.
This is a problem I think the Django community actually addressed fairly well through https://www.djangopackages.com/
There are similar comparison sites for Pyramid & Plone, after Audrey & Danny broke out the back end of Django Packages to make it independently deployable (see http://opencomparison.readthedocs.org/en/latest/)
There is also scipy central for scientific code: http://scipy-central.org/
On Sep 18, 2014, at 2:26 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Maybe this can't be solved in any meaningful sense, and maybe it's not something the "Python project" should take responsibility for, but without any doubt, it's the single most significant improvement that could be made to my experience with PyPI.
Package Discovery is absolutely a thing we stink at, and something we should do better at. This is squarely in the PyPI side of things; I don't think that python-dev needs to recommend a stdlib++ nor any hand picked group of people... maybe.
The PyPI search is kinda grody: it's a horribly inefficient SQL query that uses a bunch of LIKEs and regexes, if I recall. Warehouse has switched to an Elasticsearch backend and I've attempted to do a little tuning of it, however I haven't done a whole lot, largely because I'm not an expert and it wasn't that high of a priority.
Fundamentally though, the problem is what do we use to determine if a package is "good" or not. Folks may or may not remember the great ratings/comments war of yore, which was an attempt to add some end user driven ratings to packages on PyPI. That didn't particularly go well and they've long since been disabled.
One of the problems is that the names of packages are basically either extremely relevant or not relevant at all. Taking a look at Django migrations prior to Django 1.7, you had "South" which was the de facto standard and "django-migrations" which was a more or less defunct project. Then you'd have a ton of django-* packages which mention both Django and the fact that they have migrations shipped within their long_description. It got somewhat hard to get South to score fairly high on "django migrations"[1][2].
Popularity is a reasonable metric; if I recall, Warehouse uses the download counts of a project to weight the search results (although it should probably use rolling counts and not total counts). The idea here is that something that is downloaded more often is likely to be a better overall choice for most people. On Crate I had "favorites", which were functionally equivalent to stars on GitHub, which would influence the search results as well.
That's about all that I've been able to think of to glean information from; any sort of ratings or what-have-you system needs to be done very carefully so that it actually provides value and isn't just a pain point.
[1] https://pypi.python.org/pypi?%3Aaction=search&term=django+migrations&submit=search
[2] https://warehouse.python.org/search/project/?q=django+migrations
---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
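To make the popularity-weighting idea concrete, here is a toy sketch of ranking by text relevance multiplied by a rolling download count. It is not how Warehouse or Elasticsearch actually score results: the scoring formula, the 30-day window, the field names and all the numbers are assumptions made up for illustration.

```python
import math


def rolling_downloads(daily_counts, window=30):
    """Sum only the most recent `window` days rather than the all-time total."""
    return sum(daily_counts[-window:])


def text_relevance(query, text):
    """Fraction of query words found in the text (a crude stand-in scorer)."""
    words = query.lower().split()
    haystack = set(text.lower().split())
    if not words:
        return 0.0
    return sum(w in haystack for w in words) / len(words)


def score(query, project):
    """Combine relevance with log-damped recent popularity."""
    relevance = text_relevance(query, project["name"] + " " + project["summary"])
    popularity = math.log1p(rolling_downloads(project["daily_downloads"]))
    return relevance * (1.0 + popularity)


# Invented sample data: summaries and download figures are not real.
PROJECTS = [
    {"name": "django-migrations", "summary": "Django migrations",
     "daily_downloads": [3] * 30},
    {"name": "South", "summary": "Django migrations for schema and data",
     "daily_downloads": [4000] * 30},
]

ranked = sorted(PROJECTS, key=lambda p: score("django migrations", p), reverse=True)
print([p["name"] for p in ranked])
```

With equal text relevance, the log-damped rolling count is what lets the heavily used project outrank the one whose name happens to match the query; the logarithm keeps very large download counts from swamping relevance entirely.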
On Sep 18, 2014, at 11:26, Paul Moore <p.f.moore@gmail.com> wrote:
On 18 September 2014 18:01, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
On Sep 17, 2014, at 23:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
However, now that CPython ships with pip by default, we may want to consider providing more explicit pointers to such "If you want more advanced functionality than the standard library provides" libraries.
I love this idea, but there's one big potential problem, and one smaller one.
Many of the most popular and useful packages require C extensions. In itself, that doesn't have to be a problem; if you provide wheels for the official 3.4+ Win32, Win64, and Mac64 CPython builds, it can still be as simple as `pip install spam` for most users, including the ones with the least ability to figure it out for themselves.
OK, the key thing to look at here is the user experience for someone who has Python installed, and has a job to do, but needs to branch out into external packages because the stdlib doesn't provide enough functionality.
To make this example concrete, I'll focus on a specific use case, which I believe is relatively common, although I can't back this up with hard data.
Assume:
* A user who is comfortable with Python, or with scripting languages in general * No licensing or connectivity issues to worry about * An existing manual process that the user wants to automate
In my line of work, this constitutes the vast bulk of Python use - informal, simple automation scripts.
So I'm writing this script, and I discover I need to do something that the stdlib doesn't cover, but I feel like it should be available "out there", and it's sufficiently fiddly that I'd prefer not to write it myself. Examples I've come across in the past:
* A console progress bar * Scraping some data off a web page * Writing data into an Excel spreadsheet with formatting * Querying an Oracle database
Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem.
So I go and ask Google.
Hold on. I'm pretty sure that the intended answer to this problem has, for years, been that you go and search PyPI. Is that too broken to use, or are people just not aware of it?
A quick check on the progress bar case gets me to a StackOverflow article that offers me a lot of "write it yourself" solutions, and pointers to a couple of libraries.
StackOverflow is the _last_ place you should be looking. It's part of their policy that "library-shopping" questions should be closed, and has been for quite some time. Anything you find is likely to be either out of date, or in some niche area that few of the active users see.
From this example, I'd like to see the following improvements to the process:
1. Somewhere I can go to find useful modules, that's better than Google.
Again, isn't that PyPI?
2. Someone else choosing the "best option" - I don't want to evaluate 3 different progressbar modules, I just want to write "57% complete" and a few dots!
I don't think anything like this can be "curated", unless it really is restricted to the dozen or two projects that are clearly "best in category" in areas of widespread demand, as (I think) Nick and Terry were discussing.
So you're looking for something crowd sourced, which I don't think exists yet. Maybe the newish Software Recs StackExchange site will eventually serve. Or maybe someone will build something specific to the Python community, possibly even built on top of PyPI. But whatever it is, I don't think it's going to be designed by a mailing list discussion; someone has to have a clever idea and hack it up until it works, or at least inspires further ideas. The same way we got easy_install and then pip.
There's also the problem that in many areas "the best" is different for different applications, and often you don't know the right question to ask. The best console-mode progress bar, OK, people can disagree with the answer, but they won't disagree much on the criteria. But for anything less trivial, that's not likely to be true. Which is the best XMPP client library, plugin framework, arbitrary-precision float library, sorted mapping, markdown converter, ...?
In other words, I agree that this is an important problem, but it's not going to be an easy one to solve, and I don't think solving it should be a prerequisite, or even really relevant, to Nick's stdlib++ idea.
3. C extensions aren't a huge problem to me on Windows, although I'm looking forward to the day when everyone distributes wheels (wheel convert is good enough for now though). [1]
I think this is pretty uncommon for the user group you're representing. Most Windows users who are comfortable with scripting languages don't have MSVC or MinGW set up, don't know how to do so (or even which one they should choose), etc. Many packages come with official Windows binaries, but an awful lot of them don't--and, even when they do, the "install" docs often start off with how to set up prereqs and build on *nix machines. Most of the Windows users I see tend to go straight to Christoph Gohlke's archive and use his installers, and for anything not there they just throw their hands up and panic.
Making wheels widespread is not just a nice-to-have; wheels are a great solution to a very real problem, and until they're pervasive the problem isn't solved.
But meanwhile, wheels don't solve everything. I don't want to repeat the whole second point from my previous email, but given that pip doesn't handle non-Python requirements, some of the most important packages on PyPI are still going to be out of reach for many people, and I don't know what the solution is. (Build a statically linked lxml? Include libxml2.dll in the wheel? Just give a better error message that includes a link to how to download it?)
4. Much more community pressure for projects to host their code on PyPI. Some projects have genuine issues with hosting on PyPI, and there are changes being looked at to support them, but for most projects it seems to just be history and inertia.
[1] A Linux/OS X user might have more issues with C extensions.
In general, I think they have far fewer problems. Linux users have one thing to learn (install libxml2-devel, not just libxml2). Same for Mac users (install Xcode and Homebrew). After that, they tend to have a lot fewer problems. (Except with scipy, since it needs a Fortran compiler and sometimes needs you to force-rebuild numpy. And except for Mac users who end up with multiple copies of Python 2.7, which isn't going to be a problem for 3.x to worry about in the foreseeable future.)
Maybe this can't be solved in any meaningful sense, and maybe it's not something the "Python project" should take responsibility for, but without any doubt, it's the single most significant improvement that could be made to my experience with PyPI.
I agree that having some kind of
Paul.
PS I should also note that even in its current state, PyPI is streets ahead of the 3rd party module story I've experienced for any other language - C/C++, Lua, Powershell, and Java are all far worse. Perl/CPAN may be as good or better, it's so long since I used Perl that I don't really know these days.
On 18 September 2014 23:20, Andrew Barnert <abarnert@yahoo.com> wrote:
Hold on. I'm pretty sure that the intended answer to this problem has, for years, been that you go and search PyPI. Is that too broken to use, or are people just not aware of it?
Sorry, I should have mentioned that. I've found that PyPI search is great for getting packages named like you expect, but a lot worse when you have something vague ("website scraping" didn't bring up html5lib, BeautifulSoup or lxml, all of which are tools I've used in the past). So I tend to not bother these days on the assumption that it'll just be another dead end. I know, I'm old and cynical, sorry ;-) Paul
On Sep 18, 2014, at 6:20 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
But meanwhile, wheels don't solve everything. I don't want to repeat the whole second point from my previous email, but given that pip doesn't handle non-Python requirements, some of the most important packages on PyPI are still going to be out of reach for many people, and I don't know what the solution is. (Build a statically linked lxml? Include libxml2.dll in the wheel? Just give a better error message that includes a link to how to download it?)
I think the cryptography project has had good success recently switching to statically linking OpenSSL on Windows in the Wheels. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
I like stdlib++, but I also want to say it should remain a thing without python-dev endorsement. As a user and a library developer, I see pros and cons.
Official endorsement can lead people to abandon whatever they are working on, or make them feel excluded or unappreciated, which is not a very positive thing to do.
On the other hand, there may only be a very limited number of stdlib replacements that people can vouch for easily.
For HTTP requests, it is obvious that Requests has been the most widely used library in modern Python codebases for the past several years.
For async networking and async tasks, Twisted and Tornado are probably the best. You might include Celery under async tasks, just because many people who choose to run async tasks will end up queuing.
For cryptography and security, what do you suggest? There are APIs cryptography doesn't have yet but pycrypto or M2Crypto does. It isn't that either is a bad library; I've used pycrypto, and I am careful when using its API. But do we endorse cryptography over the other two (the latter seems pretty dead based on its commit activity, but I don't know)?
On 09/18/2014 08:26 PM, Paul Moore wrote:
From this example, I'd like to see the following improvements to the process:
1. Somewhere I can go to find useful modules, that's better than Google.
2. Someone else choosing the "best option" - I don't want to evaluate
Well, judgement is always required. As an example: searching PyPI for an async version of the famous Requests library doesn't give beginners any immediately obvious choice.
https://pypi.python.org/pypi?%3Aaction=search&term=async+requests&submit=search
If we go by popularity count alone, we will mislead users. The description ought to be more descriptive, but how descriptive? Can an author cover all the possible keywords? This is why, naturally, most people use a search engine and eventually end up either on StackOverflow or on some blog post written by me \o/.
One idea, similar to npm or the gem store, is commentary. That can become spam, so how about we enable tagging and suggestions? What if people could suggest their workflow in a more obvious way than a blog post? If anyone has the time and interest, maybe scan each PyPI package and analyze how a package on PyPI is being used by other packages. We can even go as far as scanning public repositories, should anyone feel the urge to do that, privately. That, however, is a M-B dollar industry, known as search.
Though, as with the App Store, Play Store, gem store, whatever, the number of useful comments and the number of participants can vary. Thus, the improvement may be suboptimal.
On Fri, Sep 19, 2014 at 1:39 AM, John Wong <gokoproject@gmail.com> wrote:
I like stdlib++, but I also want to say it should remain a thing without python-dev endorsement. As a user and a library developer, I see pros and cons.
Official endorsement can lead people to abandon whatever they are working on, or make them feel excluded or unappreciated, which is not a very positive thing to do.
On the other hand, there may only be a very limited number of stdlib replacements that people can vouch for easily.
For HTTP requests, it is obvious that Requests has been the most widely used library in modern Python codebases for the past several years.
For async networking and async tasks, Twisted and Tornado are probably the best. You might include Celery under async tasks, just because many people who choose to run async tasks will end up queuing.
For cryptography and security, what do you suggest? There are APIs cryptography doesn't have yet but pycrypto or M2Crypto does. It isn't that either is a bad library; I've used pycrypto, and I am careful when using its API. But do we endorse cryptography over the other two (the latter seems pretty dead based on its commit activity, but I don't know)?
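The suggestion above of analysing how packages on PyPI are used by other packages can be sketched as a reverse-dependency count. The dependency table below is entirely invented, the package names are placeholders, and a real analysis would first have to extract requirements from package metadata, which this sketch does not attempt.

```python
from collections import Counter

# Placeholder data: package name -> the packages it declares as requirements.
REQUIRES = {
    "app_one": ["http_lib", "xml_lib"],
    "app_two": ["http_lib"],
    "scraper_kit": ["http_lib", "soup_lib", "xml_lib"],
}


def dependents_count(requires):
    """Count, for each dependency, how many distinct packages require it."""
    counts = Counter()
    for deps in requires.values():
        # set() guards against a requirement being listed twice by one package.
        for dep in set(deps):
            counts[dep] += 1
    return counts


counts = dependents_count(REQUIRES)
for name, n in counts.most_common():
    print(name, n)
```

A "used by N other packages" figure is a different signal from raw download counts, since it reflects deliberate choices by other authors, though it would still favour older, established projects.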
On Fri, Sep 19, 2014 at 01:39:12AM -0400, John Wong wrote:
I like stdlib++, but I also want to say it should remain a thing without python-dev endorsement. As a user and a library developer, I see pros and cons.
Official endorsement can lead people to abandon whatever they are working on, or make them feel excluded or unappreciated, which is not a very positive thing to do.
Yes. Also, it can discourage innovation and encourage monoculture.
As has often been said, the standard library is where good packages go to die. In the standard library, a package is ubiquitous, but it's also pretty much in stasis, unlikely to change much.
"We biologists have a word for stable: 'dead'." - some biologist
So if you want something under active development, likely to gain new features, you normally look outside the std lib, where there is plenty of innovation, experimentation and competition.
Now imagine that one package gets recommended as "best of breed" for some particular task in this hypothetical stdlib++. It will combine the ubiquity of the stdlib (because everyone uses it) without the disadvantage of stasis. That could make it much harder for a new package in the same field to become well-known enough to compete for users and developers.
Now obviously there are "best of breed" third party libraries. I don't see many people trying to build a better BeautifulSoup. (Maybe they are, and I just don't know about them, which demonstrates the problem.) That's not necessarily a bad thing. But I think we should be careful of putting the thumb on the scales too much.
-- Steven
On 19 September 2014 12:59, Steven D'Aprano <steve@pearwood.info> wrote:
Now obviously there are "best of breed" third party libraries. I don't see many people trying to build a better BeautifulSoup. (Maybe they are, and I just don't know about them, which demonstrates the problem.) That's not necessarily a bad thing. But I think we should be careful of putting the thumb on the scales too much.
Agreed. I hadn't considered the point about discouraging innovation. So it seems to me that the key things needed are:
1. Discoverability - making it easy for package authors to ensure people find their packages. There's a social aspect here as well - quirky project names like celery or freezegun are a part of the Python culture, but they do make things less discoverable (what do those two projects do?). We need a way to balance that.
2. A clearer way for projects to declare what they support. Python 2 or 3 or both? Is Windows supported? And it's not just "yes or no", often it's somewhere in the middle - for example I know projects that want to work on Windows, but don't have the expertise, so they'll accept patches but you have to expect to do some of the work for them... Classifiers should do this, but use is patchy (particularly around the supported OS).
3. Most controversially, some way of rating otherwise-equal packages. Ultimately, I don't want a choice, I just want to install something and move on. Download counts may be the easiest compromise here (at least lots of people use the project...)
As well as the technical aspect of making it possible to provide this information (much of it, projects *can* provide right now, although maybe not always as easily as might be ideal) there is a social aspect - making it so that it's the norm for projects to do so, and not doing so is viewed as an indication of overall attention to detail on the project.
Paul
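Point 2 above (declaring what a project supports) is what trove classifiers are meant for. A minimal sketch of reading that information back out of a classifier list follows; the classifier strings are real trove classifiers, but the sample project and the helper names are hypothetical.

```python
# Classifiers for an imaginary project that has declared Python 2 and 3
# support but said nothing about Windows.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 3",
    "Operating System :: POSIX :: Linux",
    "Operating System :: MacOS :: MacOS X",
]


def declared(classifiers, prefix):
    """Return the part after `prefix` for every classifier under that prefix."""
    head = prefix + " :: "
    return [c[len(head):] for c in classifiers if c.startswith(head)]


def windows_declared(classifiers):
    """True if any Windows operating-system classifier is present."""
    return any(v.startswith("Microsoft :: Windows")
               for v in declared(classifiers, "Operating System"))


print(declared(CLASSIFIERS, "Programming Language :: Python"))
print(windows_declared(CLASSIFIERS))
```

Note that a missing classifier cannot distinguish "unsupported" from "never declared", and there is no classifier for the "patches welcome, but expect to do some of the work" middle ground described above.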
On Sep 19, 2014, at 5:26, Paul Moore <p.f.moore@gmail.com> wrote:
On 19 September 2014 12:59, Steven D'Aprano <steve@pearwood.info> wrote:
Now obviously there are "best of breed" third party libraries. I don't see many people trying to build a better BeautifulSoup. (Maybe they are, and I just don't know about them, which demonstrates the problem.) That's not necessarily a bad thing. But I think we should be careful of putting the thumb on the scales too much.
Agreed. I hadn't considered the point about discouraging innovation. So it seems to me that the key things needed are:
1. Discoverability - making it easy for package authors to ensure people find their packages. There's a social aspect here as well - quirky project names like celery or freezegun are a part of the Python culture, but they do make things less discoverable (what do those two projects do?). We need a way to balance that.
There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser?
Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP.
2. A clearer way for projects to declare what they support. Python 2 or 3 or both? Is Windows supported? And it's not just "yes or no", often it's somewhere in the middle - for example I know projects that want to work on Windows, but don't have the expertise, so they'll accept patches but you have to expect to do some of the work for them... Classifiers should do this, but use is patchy (particularly around the supported OS).
3. Most controversially, some way of rating otherwise-equal packages. Ultimately, I don't want a choice, I just want to install something and move on. Download counts may be the easiest compromise here (at least lots of people use the project...)
As well as the technical aspect of making it possible to provide this information (much of it, projects *can* provide right now, although maybe not always as easily as might be ideal) there is a social aspect - making it so that it's the norm for projects to do so, and not doing so is viewed as an indication of a lack of overall attention to detail on the project.
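As a rough illustration of the classifier point, here is a minimal sketch of how a tool might summarise what a project declares about itself. The function name `summarize_support` and the sample list are hypothetical; the classifier strings are ordinary PyPI trove classifiers. Anything a project has not declared is reported as `None` ("undeclared") rather than as unsupported, since patchy use is exactly the problem described.

```python
def summarize_support(classifiers):
    """Summarise declared Python-version and Windows support (hypothetical helper)."""
    py_prefix = "Programming Language :: Python :: "
    os_prefix = "Operating System :: "
    majors, systems = set(), set()
    for c in classifiers:
        if c.startswith(py_prefix):
            version = c[len(py_prefix):]
            # Keep only version entries such as "2.7", "3" or "3 :: Only";
            # this skips entries like "Implementation :: CPython".
            if version[:1].isdigit():
                majors.add(version[0])
        elif c.startswith(os_prefix):
            systems.add(c[len(os_prefix):])

    if systems:
        windows = any(
            s == "OS Independent" or s.startswith("Microsoft :: Windows")
            for s in systems
        )
    else:
        windows = None  # no OS classifiers at all: undeclared

    return {
        "python2": ("2" in majors) if majors else None,
        "python3": ("3" in majors) if majors else None,
        "windows": windows,
    }


# Made-up classifier list for illustration only.
example = [
    "Programming Language :: Python :: 2.7",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: Implementation :: CPython",
]
print(summarize_support(example))
```

Note that this can only report what a project claims; the "somewhere in the middle" cases (patches accepted, no in-house expertise) have no classifier and would still show up as undeclared.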
Paul _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 19 September 2014 16:15, Andrew Barnert <abarnert@yahoo.com> wrote:
There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser?
Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP.
I wasn't trying to say that "boring" names were better - far from it, quirky names are part of the Python culture. But there isn't anything stopping Beautiful Soup adding keywords like "html parser scraping" or something like that. The trick is picking keywords that people will search for, which is often hard (and that's probably why the keyword metadata is underused). Having a registry of commonly-used keywords, and a user interface that means you don't have to go online and search when you're writing your app, might help. Think of something like Stack Overflow's auto-completed tags. I don't have any good solutions here, I just think it *should* be possible for the author of Beautiful Soup to make it clear to potential users that it might be what they are looking for when they think of <insert relevant task here>. As somebody said earlier in this thread, think of it as search engine optimisation for projects. Paul
On Sep 19, 2014, at 8:25, Paul Moore <p.f.moore@gmail.com> wrote:
On 19 September 2014 16:15, Andrew Barnert <abarnert@yahoo.com> wrote:
There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser?
Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP.
I wasn't trying to say that "boring" names were better - far from it, quirky names are part of the Python culture. But there isn't anything stopping Beautiful Soup adding keywords like "html parser scraping" or something like that. The trick is picking keywords that people will search for, which is often hard (and that's probably why the keyword metadata is underused). Having a registry of commonly-used keywords, and a user interface that means you don't have to go online and search when you're writing your app, might help. Think of something like Stack Overflow's auto-completed tags.
I don't have any good solutions here, I just think it *should* be possible for the author of Beautiful Soup to make it clear to potential users that it might be what they are looking for when they think of <insert relevant task here>. As somebody said earlier in this thread, think of it as search engine optimisation for projects.
That's a good point. BeautifulSoup, lxml, requests, Scrapy, Mechanize, and pyv8 certainly don't belong in any sane category together, but they also clearly share the fact that they're useful for (different but overlapping kinds of) scraping projects, and a keyword-influenced search (as opposed to keyword browsing) seems like the right way to get that across.
The problem is that search tends to work best when, even if you don't know what you're looking for, you'll easily recognize it when you see it. But if I'm searching for scraping, is Mechanize or BeautifulSoup better? That question doesn't just not have an answer, it doesn't even make sense. I pretty much have to look at what they both do to figure out which one fills the hole in my scraping project.
Which means that "search is the answer" tends to conflict with the earlier goal from this discussion of "just tell me the best answer, don't make me research it". That may just be because that goal is impossible to do well, and the best we can do is make the research a lot easier by giving you pointers to a few different things that you are likely to want to look into. Or it may mean that "find the most stable implementation of frequently-implemented simple thing X" (like a console-mode progress bar) and "find me good tools to help with complex thing Y" (like scraping web sites) are inherently different problems, and there might be a solution that's _helpful_ for both, but anything that's perfect for one will be useless for the other.
On Fri, Sep 19, 2014 at 5:25 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 19 September 2014 16:15, Andrew Barnert <abarnert@yahoo.com> wrote:
There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser?
Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP.
I wasn't trying to say that "boring" names were better - far from it, quirky names are part of the Python culture. But there isn't anything stopping Beautiful Soup adding keywords like "html parser scraping" or something like that. The trick is picking keywords that people will search for, which is often hard (and that's probably why the keyword metadata is underused). Having a registry of commonly-used keywords, and a user interface that means you don't have to go online and search when you're writing your app, might help. Think of something like Stack Overflow's auto-completed tags.
Or some sort of keyword aliases, so that if someone searches for one word in a list of aliases, any project with any keyword in the corresponding list of aliases is also matched.
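A minimal sketch of how such alias matching could work, assuming an index that already holds each project's keywords. Everything here is made up for illustration: the alias groups, the keyword sets attached to the project names, and the function names `expand` and `search`. Nothing below reflects real PyPI metadata or an existing PyPI feature.

```python
# Hypothetical alias groups: a search for any word in a group
# matches projects tagged with any other word in the same group.
ALIAS_GROUPS = [
    {"scraping", "scraper", "crawler", "spider"},
    {"progressbar", "progress", "spinner"},
]

# Hypothetical keyword metadata, invented for this example.
PROJECTS = {
    "beautifulsoup4": {"html", "parser", "scraping"},
    "Scrapy": {"crawler", "spider"},
    "tqdm": {"progressbar"},
}


def expand(term):
    """Return the search term plus every alias that shares a group with it."""
    term = term.lower()
    expanded = {term}
    for group in ALIAS_GROUPS:
        if term in group:
            expanded |= group
    return expanded


def search(term, projects=PROJECTS):
    """Return project names whose keywords overlap the expanded search term."""
    wanted = expand(term)
    return sorted(
        (name for name, keywords in projects.items() if wanted & keywords),
        key=str.lower,
    )


print(search("scraper"))
print(search("progress"))
```

With these made-up groups, a search for "scraper" finds both projects even though neither lists that exact word. The hard part, picking alias groups that match what people actually type, is not addressed by the sketch.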
On 19 September 2014 21:59, Steven D'Aprano <steve@pearwood.info> wrote:
Now obviously there are "best of breed" third party libraries. I don't see many people trying to build a better BeautifulSoup. (Maybe they are, and I just don't know about them, which demonstrates the problem.) That's not necessarily a bad thing. But I think we should be careful of putting the thumb on the scales too much.
The problem is that *failing* to provide recommendations can induce analysis paralysis in newcomers, such that they decide a less mature, more centralised ecosystem with fewer choices is "easier", or even "more powerful". While the "easier" can be accurate (since there are fewer choices to make), the "more powerful" is generally not. It's just that the discoverability problem is *much* easier to solve when a language community is dominated by a single use case or a single major vendor, so providing opinionated guidance becomes less controversial. This can lead to comparing one language's entire ecosystem to another (less familiar) language's standard library, rather than comparing the full strength of both ecosystems.
As an example of how a default recommendation can help address the discoverability problem, these days, if someone is completely new to Python web development, I will tell them "start with Django, and use djangopackages.com for package recommendations". However, I will also tell them that Django isn't necessarily the best fit for everything, so sometimes something like Flask or Pyramid (et al) might be a better choice.
The reason I do it that way is that Django is far more opinionated than most other Python web frameworks and makes a lot more decisions *for* you. Experts have legitimate reasons for debating several of Django's choices, but newcomers don't yet know enough to understand when and why you might want to do something differently. Recommending a framework that makes those choices on their behalf is actually helpful at that point, even if it means they'll be missing out on some other cool libraries (at least for the time being).
This approach provides an experience far closer to the modern Ruby web development model where Rails is the default choice (by a long way), and folks only later start considering the use of something less opinionated like Sinatra for problems where an all-inclusive solution like Rails is less appropriate.
Providing "pip" by default is the same way - there are actually times when it isn't the best choice for solving particular packaging related problems, but it's always enough to get you started, *and* it lets you more easily bootstrap other tools like conda and zc.buildout when they're a better fit. The goal in providing "default recommendations" is thus to minimise the "time to awesome" for newcomers, while gently nudging them in the direction of good development practices, rather than raising barriers in front of them that say "you must first learn how to make this apparently irrelevant decision before you can continue". And if someone never needs to tackle problems that require moving beyond the default recommendations? That's fine - there are an awful lot of people happily solving problems without moving beyond what the standard library or their redistributor provide. You also see this pattern with some of the design guidance we give newcomers like "don't use metaclasses" and "don't use monkeypatching (except as part of a test mocking library)". What we actually mean is "these are power tools, and genuinely hard to use well. By the time you're ready to wield them, you'll likely also have realised that the implied caveat on the standard advice is 'unless it's the only way to solve a specific problem'". That's a complex mouthful to inflict on someone that is still learning though, so we just oversimplify the situation instead (even though the end result is so oversimplified as to technically be a lie). My experience is also that the scientific community appear to be *far* more pragmatic about library choices than the professional development community (or anyone that is interested in programming for its own sake, rather than what it lets us do). 
We're a bit more prone to looking under the hood and deciding we don't like a library because of how it's made - competing libraries may arise based on differences in opinion regarding good design and development practices, rather than fundamental limitations in what a library makes possible. Scientists and data analysts are far more likely to just grab a library because of what it lets them *do* (or communicate), without looking too closely at how it's put together. That greater level of pragmatism then seems to make it easier for category killers to arise at the library level. However, I have to admit this particular idea is a speculative hypothesis, rather than something that has been subjected to rigorous objective analysis. Anyone looking for a PhD topic in sociology? :)
Regards, Nick.
P.S. As far as I can tell, the relative ease with which dominance can be asserted is also why VCs tend to favour newer languages with more centralised ecosystems - it's easier for them to assert control, and attempt to "punch the ticket" if the language achieves future success. It's much harder to do that for a massive decentralised sprawl like the Python community.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Sep 19, 2014 at 12:20 AM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
On Sep 18, 2014, at 11:26, Paul Moore <p.f.moore@gmail.com> wrote:
OK, the key thing to look at here is the user experience for someone who has Python installed, and has a job to do, but needs to branch out into external packages because the stdlib doesn't provide enough functionality.
To make this example concrete, I'll focus on a specific use case, which I believe is relatively common, although I can't back this up with hard data.
Assume:
* A user who is comfortable with Python, or with scripting languages in general
* No licensing or connectivity issues to worry about
* An existing manual process that the user wants to automate
In my line of work, this constitutes the vast bulk of Python use - informal, simple automation scripts.
So I'm writing this script, and I discover I need to do something that the stdlib doesn't cover, but I feel like it should be available "out there", and it's sufficiently fiddly that I'd prefer not to write it myself. Examples I've come across in the past:
* A console progress bar
* Scraping some data off a web page
* Writing data into an Excel spreadsheet with formatting
* Querying an Oracle database
Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem.
So I go and ask Google.
Hold on. I'm pretty sure that the intended answer to this problem has, for years, been that you go and search PyPI. Is that too broken to use, or are people just not aware of it?
In addition to the issues others have brought up, this is only useful if you are trying to get a subject-specific tool that incorporates a whole range of general tools on that subject, that is, if you are looking for something like scipy, django, requests, etc. However, what if you are looking for something much more specific? For example, say I have some specific thing I want to do with a dict. How can I find a library that provides that functionality?
Paul Moore <p.f.moore@gmail.com> writes: ...
1. Somewhere I can go to find useful modules, that's better than Google.
Google is very good at searching programming related topics on the web compared to other search engines or custom searches on niche sites. I sporadically try to use alternatives and at best they are good enough but worse than Google. If you know any examples to the contrary, please share.
2. Someone else choosing the "best option" - I don't want to evaluate 3 different progressbar modules, I just want to write "57% complete" and a few dots!
The issue is that "best option" is often different for different people or fashion-driven and therefore transient.
3. C extensions aren't a huge problem to me on Windows, although I'm looking forward to the day when everyone distributes wheels (wheel convert is good enough for now though). [1]
I have the opposite impression. http://pythonforengineers.com/stop-struggling-with-python-on-windows/ C extensions are not an issue on POSIX systems: there are package managers (official or not) and the compilers are easily available if you want the latest and greatest. ...
[1] A Linux/OS X user might have more issues with C extensions.
...
PS I should also note that even in its current state, PyPI is streets ahead of the 3rd party module story I've experienced for any other language - C/C++, Lua, Powershell, and Java are all far worse. Perl/CPAN may be as good or better, it's so long since I used Perl that I don't really know these days.
Opinions may vary: http://www.reddit.com/r/Python/comments/1ew4l5/im_giving_a_demo_of_python_to... Or: "The artifact approach is unambiguously better for any production deployment. The source-based approach found in Ruby, Perl, and Python is a problem for me more often than a solution." https://news.ycombinator.com/item?id=7070464 Though wheel binary package format is designed to solve it. -- Akira
On 19 September 2014 13:15, Akira Li <4kir4.1i@gmail.com> wrote:
Paul Moore <p.f.moore@gmail.com> writes:
3. C extensions aren't a huge problem to me on Windows, although I'm looking forward to the day when everyone distributes wheels (wheel convert is good enough for now though). [1]
I have the opposite impression. http://pythonforengineers.com/stop-struggling-with-python-on-windows/
Well yes, but that article is clearly focused on scientific use (a known "worst case" area) and ends up recommending conda (which is a perfectly fair solution, and as the author states, works well). My comment was a bit unfair, though. I have a C compiler installed (which, although it's not hard to set up VS Express, isn't normal). And I also treat "find a wininst installer or egg and run wheel convert on it" as trivial and acceptable, which it isn't for people who aren't packaging specialists. But we're working on this, and I stand by the statement that when projects routinely distribute wheels if they include C extensions, binary dependencies will be a minor issue.
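To make the wheel point concrete, here is a small sketch of how a tool could tell from a file name alone whether a project ships a binary wheel for a given platform. It relies on the wheel file naming convention from PEP 427 (`{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`); the helper names and the `spam` file names are hypothetical.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components (hypothetical helper)."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[:-len(suffix)].split("-")
    # Five fields, or six when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def is_pure_python(filename):
    """Pure-Python wheels carry the 'none' ABI tag and the 'any' platform tag."""
    info = parse_wheel_filename(filename)
    return info["abi"] == "none" and info["platform"] == "any"


# Made-up file names for a made-up project.
for name in ["spam-1.0-cp34-none-win32.whl", "spam-1.0-py2.py3-none-any.whl"]:
    print(name, "pure" if is_pure_python(name) else "platform-specific")
```

A check like this says nothing about whether the wheel actually works on the target machine; it only shows that the information needed to answer "is there a prebuilt binary for my platform?" is already present in the file name.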
PS I should also note that even in its current state, PyPI is streets ahead of the 3rd party module story I've experienced for any other language - C/C++, Lua, Powershell, and Java are all far worse. Perl/CPAN may be as good or better, it's so long since I used Perl that I don't really know these days.
Opinions may vary: http://www.reddit.com/r/Python/comments/1ew4l5/im_giving_a_demo_of_python_to...
The discussion here is about packaging, not about finding 3rd party packages to solve a problem. I know of no better way to find a package to parse an ini file in Java than google, which is no better than Python. And what I found was packages on sourceforge and other generic hosting sites. Maven may be a central repository - I've never used it myself as the complexity has always scared me off (you could say that about most of Java, though ;-))
Or: "The artifact approach is unambiguously better for any production deployment. The source-based approach found in Ruby, Perl, and Python is a problem for me more often than a solution." https://news.ycombinator.com/item?id=7070464
Again, that's deployment rather than discovery. Paul
Paul Moore <p.f.moore@gmail.com> writes:
On 19 September 2014 13:15, Akira Li <4kir4.1i@gmail.com> wrote:
Paul Moore <p.f.moore@gmail.com> writes:
3. C extensions aren't a huge problem to me on Windows, although I'm looking forward to the day when everyone distributes wheels (wheel convert is good enough for now though). [1]
I have the opposite impression. http://pythonforengineers.com/stop-struggling-with-python-on-windows/
Well yes, but that article is clearly focused on scientific use (a known "worst case" area) and ends up recommending conda (which is a perfectly fair solution, and as the author states, works well).
My comment was a bit unfair, though. I have a C compiler installed (which, although it's not hard to set up VS Express, isn't normal). And I also treat "find a wininst installer or egg and run wheel convert on it" as trivial and acceptable, which it isn't for people who aren't packaging specialists.
But we're working on this, and I stand by the statement that when projects routinely distribute wheels if they include C extensions, binary dependencies will be a minor issue.
PS I should also note that even in its current state, PyPI is streets ahead of the 3rd party module story I've experienced for any other language - C/C++, Lua, Powershell, and Java are all far worse. Perl/CPAN may be as good or better, it's so long since I used Perl that I don't really know these days.
Opinions may vary: http://www.reddit.com/r/Python/comments/1ew4l5/im_giving_a_demo_of_python_to...
The discussion here is about packaging, not about finding 3rd party packages to solve a problem. I know of no better way to find a package to parse an ini file in Java than google, which is no better than Python. And what I found was packages on sourceforge and other generic hosting sites.
The main idea of my message [1] is that Google is very good at "finding 3rd party packages to solve a problem." If you know a better solution for any programming language, do tell.
[1]: https://mail.python.org/pipermail/python-ideas/2014-September/029417.html
The rest is just some references to balance your statements.
Maven may be a central repository - I've never used it myself as the complexity has always scared me off (you could say that about most of Java, though ;-))
Or: "The artifact approach is unambiguously better for any production deployment. The source-based approach found in Ruby, Perl, and Python is a problem for me more often than a solution." https://news.ycombinator.com/item?id=7070464
Again, that's deployment rather than discovery.
Paul
-- Akira
On Sep 19, 2014, at 5:42, Paul Moore <p.f.moore@gmail.com> wrote:
Maven may be a central repository - I've never used it myself as the complexity has always scared me off (you could say that about most of Java, though ;-))
Don't you just go to the authority factory, ask it to give you an authority, then ask the authority for a repository recommender factory, and so on through 8 more steps until you get a general package object that you can cast to the type you need? And the beauty is that, thanks to the magic of static type checking with a rigid but weak type system, if it turns out not to be able to do what you want, instead of a "TypeError: BeautifulSoup has no SGML parser" that you've never seen before, you get the same convenient NullPointerException from your ISGMLParser as in every other bug you've had so far. :)
But seriously, I think it is worth looking at what other languages' package systems have that we might want to be jealous of as far as discovery, especially those (Cargo, go pkg, Node, gems, CocoaPods, etc.) that were designed after CPAN and the Cheeseshop and the related communities existed. I think Java, C++, and C may be the _least_ relevant places to look (although I could definitely be wrong about that).
For example, some of the newer languages' systems make it easier for you to fork an existing package, and to tie packaging to DVCS repos, with the consequence that if abarnert/spamify falls by the wayside, it might be more likely pmoore/spamify that replaces it than in Python, where it would more likely be a punny PIL->Pillow type of thing. That seems like a nice feature. Do we want that? Can we get from here to there without radical changes?
On Sep 19, 2014, at 11:39 AM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
On Sep 19, 2014, at 5:42, Paul Moore <p.f.moore@gmail.com> wrote:
Maven may be a central repository - I've never used it myself as the complexity has always scared me off (you could say that about most of Java, though ;-))
Don't you just go to the authority factory, ask it to give you an authority, then ask the authority for a repository recommender factory, and so on through 8 more steps until you get a general package object that you can cast to the type you need? And the beauty is that, thanks to the magic of static type checking with a rigid but weak type system, if it turns out not to be able to do what you want, instead of a "TypeError: BeautifulSoup has no SGML parser" that you've never seen before, you get the same convenient NullPointerException from your ISGMLParser as in every other bug you've had so far. :)
But seriously, I think it is worth looking at what other languages' package systems have that we might want to be jealous of as far as discovery, especially those (Cargo, go pkg, Node, gems, CocoaPods, etc.) that were designed after CPAN and the Cheeseshop and the related communities existed. I think Java, C++, and C may be the _least_ relevant places to look (although I could definitely be wrong about that).
For example, some of the newer languages' systems make it easier for you to fork an existing package, and to tie packaging to DVCS repos, with the consequence that if abarnert/spamify falls by the wayside, it might be more likely pmoore/spamify that replaces it than in Python, where it would more likely be a punny PIL->Pillow type of thing. That seems like a nice feature. Do we want that? Can we get from here to there without radical changes?
I think at this point we’re well into the territory of what should move to distutils-sig.
---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
participants (10)
- Akira Li
- Andrew Barnert
- Donald Stufft
- Georg Brandl
- John Wong
- Nick Coghlan
- Paul Moore
- Steven D'Aprano
- Todd
- Wolfgang Maier