"Curated" package repo?
Starting a new thread with a correct title.

On 2 Jul 2023, at 10:12, Paul Moore <p.f.moore@gmail.com> wrote:

Unfortunately, too much of this discussion is framed as “someone should”, or “it would be good if”. No-one is saying “I will”. Naming groups, like “the PyPA should”, doesn’t help either - groups don’t do things, people do. Who in the PyPA? Me? Nope, sorry, I don’t have the time or interest - I’d *use* a curated index, I sure as heck couldn’t *create* one.
Well, I started this topic, and I don't *think* I ever wrote "someone should", and I certainly didn't write "the PyPA should". But whatever I or anyone else wrote, my intention was to discuss what might be done to address what I think is a real problem/limitation in the discoverability of useful packages for Python. And I think of it not so much as "someone should" but as "it would be nice to have".

Of course, any of these ideas would take a lot of work to implement -- and even though there are a lot of folks, on this list and elsewhere, who would like to help, I don't think any substantial open-source project has gotten anywhere without a concerted effort by a very small group (often just one person) doing a lot of work to get it to a useful state before a larger group can contribute. So I'm fully aware that nothing's going to happen unless *someone* really puts the work in up front. That someone *might* be me, but I'm really good at over-committing myself, and not so great at keeping my nose to the grindstone, so ....

And I think this particular problem calls for a solution that would have to be pretty well established before reaching critical mass to actually be useful -- after all, we already have PyPI -- why go anywhere else that is less comprehensive?

All that being said, it's still worth having a conversation about what a good solution might look like -- there are a lot of options, and hashing out some of the ideas might inspire someone to rise to the occasion.

The "problem", as I see it:

- The Python standard library is not, and never will be, fully comprehensive -- most projects require *some* third-party packages.
- There are a LOT of packages available on PyPI -- with a very wide range of usefulness, quality and maintenance -- everything from widely used packages with a huge community (e.g. numpy) to packages that are at release 0.0.1, have never seen an update, and may not even work.

So the odds that there's a package that does what you need are good, but it can be pretty hard to find it sometimes -- and it can be a fair bit of work to sift through the options to find the good ones -- and many folks don't feel qualified to do so. This can result in two opposite consequences:

1) People using a package that really isn't reliable or maintained (or not supported on all platforms, or ...) and getting stuck with it (I've had that on some of my projects -- I'm pretty careful, but not everyone on my team is).

2) People writing their own code -- wasting time, and maybe not getting a very good solution either. I've also had that problem on my projects...

To avoid this, SOME way for folks to find packages that have at least some level of vetting would be good -- exactly what level of vetting is a very open question, but I think "even a little" could be very helpful.

A few ideas that have come up in the previous thread -- more or less in order of level of effort:

1) A "clearing house" of package reviews, for lack of a better word -- a single web site that would point to various resources on the internet -- blog posts, etc. -- that would help people select packages. So one might go to the "JSON" section and find links to some of the comparisons of JSON parsers, to help you choose. The authors of this site could even solicit folks to write reviews, etc., that they could then point to. Chris A: this is my interpretation of your decentralized idea -- please do correct me if I got it wrong.

2) A Python package review site. This could be set up and managed with, e.g., a GitHub repo, so that there could be a small number of editors, but anyone could contribute a review via PR. The ultimate goal would be reviews/recommendations of all the major package categories on PyPI. If well structured and searchable, this could be very useful. - This was proposed by someone on the previous thread -- again, correct me if I'm wrong.

3) A rating system built into PyPI -- this could be a combination of two things: A - Automated analysis -- download stats, dependency stats, release frequency, etc., etc. B - Community ratings -- upvotes, stars, whatever. If done well, that could be very useful -- search on PyPI listed by rating. However, "done well" is a huge challenge -- I don't think there's a way to do the automated system right, and community scoring can be abused pretty easily. But maybe folks smarter than me could make it work with one or both of these approaches.

4) A self-contained repository of packages that you could point pip to -- it would contain only the packages that had met some sort of "vetting" criteria. In theory, anyone could run it, but a stamp of approval from the PSF would make it far more acceptable to people. This would be a LOT of work to get set up, and still a lot of work to maintain.

Personally, I think (4) is the best end result, but probably the most work as well[*], so ???

(1) and (2) have the advantage that they could be useful even without being comprehensive -- they'd need to have some critical mass to get any traction, but maybe not THAT much.

Anyway, I'd love to hear your thoughts on these ideas (or others?) - both technical and social. And of course, if anyone wants to join forces on getting something going.

-CHB

[*] Maybe not, actually -- once the infrastructure was in place, adding a package would require someone requesting that it be added, and someone on a "core team" approving it. That could be a pretty low level of effort, actually. The actual mechanism would be to simply copy it from PyPI once approved -- not that hard to automate. Hmmm....

Probably the biggest challenge would be coming up with the criteria for approval -- not an easy question. And it would require substantial critical mass to be at all useful.

-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Wed, 5 Jul 2023 at 10:26, Christopher Barker <pythonchb@gmail.com> wrote:
The :problem", as I see it.
- The Python standard library is not, and never will be, fully comprehensive -- most projects require *some* third-party packages.
- There are a LOT of packages available on PyPI -- with a very wide range of usefulness, quality and maintenance -- everything from widely used packages with a huge community (e.g. numpy) to packages that are at release 0.0.1, have never seen an update, and may not even work.
Remember though that this problem is also Python's strength here. The standard library does not NEED to be comprehensive, and publishing on PyPI is deliberately easy. The barrier to entry for the stdlib is high; the barrier to entry for PyPI is low.
1) A "clearing house" of package reviews, for lack of a better word -- a single web site that would point to various resources on the internet -- blog posts, etc, that would help people select packages. So one might go to the "JSON: section, and find links to some of the comparisons of JSON parsers, to help you choose. The authors of this site could even solicit folks to write reviews, etc that they could then point to. Chris A: this is my interpretation of your decentralized idea -- please do correct me if I got it wrong.
That is a correct interpretation of what I said, yes. I'll take it a bit further though and add that this "single web site" would ideally be editable by multiple people. In fact, Python has a place for that sort of thing: https://wiki.python.org/moin/FrontPage

Imagine a page like https://wiki.python.org/moin/Python_Package_Recommendations (doesn't currently exist) that would be managed the same way as other recommendation pages like https://wiki.python.org/moin/IntroductoryBooks - anyone can add a link to their blog post about packages, and if they've focused on a specific category (e.g. "web frameworks"), that can be right there in the wiki page so people can search for it. That way, it's decentralized for editing, but has a central "hub" that people can easily find.
2) A Python package review site. This could be set up and managed with, e.g., a GitHub repo, so that there could be a small number of editors, but anyone could contribute a review via PR. The ultimate goal would be reviews/recommendations of all the major package categories on PyPI. If well structured and searchable, this could be very useful. - This was proposed by someone on the previous thread -- again, correct me if I'm wrong.
I suspect this would end up being broadly equivalent to the first option, but with more effort by a core group of people (or a single maintainer), and in return, would have a more consistent look and feel.
3) A rating system built into PyPI -- this could be a combination of two things: A - Automated analysis -- download stats, dependency stats, release frequency, etc., etc. B - Community ratings -- upvotes, stars, whatever.
If done well, that could be very useful -- search on PyPI listed by rating. However, "done well" is a huge challenge -- I don't think there's a way to do the automated system right, and community scoring can be abused pretty easily. But maybe folks smarter than me could make it work with one or both of these approaches.
Huge challenge indeed. Possibly insurmountable. Popularity contests (purely based on download stats and such) have their value, but do not tell you what's best, only what's the most used. Community ratings, as you say, can be gamed all too easily, plus they STILL tend to favour those that are heavily used rather than niche things that are potentially better. Neither of them adequately answers questions like "which is right *for this use-case*", which will leave a lot of packages out in the cold.
4) A self-contained repository of packages that you could point pip to -- it would contain only the packages that had met some sort of "vetting" criteria. In theory, anyone could run it, but a stamp of approval from the PSF would make it far more acceptable to people. This would be a LOT of work to get set up, and still a lot of work to maintain.
Definitely possible; how would this compare to Conda? What about OS package managers like the Debian repositories?
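For what it's worth, the plumbing for this already exists: pip's --index-url option accepts any server that implements the Simple Repository API (PEP 503), so the hard part of option (4) is the curation, not the hosting. As a rough illustration only -- the package list, filenames and URLs below are entirely made up -- a static curated index could be generated with something like:

    # Minimal sketch of a static PEP 503 "simple" index for a curated repo.
    # APPROVED (package -> [(filename, URL)]) is hypothetical; a real index
    # would be populated by the curation/approval process.
    from pathlib import Path

    APPROVED = {
        "example-package": [
            ("example_package-1.0-py3-none-any.whl",
             "https://files.curated.example.org/example_package-1.0-py3-none-any.whl"),
        ],
    }

    def build_index(root: Path) -> None:
        root.mkdir(parents=True, exist_ok=True)
        project_links = []
        for name, files in APPROVED.items():
            # PEP 503: per-project pages live under the normalized project name.
            pkg_dir = root / name
            pkg_dir.mkdir(exist_ok=True)
            anchors = "".join(f'<a href="{url}">{fname}</a><br>'
                              for fname, url in files)
            (pkg_dir / "index.html").write_text(
                f"<html><body>{anchors}</body></html>")
            project_links.append(f'<a href="{name}/">{name}</a><br>')
        (root / "index.html").write_text(
            "<html><body>" + "".join(project_links) + "</body></html>")

    build_index(Path("simple"))
    # Serve the "simple/" directory over HTTPS, then:
    #   pip install --index-url https://curated.example.org/simple/ example-package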
Personally, I think (4) is the best end result, but probably the most work as well[*], so ???
(1) and (2) have the advantage that they could be useful even without being comprehensive -- they'd need to have some critical mass to get any traction, but maybe not THAT much.
Right, and notably, they can be useful without covering every topic. You could write a blog post about database ORM packages, and that's useful right there, without worrying about whether there's any review of internet protocol packages. Hmm. Once that sort of collection gets some traction, someone might say "Hey, anyone know about grammar parsers?" and add that to a "help wanted" list. That could be useful inspiration.
Anyway, I'd love to hear your thoughts on these ideas (or others?) - both technical and social.
And of course, if anyone wants to join forces on getting something going.
Not many forces needed, but if you want to go with the Python Wiki, it might require a wiki admin to create the initial page, in which case I'd be happy to do that. ChrisA
On Tue, Jul 4, 2023 at 5:49 PM Chris Angelico <rosuav@gmail.com> wrote:
The :problem", as I see it.
- The Python standard library is not, and will never be fully comprehensive -- most projects require *some* third party packages. - There are a LOT of packages available on PyPi -- with a very wide range of usefulness, quality and maintenance -- everything from widely used
On Wed, 5 Jul 2023 at 10:26, Christopher Barker <pythonchb@gmail.com> wrote: packages with a huge community (e.g. numpy) to packages that are release 0.0.1, and never seen an update, and may not even work.
Remember though that this problem is also Python's strength here. The standard library does not NEED to be comprehensive, and publishing on PyPI is deliberately easy. The barrier to entry for the stdlib is high; the barrier to entry for PyPI is low.
Absolutely -- and I"m not making any comment about the stdlib -- it's not the point here. But yes, the low (zero) barrier to entry to PyPi was probably the right way to go when it got started -- now, I'm not so sure. *some* barrier to entry would be helpful.
Thanks, I had forgotten about the Wiki! It foundered for a while (a lot of years ago -- I've been around...), but at a glance, it's looking good now.

Imagine a page like
...
That way, it's decentralized for editing, but has a central "hub" that people can easily find.
yup -- great idea.
I suspect this would end up being broadly equivalent to the first option, but with more effort by a core group of people (or a single maintainer), and in return, would have a more consistent look and feel.
yup.
3) A rating system built into PyPI -- this could be a combination of two things: A - Automated analysis -- download stats, dependency stats, release frequency, etc., etc. B - Community ratings -- upvotes, stars, whatever.
If done well, that could be very useful -- search on PyPI listed by rating. However, "done well" is a huge challenge -- I don't think there's a way to do the automated system right, and community scoring can be abused pretty easily. But maybe folks smarter than me could make it work with one or both of these approaches.
Neither of them adequately answers questions like "which is right *for this use-case*",
I'm noting this, because I think it's part of the problem to be solved, but maybe not the main one (to me anyway). I've been focused more on "these packages are worthwhile" (by some definition of worthwhile), while I think Chris A is more focused on "which of these seemingly similar packages should I use?" -- not unrelated, but not the same question either. Which makes me realize that having a centralized package review site is complementary to a curated package index -- they are not replacements for one another.
4) A self-contained repository of packages that you could point pip to --
... Definitely possible; how would this compare to Conda?

Technically, conda is similar to pip -- it has a default "channel" (a channel is an indexed repository of packages) it points to, and you can point it to a different one, or any number of others, or install a single package from a particular channel.

Socially, it's pretty different:

- There is no channel like PyPI that anyone can put anything on willy-nilly.
- The default channel is operated by Anaconda.com -- and no one else can put anything on there. (They take suggestions, but it's a pretty big lift to get them to add a package.)
- The protocol for a channel is pretty simple -- all you really need is an HTTP server, but in practice most folks host their channels on the Anaconda.org server -- it's a free service that anyone can create a channel on -- there are a LOT -- folks use them for their personal projects, etc.
- Then there is conda-forge: it grew out of an effort to collaborate among a number of folks operating channels focused on particular fields -- met/ocean science, astronomy, computational biology, ... We all had different needs, but they overlapped -- why not share resources? Thanks to the heroic efforts of a few folks, it grew to what it is now: a GitHub- and CI-based conda package build system that publishes a conda channel on anaconda.org with over 22,000 (wow! I think I'm reading that right) packages (https://anaconda.org/conda-forge/repo).

They are curated -- anyone can propose a new package (via PR) -- but it only gets added once it's been reviewed and approved by the core team. Curation wasn't the goal, but it's necessary in order to have any hope that they will all work together. The review process is really of the package, not the code in the package (Is it built correctly? Is it compatible with the rest of conda-forge? Does it include the license file? Is there a maintainer? ...) But the end result is a fair bit of curation -- users can be assured that:

1 - The package works.
2 - The package is useful enough that someone took the time to get it up there.
3 - It's very unlikely to be malware. (I don't think the conda-forge policy really looks hard for that, but typosquatting and that sort of thing are pretty much impossible.)

What about OS package managers like the Debian repositories?
I have no idea, other than that the majors, at least, put a LOT of work into having a pretty comprehensive base repository of "vetted" packages.
(1) and (2) have the advantage that they could be useful even without being comprehensive -- they'd need to have some critical mass to get any traction, but maybe not THAT much.
Right, and notably, they can be useful without covering every topic. You could write a blog post about database ORM packages, and that's useful right there, without worrying about whether there's any review of internet protocol packages.
Exactly!
Not many forces needed, but if you want to go with the Python Wiki, it might require a wiki admin to create the initial page, in which case I'd be happy to do that.
Thanks!

conda-forge has about 22,121 packages -- that's enough to be very useful, but a lot of use-cases are not well covered, and I know I still have to contribute one once in a while.

Looking now -- PyPI has 465,295 projects, more than 20 times as many -- I wonder how many of those are "useful"? As talked about, you can only get so much with stats -- but if someone is familiar with working with the PyPI stats, I'd love some help exploring that question.

-CHB

-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
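A starting point for that stats question, for anyone who wants to poke at it: PyPI's simple index (PEP 503) is a single HTML page with one anchor per project, so counting projects takes only a few lines. This sketch assumes the third-party requests package; per-package download counts would need PyPI's public BigQuery dataset instead.

    # Count the projects listed on PyPI's simple index (one <a> per project).
    from html.parser import HTMLParser
    import requests

    class AnchorCounter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.count = 0

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.count += 1

    resp = requests.get("https://pypi.org/simple/", timeout=60)
    parser = AnchorCounter()
    parser.feed(resp.text)
    print(f"{parser.count} projects on PyPI")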
On Wed, 5 Jul 2023 at 17:00, Christopher Barker <pythonchb@gmail.com> wrote:
I'm noting this, because I think it's part of the problem to be solved, but maybe not the main one (to me anyway). I've been focused more on "these packages are worthwhile" (by some definition of worthwhile), while I think Chris A is more focused on "which of these seemingly similar packages should I use?" -- not unrelated, but not the same question either.
Indeed, not the same question; but "some definition of worthwhile" is the crucial point here. If there is one single curated package index of "worthwhile" packages, who decides what's on it and what's not? If not everyone can agree, will there have to be multiple such listings?
Technically, conda is similar to pip -- it has a default "channel" (a channel is an indexed repository of packages) it points to, and you can point it to a different one, or any number of others, or install a single package from a particular channel.
Socially, it's pretty different:
- There is no channel like PyPI that anyone can put anything on willy-nilly.
- The default channel is operated by Anaconda.com -- and no one else can put anything on there. (They take suggestions, but it's a pretty big lift to get them to add a package.)
- The protocol for a channel is pretty simple -- all you really need is an HTTP server, but in practice most folks host their channels on the Anaconda.org server -- it's a free service that anyone can create a channel on -- there are a LOT -- folks use them for their personal projects, etc.
So, high barrier to entry. Good to know. That's neither good nor bad inherently, but it is a point of note.
- Then there is conda-forge: it grew out of an effort to collaborate among a number of folks operating channels focused on particular fields -- met/ocean science, astronomy, computational biology, ... We all had different needs, but they overlapped -- why not share resources? Thanks to the heroic efforts of a few folks, it grew to what it is now: a GitHub- and CI-based conda package build system that publishes a conda channel on anaconda.org with over 22,000 (wow! I think I'm reading that right) packages.
(https://anaconda.org/conda-forge/repo)
They are curated -- anyone can propose a new package (via PR) -- but it only gets added once it's been reviewed and approved by the core team. Curation wasn't the goal, but it's necessary in order to have any hope that they will all work together. The review process is really of the package, not the code in the package (Is it built correctly? Is it compatible with the rest of conda-forge? Does it include the license file? Is there a maintainer? ...) But the end result is a fair bit of curation -- users can be assured that:
1 - The package works.
2 - The package is useful enough that someone took the time to get it up there.
3 - It's very unlikely to be malware. (I don't think the conda-forge policy really looks hard for that, but typosquatting and that sort of thing are pretty much impossible.)
Cool. The trouble is, point 1 is nearly impossible to assure except in the very narrowest of definitions, and point 2's value correlates with the height of the barrier to entry, so it's a fairly strict tradeoff. And unless that barrier is extremely high, there will always be the possibility that someone puts in the effort to get malware pushed, although it does become vanishingly improbable.
What about OS package managers like the Debian repositories?
I have no idea, other than that the majors, at least, put a LOT of work into having a pretty comprehensive base repository of "vetted" packages.
Right; hence the question of how a "vetted Python package collection" would compare. I can type "sudo apt install python-" and add the name of a package, and I get some assurance that: 1) The package works 2) The package is useful enough 3) It's not malware 4) The specific *version* of the package works along with the versions of everything else. This is a very strong set of criteria, much stronger than we'd be looking for here, as they come with correspondingly higher barriers to entry (getting a package update into the Debian repositories becomes harder and harder as the release date approaches).
conda-forge has about 22,121 packages -- that's enough to be very useful, but a lot of use-cases are not well covered, and I know I still have to contribute one once in a while.
Looking now -- PyPI has 465,295 projects, more than 20 times as many -- I wonder how many of those are "useful"?
Contrariwise, the Debian repository has under a thousand "python-*" packages, but with a much stronger requirement that they be useful. It's interesting that there are only twenty on PyPI for every one on conda-forge. I would have expected a greater ratio. It seems that conda-forge is able to be incomplete AND dauntingly large; how successful would you be at guessing a package name based on a desired goal? ChrisA
On 2023-07-05 07:13, Chris Angelico wrote:
Right; hence the question of how a "vetted Python package collection" would compare. I can type "sudo apt install python-" and add the name of a package, and I get some assurance that:
1) The package works
2) The package is useful enough
3) It's not malware
4) The specific *version* of the package works along with the versions of everything else.
In my experience this is how conda-forge is too. The level of assurance is somewhat lower, but there is still a level of assurance about all those things. For point 4, the assurance is about the version you install working with the conda environment you install it into. This is an advantage over systemwide installs like Debian packages, because it means you can have multiple environments and know each one is consistent. Most of the problems arise when you circumvent conda's consistency checking, for instance by installing a package with pip rather than with conda.

-- Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On 2023-07-05 00:00, Christopher Barker wrote:
I'm noting this, because I think it's part of the problem to be solved, but maybe not the main one (to me anyway). I've been focused more on "these packages are worthwhile" (by some definition of worthwhile), while I think Chris A is more focused on "which of these seemingly similar packages should I use?" -- not unrelated, but not the same question either.
I noticed this in the discussion and I think it's an important difference in how people approach this question. Basically, what some people want from a curated index is "this package is not junk", while others want "this package is actually good" or even "you should use this package for this purpose".

I think that providing "not-junk level" curation is somewhat more tractable, because this form of curation is closer to a logical OR on different people's opinions. It may be that many people tried a package and didn't find it useful, but if at least one person did find it useful, then we can probably say it's not junk. Providing "actually-good level" curation or "recommendations" is harder, because it means you actually have to address differences of opinion among curators.

Personally, I tend to think not-junk type curation is the better one to aim at, for a few reasons. First, it's easier. Second, it eliminates one of the main problems with trying to search for packages on PyPI, namely the huge number of "mytestpackage1"-type packages. Third, this is what conda-forge does, and it seems to be working pretty well there.

-- Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
It is possible that the issues being discussed at this stage are not as relevant as they seem at stage 0, which is where this idea is. (Unless someone here is looking for a very serious commitment.) If some sort of starting point that is "light" in approach was decided on, then the process can be readjusted as/if it progresses. Maybe there is no need to put a "stamp" on a package, but simply provide comparison statistics given some initial structure. I think a lot of packages can be filtered on objective criteria, without even reaching the stage of subjective opinions.

———————————————

General info - fairly easy to inspect without the need for subjective opinions:
1. License
2. Maintenance - hard Stack Overflow & repo stats

Performance - hard stats:
1. There will be lower-level language extensions which, even if not up to standards in other aspects, are worth attention -- someone else might pick one up and rejuvenate it if this is explicitly indicated.
2. There will be pure-Python packages: a) good coding standards with good knowledge of efficient programming in pure Python; b) pure-Python packages that take ages to execute.

In many areas, this will filter out many libraries. Although there are some where it wouldn't, e.g. schema-based low-level serialisation, where benchmarks can be quite tight.

The remaining evaluation can be subjective opinions, where preferences of curators can be taken into account:
1. Coding standards
2. Integration
3. Flexibility/functionality
4. …

IMO, all of this can be done while staying on the safe side - if unsure, leave the plain statistics for users and developers to see.

———————————————

An example. (I am not the developer of any of these.) JSON serialisers:
1. json - stdlib, average performance, well maintained, flexible, very safe to depend on
2. simplejson - 3rd party, pure Python, performance in line with 1), drop-in replacement for json, been around for a while, safe to depend on
3. ultrajson - 3rd party, written in C, >3x performance boost over 1) & 2), drop-in replacement for json, been around for a while, safe to depend on
4. ijson - 3rd party, C & Python, average performance, proprietary interface relying heavily on the iterator protocol, status <TBC>
5. orjson - 3rd party, highly optimised C, performance on par with the fastest serialisers on the market, not a drop-in replacement for json due to sacrifices for performance, rich in functionality, well maintained, safe to depend on
6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a drop-in replacement for json, extends json with JSON5 features such as comments, well maintained, safe to depend on

(THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)

So there is still a bit of opinion here, but all of this can be standardised and put in numbers, and comparison of this type can be done with little-to-no personal opinion.

———————————————

After the structure for this is in place, it would be easier to discuss further whether more serious curation is needed/worthwhile/makes sense. Allow queries from users and package developers, places to gather opinions, maybe volunteering to do a deeper analysis… And once there is enough input, maybe curated guidance can be added to the review. But this is the next stage, which does not necessarily need to be thoroughly thought out before putting in place something simple, objective & risk-free.

———————————————

Maybe stage 1 is all that users need - a reliable place to check hard stats, where users and developers can update them for the benefit of all. With enough popularity, package developers should be motivated to issue stat updates (e.g. add an additional column to the benchmarking script), and users would issue similar updates (e.g. add an additional column to the benchmarking script where a library is extremely slow).

It is possible that the project would naturally turn in the direction of hard-stat coverage instead of "deep" curation. E.g. JSON serialisers become a sub-branch of schema-less serialisers, which in turn become a branch of serialisers. Then the user can view comparable stats of the whole branch, sub-branch, or sub-sub-branch to get the information he needs to make decisions, and apply different filters in the process to get to the final list of packages on which he will have to do his final subjective analysis anyway.

———————————————

E.g. a user needs a serialiser. He prefers schema-less, but is willing to go schema-based given large increases in performance. He does not mind low maintenance status, given that he aims to maintain his own proprietary serialisation library in the long run. Naturally, clean & simple coding with a permissive license is preferred. Just a portal with up-to-date stats where the user could interactively navigate such decisions would be a good start and potentially a "safe" route to begin with. The starting work on such a thing would then be heavier on automation than on politics, which in turn will be easier to tackle later once there is something more tangible to discuss.
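To make the benchmarking part concrete, here is a toy sketch of the kind of hard stat such a portal could collect: a dumps/loads round-trip timed through whichever of the libraries above happen to be installed. The payload is invented, and a real comparison would need varied, realistic documents and more careful methodology.

    # Time a JSON round-trip through whichever libraries are installed.
    import importlib
    import timeit

    doc = {"id": 12345, "name": "example", "tags": ["a", "b", "c"],
           "values": list(range(100))}

    for name in ("json", "simplejson", "ujson", "orjson"):
        try:
            lib = importlib.import_module(name)
        except ImportError:
            print(f"{name:12s} not installed")
            continue
        t = timeit.timeit(lambda: lib.loads(lib.dumps(doc)), number=10_000)
        print(f"{name:12s} {t:.3f}s for 10,000 round-trips")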
Dom: I'd recommend you simply start a GitHub project for "Curated PyPI", find a catchy domain name, and publish that via GH Pages. That's a few hours of work to get a skeleton. But no, I'm not quite volunteering to create and maintain it myself today.

After there is a concrete site existing, you can refine the presentation and governance procedure iteratively. As a start, it can basically just be a web page with evaluations like yours of the JSON libraries. At a first pass, there's no need for anything dynamic on the page, just some tables (or maybe accordions, or side-bar navigation, or whatever).

I'd be very likely to make some PRs to such a repository myself. At some point, with enough recommendations, you might add some automation. E.g. some script that checks all the submitted "package reviews" and creates an aggregation ("10 reviews with average rating of 8"). Even there, running that thing offline every once in a while is plenty to start (you could do GH Actions or something too, if you like).

There are a few decisions to make, but none that difficult. For example, what format are reviews? Markdown? YAML? TOML? JSON? Python with conventions? Whatever it is, it should have a gentle learning curve and be human-readable IMO.
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
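The aggregation pass David describes really can be tiny. A sketch, assuming (purely for illustration) that reviews land in the repo as JSON files at reviews/<package>/<reviewer>.json with a numeric "rating" field:

    # Aggregate per-package review scores from a hypothetical repo layout.
    import json
    from collections import defaultdict
    from pathlib import Path
    from statistics import mean

    ratings = defaultdict(list)
    for path in Path("reviews").glob("*/*.json"):
        review = json.loads(path.read_text())
        ratings[path.parent.name].append(review["rating"])

    for package, scores in sorted(ratings.items()):
        print(f"{package}: {len(scores)} reviews, "
              f"average rating {mean(scores):.1f}")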
Thanks David,

Not keen to take it on solo. Ideally, IMO, this could be a joint project of this whole group. Someone more senior from this group creates a repo and oversees the progress, a couple of more experienced members make initial decisions to get things going, someone issues a review PR, a couple of new OPs are suggested to write a review on their queries.

Would be good to know if:
1. Someone has their own little benchmarks/reviews and would be willing to spend a little time to issue a PR for some initial content.
2. People from this group see themselves going to such a place and adding a new great package they have just found to existing reviews (or, even more importantly, an awful package).
3. Someone sees the opportunity to contribute, given someone took such a project on. E.g.: a) someone is very excited about benchmarking automation; b) someone has some working scripts to fetch GitHub stats / Stack trends that are waiting to be used; c) someone wants to take their devops to the next level and sees this as a good opportunity; d) someone is very keen on the high-level view and would like to contribute by working on categorisation (partially relying on the Python stdlib/libref structure could be intuitive, although then it is a dependency).

A lot of "someones" in this e-mail...
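On 3(b), the fetching side is genuinely small. A sketch against the public GitHub REST API (the repo slugs are examples only, and unauthenticated calls are rate-limited, so a real script would want a token and some caching):

    # Fetch basic repo health stats from the GitHub REST API.
    import requests

    REPOS = ["ultrajson/ultrajson", "ijl/orjson", "simplejson/simplejson"]

    for repo in REPOS:
        data = requests.get(f"https://api.github.com/repos/{repo}",
                            timeout=30).json()
        print(f"{repo}: {data['stargazers_count']} stars, "
              f"{data['open_issues_count']} open issues, "
              f"last push {data['pushed_at']}")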
To have 4) would be great. 2) could be a temporary pilot to get some ideas for when/if 4) was to be implemented.

The starting point for 2) could be simple (?). E.g. 4 parts:
1. Features & integration
2. Performance - set up automatic benchmarking via CI
3. Status (Stack hits, user base, open issues, lines of code, update frequency & similar)
4. User rating (I wonder if up/down voting or star rating can somehow be implemented on GitHub)

The author collects inputs from this e-mail group or via an opened issue and collates the results. Once the comparison tables are there, package authors can then issue PRs themselves, or the next interested user does the update. Could be a simple way to gauge variables before committing to 3) or 4).
I like the idea of a vetted package index that pip can point to. The more I think about it, the more I think that it needs some sort of peer review system as the barrier to entry, and my thoughts turn to establishing some DeSci DAO that could distribute the peer review of packages amongst a set of trusted maintainers while also being a mechanism to add new trusted maintainers to the peer-review pool. Peer reviewers could be funded via a fee to submit a package for publishing. There are a lot of open questions about how this would be done correctly - or even whether it is necessary - but I think that to achieve a scalable, funded, decentralized, and trustworthy package index, a DAO makes some amount of sense.
-- Dr. Jon Crall (him)
This adds up to a HUGE barrier to entry for new packages. For your proposal to be successful, a good number of users would need to point pip to the vetted index and NOT to the normal one (otherwise there's no benefit - you're just getting non-vetted packages), and in order for a new package to become relevant, it needs to:

1. Pay money. Even if it's not a huge dollar amount, that is already a very significant barrier for casual users.
2. Get reviewed. This is going to take time.
3. Pass the review. If the review system is at all meaningful, it has to knock some packages back, otherwise all you have is "pay to be visible", which is a horrible system.
4. OR, instead of those steps: convince your users to switch to the "untrusted" package repository.

That makes for a very closed-off ecosystem, where an incumbent has a dramatic advantage over anything that comes along. It would almost certainly require that vetted packages not depend on non-vetted packages (otherwise you'd need some bizarre mechanic whereby "pip install package-name" looks at one repository, but it resolves dependencies by looking at a different one), so nothing would have a chance to be seen in the walled garden until you get it vetted AND recognized.

So, sure, this would make life easier for those who want to randomly download packages without thinking about them, but at the cost of making the package repository extremely minimalist and insular. It'd make the Python packaging ecosystem look as unfriendly as iPhone app publishing.

ChrisA
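As a point of reference for the two-repository mechanics described above: pip can already be pinned to a single index. A sketch, with a hypothetical curated index URL; note that --extra-index-url would NOT provide this protection, since it keeps PyPI in play alongside the curated index:

    # pip.conf (pip.ini on Windows) - point pip ONLY at a hypothetical
    # curated index; anything absent from it, including dependencies,
    # simply fails to resolve:
    [global]
    index-url = https://curated.example.org/simple/

    # One-off equivalent on the command line:
    #   pip install --index-url https://curated.example.org/simple/ some-package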
On Tue, Jul 4, 2023 at 10:06 PM Jonathan Crall <erotemic@gmail.com> wrote:
I like the idea of a vetted package index that pip can point to. The more I think about it, the more I think that it needs some sort of peer review system as the barrier to entry, and my thoughts turn to establishing some DeSci DAO that could distribute the peer review of packages amongst a set of trusted maintainers while also being a mechanism to add new trusted maintainers to the peer-review pool.
Peer reviewers could be funded via a fee to submit a package for publishing.
I agree until this point -- we REALLY don't want to have a pay to play system. Yes, it needs to be funded somehow, but some sort of donation / non profit / etc funding mechanism would be best -- but I don't think peer reviewers should be paid. Peer review in academic journals isn't cash compensated either.
I think to achieve a scalable, funded, decentralized, and trustworthy package index a DAO makes some amount of sense.
I had to look that up: "Decentralized autonomous organization (DAO)".

So, yes.

-CHB
--
Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Christopher Barker writes:
Yes, it needs to be funded somehow, but some sort of donation / non profit / etc funding mechanism would be best -- but I don't think peer reviewers should be paid. Peer review in academic journals isn't cash compensated either.
It's been done. The most common scheme is nominal compensation (say USD50 per review) dependent on beating a relatively short deadline (typically 1-3 months). But this is not really the same as academic publishing. It's also not the same as movie and book reviewers who are paid staffers (at least they used to be in the days of paper journals). It has aspects of both. It might work here, although funding and appointment of reviewers are tough issues.
I had to look that up: "Decentralized autonomous organization (DAO)"
So, yes.
Please, no. DAOs are fine when only money is at risk (too risky for me, though). But they're a terrible way to manage a community or its money. Too fragile, too inflexible. The history of DAOs is basically an empirical confirmation of Arrow's Impossibility Theorem. https://simple.wikipedia.org/wiki/Arrow%27s_impossibility_theorem
I would go a bit further: DAOs are absolutely terrible for EVERYTHING, and anything that remotely mentions the acronym is a scam. Let's please, please, please not go down some cryptoscam, blockchain rabbit hole here. Drop it, burn the remains, try to forget it ever happened.
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
Christopher Barker writes:
So the odds that there's a package that does what you need are good, but it can be pretty hard to find them sometimes -- and can be a fair bit of work to sift through to find the good ones -- and many folks don't feel qualified to do so.
"Fair bit of work sifting" vs. "very hard work writing your own of higher quality" sounds like a VeryGoodDeal[tm] to me. As for "unqualified", if it's OK to be writing a program where you're unqualified to evaluate dependency packages, maybe it really doesn't matter if the package is best of breed? There are lots of programs where it doesn't matter, obviously. But if there's problem, it won't be solved by curation -- even best of breed is liable to be misused.
4) A self contained repository of packages that you could point pip to -- it would contain only the packages that had met some sort of "vetting" criteria. In theory, anyone could run it, but a stamp of approval from the PSF would make it far more acceptable to people. This would be a LOT of work to get set up, and still a lot of work to maintain.
Why "self-contained"? I always enter PyPI through the top page. I'd just substitute curated-pypi.org's top page. Its search results would be restricted to (or prioritize) the curated set, but it would take me to the PyPI page of the recommended package. Why duplicate those pages? Would you also duplicate the storage? The only reason I can imagine for doing a "deep copy" would be to avoid the accusation of piggybacking on PyPI's and PyPA's hard work. Even if maintaining separate storage, many developers who use it would still depend on pip -- which gives priority to PyPI. They'd use the curated site to choose a package, but then just write its name in requirements.txt -- and of course that would work. So you don't even really avoid depending on PyPI for bandwidth (of course PyPI is going to spend on storage, anyway, so that doesn't count).
Personally, I think (4) is the best end result, but probably the most work as well[*], so ???
Setup is pretty straightforward, though maybe expensive if you go the self-contained route. But all the extra ongoing work is curation, and that's *really* hard to sustain.

Riffing on "curated-pypi.org": I think it would be difficult to get "curated DOT pypi.org" for the reasons that have been repeatedly advanced, but why not your_team_name_here.curated.pypi.org? Set up a Patreon, and get the TechDirt guy or somebody like that to write a monthly column reviewing the curators (once you get a lot, call it "featured curators", probably with the excuse that they specialize in some application field) or soliciting curators on the curated.pypi.org page.

Steve
On Wed, 5 Jul 2023 at 17:12, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Why "self-contained"? I always enter PyPI through the top page. I'd just substitute curated-pypi.org's top page. Its search results would be restricted to (or prioritize) the curated set, but it would take me to the PyPI page of the recommended package.
Part of the desired protection is the prevention of typosquatting. That means there has to be something that you can point pip to and say "install this package", and it's unable to install any non-curated package. There are many protections against typosquatting (and malware installation in general), but this particular one can be very effective, albeit with some fairly significant costs. ChrisA
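For the curious: the "something you can point pip to" is, concretely, a PEP 503 "simple" index - a static page listing project names, each linking to a page of approved files. A rough sketch of generating one; the curated package data here is entirely made up, and a real index would also have to host or proxy the distribution files themselves:

    # Build a minimal PEP 503 "simple" index as static HTML.
    # pip could then be pointed at it with --index-url <url>/simple/
    from pathlib import Path

    CURATED = {
        # project name -> list of (filename, URL of approved file); made-up data
        "example-package": [
            ("example_package-1.0-py3-none-any.whl",
             "https://files.example.org/example_package-1.0-py3-none-any.whl"),
        ],
    }

    def build_index(root: Path) -> None:
        simple = root / "simple"
        simple.mkdir(parents=True, exist_ok=True)
        # Root page: one anchor per curated project.
        links = "".join(f'<a href="/simple/{name}/">{name}</a><br/>'
                        for name in CURATED)
        (simple / "index.html").write_text(f"<html><body>{links}</body></html>")
        # One page per project, listing its approved files.
        for name, files in CURATED.items():
            project = simple / name
            project.mkdir(exist_ok=True)
            file_links = "".join(f'<a href="{url}">{fname}</a><br/>'
                                 for fname, url in files)
            (project / "index.html").write_text(
                f"<html><body>{file_links}</body></html>")

    build_index(Path("curated-index"))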
Chris Angelico writes:
Part of the desired protection is the prevention of typosquatting. That means there has to be something that you can point pip to and say "install this package", and it's unable to install any non-curated package.
I think that the goalposts are walking, though. How do you keep non-curated packages out of requirements.txt? Only if you have a closed ecosystem. Sounds like Anaconda or Conda-forge or Debian to me, and people who want such a closed system should pick one -- and preferably only one -- to support.

The basic request as I understood it was to reduce what Chris Barker characterized as the cost of sifting through a maze of twisty little packages, all alike, except that some are good, and some are bad, and some are downright ugly. Part of that is indeed to avoid typosquatting malware. However, most of the squatters I'm aware of use names that look like improved or updated versions, and would not be frequently typoed. So my "click through to PyPI" approach would filter a majority, possibly a large majority, of non-curated packages.

If people really want this somewhat draconian restriction to curated packages, fine by me (I'll stick to proofreading requirements.txt very carefully, plus pip'ing from PyPI myself). I just don't see how it works or has advantages over existing options.

Steve
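Tangentially, pip's hash-checking mode can harden the "proofread requirements.txt" approach: with pinned hashes, a typosquatted or tampered upload fails to install even if a wrong name slips through review. A sketch - the digest shown is a placeholder, not a real hash:

    # requirements.txt -- install with:
    #   pip install --require-hashes -r requirements.txt
    # pip then refuses any artifact whose SHA-256 doesn't match.
    requests==2.31.0 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000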
On Wed, Jul 5, 2023, 01:24 Christopher Barker <pythonchb@gmail.com> wrote:
To avoid this -- SOME way for folks to find packages that have at least has some level of vetting would be good -- exactly what level of vetting, is a very open question, but I think "even a little" could be very helpful.
Doesn't each individual / team / company / organization already discuss and document their preferred packages, and (indirectly or directly) help to evolve them and consider alternatives by doing so?

I think it's important that there's a common space where packages can exist and be available. Whether that real-time environment should state its own preferred packages seems more debatable to me, because it could become a source of contention and gaming, similar to search engine optimization. An exception could be software museums: I can see a curated collection of the best-known and most-effective libraries used by particular cultures at points in time being of interest to future (and some current?) generations.

I also agree with a later reply about avoiding the murkier side of blockchains / etc. That said, it seems to me (again, sample-size-one anecdata) that creating a more level playing field for package publication could benefit from the use of some distributed technologies. Even HTTP mirrors are, arguably, a basic form of that. There's at least one question related to recency of data, though: delaying availability of a package to an audience -- if it's important enough -- could under some circumstances become effectively similar to censorship.
On Thu, 6 Jul 2023 at 03:57, James Addison via Python-ideas <python-ideas@python.org> wrote:
I also agree with a later reply about avoiding the murkier side of blockchains / etc. That said, it seems to me (again, sample size one anecdata) that creating a more levelled playing field for package publication could benefit from the use of some distributed technologies. Even HTTP mirrors are, arguably, a basic form of that.. there's at least one question related to recency of data, though. Delaying availability of a package to an audience -- if it's important enough -- could under some circumstances become effectively similar to censorship.
A blockchain won't solve anything here. It would be completely and utterly impractical to put the packages themselves into a blockchain, so all you'd have is the index, and that means it's just a bad version of PyPI's own single-page index. ChrisA
Why not just use GPG signatures and maintain trusted signing keys? There's no reason to reinvent the wheel. If a user wants to use unsigned or untrusted packages, they have to accept the risk.

Thanks,
Greg
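For reference, a minimal sketch of the detached-signature workflow alluded to here, driving the gpg command line from Python. It assumes gpg is installed and the maintainer's public key has already been imported; the file names are hypothetical:

    import subprocess

    def verify_package(artifact: str, signature: str) -> bool:
        """Return True if `signature` is a valid detached GPG signature
        for `artifact`, as judged by the local gpg keyring."""
        result = subprocess.run(
            ["gpg", "--verify", signature, artifact],
            capture_output=True, text=True,
        )
        return result.returncode == 0

    # Hypothetical file names: the .asc would be served by the package
    # index alongside the wheel itself.
    ok = verify_package("some_package-1.0-py3-none-any.whl",
                        "some_package-1.0-py3-none-any.whl.asc")
    print("trusted" if ok else "REJECTED: bad or missing signature")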
On Thu, 6 Jul 2023 at 04:08, Gregory Disney <gregory.disney.leugers@gmail.com> wrote:
Why not just use GPG signatures and maintain trusted signing keys? There's no reason to reinvent the wheel. If a user wants to use unsigned or untrusted packages, they have to accept the risk.
As an alternative to a blockchain? No idea, but I've never considered blockchains to be useful for anything more than toys anyway. As an alternative to a curated package list? That just comes down to who holds the trusted keys, so it's the same as the other suggestions, only you're looking at the mechanics for knowing whether it's on the list, as opposed to the mechanics for figuring out which things go on the list - two sides of the same coin, pretty much. ChrisA
On Wed, Jul 5, 2023, 19:06 Chris Angelico <rosuav@gmail.com> wrote:
A blockchain won't solve anything here. It would be completely and utterly impractical to put the packages themselves into a blockchain, so all you'd have is the index, and that means it's just a bad version of PyPI's own single-page index.
Mostly agreed. A distributed hash table or similar, though, could be appropriate in combination with ideas similar to the accreting layers of self-reinforcing consensus that some blockchain technologies provide.
Why do people insist on reinventing the wheel? Blockchain is not the answer for adding trust that is verifiable. Code signing is the answer: it's widely accepted, and it would be useful in cases of trusted computing and other security use cases. I don't want to load a hash table to load a third-party module on a UEFI interface.
I agree - we should encourage or await a single organization to reimplement a packaging ecosystem with a slightly different set of properties that continues to provide them with editor biasing, preventing eventual global consensus and system neutrality. Is your time available to help build it?
I'd like to apologise for this comment; I don't think that I argued in good faith here.

I was frustrated by a sense that many of the more straightforward attempts to make improvements in packaging ecosystems are, in themselves, reinventions of previously-existing wheels, often producing spokes as wonky as earlier attempts' and resulting in repeated off-course journeys that, given enough knowledge of the history of technology, seem predictable. But to observe that and then suggest that we simply wait for the next wonky wheel-builder doesn't seem like genuine progress, and I should neither argue for that nor ask whether other people want to spend their own valuable time on it.
On 2023-07-04 17:21, Christopher Barker wrote:
Anyway, I'd love to hear your thoughts on these ideas (or others?) - both technical and social.
To my mind there are two interrelated social problems that make this task difficult:

1) Promulgating a list of "good" packages tends to make people think packages not on the list are not good (aka "implied exhaustiveness")
2) In order to curate all or nearly all packages, you need curators with a wide range of areas of interest and expertise (aka "breadth")

The reason these are interrelated is that once people start thinking your list is exhaustive, it's really important to have breadth in the curation, or else entire domains of utility can wind up having all their packages implicitly proscribed.

As an example, a few months ago I wanted to do some automated email manipulations via IMAP. I looked at the builtin imaplib module and found it useless, so I went looking for other things. I eventually found one that more or less met my needs (imap_tools).

The question is, what happens when a person goes to our curated index looking for an IMAP library? If they don't find one, does that mean there aren't any, or that there are but they're all junk, or just that there was no curator who had any reason to explore the space of packages available in this area? In short, it becomes difficult for a user to decide whether a tool's *absence* from the index indicates a negative opinion or no opinion.

There are ways around this, like adding categories (so if you see a category, you know someone at least attempted to evaluate some packages in that category), but they can also have their own problems (like increasing the level of work required for curation). I'm not sure what the best solution is, but just wanted to mention this issue.

-- Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
One way would be to categorise areas and sub-areas and clearly indicate where the work has not been done, so that if I came to such a place, I could find the sub-topic I am interested in, with a clear indication of its status.
On 04Jul2023 17:21, Christopher Barker <pythonchb@gmail.com> wrote:
3) A rating system built into PyPI -- This could be a combination of two things: A - Automated analysis -- download stats, dependency stats, release frequency, etc, etc, etc. B - Community ratings -- upvotes, stars, whatever.
If done well, that could be very useful -- search on PyPI listed by rating. However, "done well" is a huge challenge -- I don't think there's a way to do the automated system right, and community scoring can be abused pretty easily. But maybe folks smarter than me could make it work with one or both of these approaches.
I have always thought that any community scoring system should allow users to mark up/down other reviewers w.r.t. the scores presented. That markup should only affect the scoring as presented to the person doing the markup, like a personal killfile. The idea is that you can have the ratings you see affected by notions like "I trust the opinions of user A" or "I find user B's opinion criteria not useful for my criteria".

Of course, the "ignore user B" option has some of the same downsides as trying to individually ignore certain spam sources: good for a single "bad" actor (by my personal criteria), to ignore their (apparent) gaming of the ratings, but not good for a swarm of robots.

Cheers, Cameron Simpson <cs@cskk.id.au>
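A minimal sketch of that per-viewer weighting, with entirely made-up data structures (nothing here reflects any existing PyPI feature): each viewer keeps a personal trust weight per reviewer, and a package's displayed score is the trust-weighted mean of its reviews, so ignoring a reviewer only changes what that one viewer sees:

    # Hypothetical data: reviews of one package, and one viewer's trust.
    reviews = {                 # reviewer -> star rating for the package
        "user_a": 5,
        "user_b": 1,            # suspected ratings-gamer
        "user_c": 4,
    }

    my_trust = {                # this viewer's weights; "killfile" sets 0.0
        "user_a": 1.0,
        "user_b": 0.0,          # ignored: affects only this viewer's view
        "user_c": 0.7,
    }

    def personalized_score(reviews: dict, trust: dict,
                           default: float = 0.5) -> float:
        """Trust-weighted mean rating, as seen by one particular viewer."""
        weighted = [(trust.get(r, default), stars)
                    for r, stars in reviews.items()]
        total = sum(w for w, _ in weighted)
        if total == 0:
            return float("nan")  # viewer trusts none of the reviewers
        return sum(w * s for w, s in weighted) / total

    print(f"score as I see it: {personalized_score(reviews, my_trust):.2f}")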
On Sun, 9 Jul 2023 at 02:11, Cameron Simpson <cs@cskk.id.au> wrote:
I have always thought that any community scoring system should allow other users to mark up/down other reviewers w.r.t the scores presented. That markup should only affect the scoring as presented to the person doing the markup, like a personal killfile. The idea is that you can have the ratings you see affected by notions that "I trust the opinions of user A" or "I find user B's opinion criteria not useful for my criteria".
Of course the "ignore user B" has some of the same downsides as trying individually ignore certain spam sources: good for a single "bad" actor (by my personal criteria) to ignore their (apparent) gaming of the ratings but not good for a swarm of robots.
Hi Cameron,

That sounds to me like the basis of a distributed trust network, and could be useful.

Some thoughts from experience working with Python (and other ecosystem) packages: after getting to know the usernames of developers and publishers of packages, I think that much of that trust can be learned by individuals without the assistance of technology -- that is to say, people begin to recognize authors that they trust, and authors that they don't. How to provide reassurance that each author's identity remains the same between modifications to packages/code is a related challenge, though.

FWIW, I don't really like many of the common multi-factor authentication systems used today, because I don't like seeing barriers to expression emerge, even when the intent is benevolent. I'm not sure I yet have better alternatives to suggest, though.

Your message also helped me clarify why I don't like embedding any review information at all within packaging ecosystems -- regardless of whether transitive trust is additionally available in the form of reviews. The reason is that I would prefer to see end-to-end transparent supply chain integrity for almost all, if not all, software products. I'm typing this in a GMail web interface, but I do not believe that many people have access to all of the source code for the version that I'm using. If everyone did, and if that source included strong dependency hashes to indicate the dependencies used -- similar to the way that pip-tools[1] can write a persistent record of a dependency set, allowing the same dependencies to be inspected and installed by others -- then people could begin to build their own mental models of which packages -- and which specific versions of those packages -- are worth trusting.

In other words: if all of the software, and the bill-of-materials for it, became open and published, and could be constructed reproducibly[2], then social trust would emerge without a requirement for reviews. That would not be mutually exclusive with the presence of reviews -- verbal, written, or otherwise -- elsewhere.

Thanks,
James

[1] - https://github.com/jazzband/pip-tools/
[2] - https://www.reproducible-builds.org/
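The pip-tools workflow referenced in [1] looks roughly like this; the package name is just an example, and while pip-compile's --generate-hashes option is real, treat the invocations as a sketch rather than a full recipe:

    # requirements.in -- only the direct dependencies you actually chose
    requests

    # Compile it into a fully pinned, hash-locked requirements.txt:
    #   pip-compile --generate-hashes requirements.in
    #
    # Anyone can then reproduce exactly the same dependency set with:
    #   pip install --require-hashes -r requirements.txt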
On Sun, 9 Jul 2023 at 18:06, James Addison via Python-ideas <python-ideas@python.org> wrote:
That sounds to me like the basis of a distributed trust network, and could be useful.
Why distributed? This sounded more like a centralized system, but one where you can "ignore reviews from this user" for any other user. ChrisA
On Sun, Jul 9, 2023, 09:13 Chris Angelico <rosuav@gmail.com> wrote:
Why distributed? This sounded more like a centralized system, but one where you can "ignore reviews from this user" for any other user.
The implementation of such a system could either be centralized or distributed; the trust signals that human users infer from it should always be distributed. And I'd argue that it's more difficult to guarantee that the trust presented to all participants is fair and accurate in either a centralized or a proprietary system.
James Addison via Python-ideas writes:
The implementation of such a system could either be centralized or distributed; the trust signals that human users infer from it should always be distributed.
ISTM the primary use cases advanced here have been for "naive" users. Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more. So in practice they'll often want to go with some kind of publicly weighted average of scores.

To avoid the problem of ballot-box stuffing, you could go the way that pro sports often do for their All-Star teams: have one vote by anybody who cares to register an ID, and another by verified committers, including committers from "trusted" projects as well.

Steve
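A toy illustration of that two-ballot scheme; the weighting (each committer ballot counting three times a public ballot) is an invented parameter, not a proposal:

    # Blend a public vote with a verified-committer vote.
    public_votes = [4, 5, 3, 4]   # anybody who registers an ID
    committer_votes = [2, 3]      # verified committers, incl. trusted projects

    def blended_score(public, committers, committer_weight=3.0):
        # Weighted average: a committer ballot counts committer_weight
        # times as much as a public ballot.
        total = sum(public) + committer_weight * sum(committers)
        count = len(public) + committer_weight * len(committers)
        return total / count if count else None

    print(blended_score(public_votes, committer_votes))  # 3.1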
On Sun, Jul 9, 2023, 15:52 Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
James Addison via Python-ideas writes:
The implementation of such a system could either be centralized or distributed; the trust signals that human users infer from it should always be distributed.
ISTM the primary use cases advanced here have been for "naive" users. Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more. So in practice they'll often want to go with some kind of publicly weighted average of scores.
To avoid the problem of ballot-box stuffing, you could go the way that pro sports often do for their All-Star teams: have one vote by anybody who cares to register an ID, and another by verified committers, including committers from "trusted" projects as well.
As someone who sometimes prefers to learn independently -- even if that takes longer and may produce unusual perspectives -- I remember learning web development by reading the source HTML of websites. Maybe that wouldn't be the typical way to learn programming -- but given the volume of successful and important software that exists in the world today, I think that having that code and the packages that it is composed of available to learn from would be highly beneficial to maintainers, educators and students, and other groups as well.
I didn't really address your point there; indirectly mine was to reaffirm a sense that not all participants may want to read the opinions of others while learning technologies, and that's why I am skeptical of the suggestions to include subjective user ratings of any kind within Python packaging infrastructure.

On Sun, Jul 9, 2023, 16:09 James Addison <jay@jp-hosting.net> wrote:
On Sun, Jul 9, 2023, 15:52 Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
James Addison via Python-ideas writes:
The implementation of such a system could either be centralized or distributed; the trust signals that human users infer from it should always be distributed.
ISTM the primary use cases advanced here have been for "naive" users. Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more. So in practice they'll often want to go with some kind of publicly weighted average of scores.
To avoid the problem of ballot-box stuffing, you could go the way that pro sports often do for their All-Star teams: have one vote by anybody who cares to register an ID, and another by verified committers, including committers from "trusted" projects as well.
As someone who sometimes prefers to learn independently -- even if that takes longer and may produce unusual perspectives -- I remember learning web development by reading the source HTML of websites.
Maybe that wouldn't be the typical way to learn programming -- but given the volume of successful and important software that exists in the world today, I think that having that code and the packages that it is composed of available to learn from would be highly beneficial to maintainers, educators and students, and other groups as well.
On Sun, 9 Jul 2023 at 15:56, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
James Addison via Python-ideas writes:
The implementation of such a system could either be centralized or distributed; the trust signals that human users infer from it should always be distributed.
ISTM the primary use cases advanced here have been for "naive" users. Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more. So in practice they'll often want to go with some kind of publicly weighted average of scores.
I'll also point out that I'm a long-standing Python developer, and a core dev, and I still *regularly* get surprised by finding out that community members that I know and respect are maintainers of projects that I had no idea they were associated with. Which suggests that I have no idea how many *other* people who I think of as "just another person" might be maintainers of key, high-profile projects. So I think that a model built around weighting results by "who you trust" would have some rather unfortunate failure modes.

Honestly, I'd be more likely to go with "I can assume that projects that are dependencies of other projects that I already know are good quality, are themselves good quality". Which excludes people from the equation altogether, but which falls apart when I'm looking for a library in a new area.

Paul
On Sun, Jul 9, 2023, 16:25 Paul Moore <p.f.moore@gmail.com> wrote:
On Sun, 9 Jul 2023 at 15:56, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
James Addison via Python-ideas writes:
The implementation of such a system could either be centralized or distributed; the trust signals that human users infer from it should always be distributed.
ISTM the primary use cases advanced here have been for "naive" users. Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more. So in practice they'll often want to go with some kind of publicly weighted average of scores.
I'll also point out that I'm a long-standing Python developer, and a core dev, and I still *regularly* get surprised by finding out that community members that I know and respect are maintainers of projects that I had no idea they were associated with. Which suggests that I have no idea how many *other* people who I think of as "just another person" might be maintainers of key, high-profile projects. So I think that a model built around weighting results by "who you trust" would have some rather unfortunate failure modes.
Honestly, I'd be more likely to go with "I can assume that projects that are dependencies of other projects that I already know are good quality, are themselves good quality". Which excludes people from the equation altogether, but which falls apart when I'm looking for a library in a new area.
Paul
Cautious +1, since PageRank did pretty well for a good stint in a somewhat analogous environment.
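For concreteness, the PageRank-style computation over a dependency graph fits in a few lines of Python; the graph below is invented, and treating "A depends on B" as a vote for B is only one possible formulation:

    # Rank packages by dependency links: a package inherits standing from
    # the packages that depend on it.  Plain power iteration on a toy graph.
    deps = {                      # package -> packages it depends on
        "app1": ["requests", "numpy"],
        "app2": ["requests"],
        "requests": ["urllib3"],
        "numpy": [],
        "urllib3": [],
    }

    def pagerank(graph, damping=0.85, iterations=50):
        nodes = list(graph)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iterations):
            new = {n: (1 - damping) / len(nodes) for n in nodes}
            for n, targets in graph.items():
                if targets:
                    share = damping * rank[n] / len(targets)
                    for t in targets:
                        new[t] += share
                else:  # dangling package: spread its rank evenly
                    for t in nodes:
                        new[t] += damping * rank[n] / len(nodes)
            rank = new
        return rank

    for name, score in sorted(pagerank(deps).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")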
On Sun, Jul 9, 2023 at 8:37 AM James Addison via Python-ideas <python-ideas@python.org> wrote:
ISTM the primary use cases advanced here have been for "naive" users.
Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more.
There are 718,155 users on PyPi -- I can't imagine that trying to figure out which of those hundreds of thousands of users you trust for reviews would be at all helpful -- it simply doesn't scale.
I suppose if my fantasy "curated" site existed, and the curation group were of a manageable size, then you could do that, but the point of having a modest number of curators is that you can already trust them ;-)

Honestly, I'd be more likely to go with "I can assume that projects that are dependencies of other projects that I already know are good quality, are themselves good quality". Which excludes people from the equation altogether,
I think there are a number of metrics that could be used -- and "how many projects use this project as a dependency" is a good one -- "which" projects would be even stronger. And there are others.

Anything like that can be gamed, but I'm not sure that's as huge a problem as it might be -- what is the incentive to game this system? This is all open source, no one's making money, and frankly, having a lot of users can be a burden as well! Sure, many of us would really like a lot of people to use our code, but the incentives to cheat to get more users really aren't that strong -- at least if you can filter out the malware in some other way.

-CHB
but which falls apart when I'm looking for a library in a new area.
Paul
Cautious +1, since PageRank did pretty well for a good stint in a somewhat analogous environment.
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Sun, Jul 9, 2023, 23:35 Christopher Barker <pythonchb@gmail.com> wrote:
On Sun, Jul 9, 2023 at 8:37 AM James Addison via Python-ideas <python-ideas@python.org> wrote:
ISTM the primary use cases advanced here have been for "naive" users.
Likely they won't be in a position to decide whether they trust Guido van Rossum or Egg Rando more.
There are 718,155 users on PyPi -- I can't imagine that trying to figure out which of those hundreds of thousands of users you trust for reviews would be at all helpful -- it simply doesn't scale.
The page https://levien.com/free/tmetric-HOWTO.html has some thoughts on how to build a scalable, resilient trust network based on user ratings; I can't guarantee that it'll change your opinion, though!

I suppose if my fantasy "curated" site existed, and the curation group were of a manageable size, then you could do that, but the point of having a modest number of curators is that you can already trust them ;-)
Honestly, I'd be more likely to go with "I can assume that projects that are dependencies of other projects that I already know are good quality, are themselves good quality". Which excludes people from the equation altogether,
I think there are a number of metrics that could be used -- and "how many projects use this project as a dependency" is a good one -- "which" projects would be even stronger. And there are others.

Anything like that can be gamed, but I'm not sure that's as huge a problem as it might be -- what is the incentive to game this system? This is all open source, no one's making money, and frankly, having a lot of users can be a burden as well!

Sure, many of us would really like a lot of people to use our code, but the incentives to cheat to get more users really aren't that strong -- at least if you can filter out the malware in some other way.
-CHB
but which falls apart when I'm looking for a library in a new area.
Paul
Cautious +1, since PageRank did pretty well for a good stint in a somewhat analogous environment.
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
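A toy sketch loosely inspired by the trust-metric notes linked above -- it is not the Advogato network-flow computation -- in which trust radiates outward from an account the user already trusts and vouches beyond a cutoff depth are ignored; all names are invented:

    from collections import deque

    trusts = {                    # who each account vouches for
        "me": ["alice", "bob"],
        "alice": ["carol"],
        "bob": ["carol", "mallory"],
        "carol": ["dave"],
    }

    def trusted_set(seed, trusts, max_depth=2):
        # Accept accounts reachable from the seed within max_depth hops;
        # deeper vouches carry too little weight and are ignored.
        accepted, frontier = {seed}, deque([(seed, 0)])
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for peer in trusts.get(node, []):
                if peer not in accepted:
                    accepted.add(peer)
                    frontier.append((peer, depth + 1))
        return accepted

    print(trusted_set("me", trusts))  # dave is too far away to be trusted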
On Mon, 24 Jul 2023 at 21:02, James Addison via Python-ideas <python-ideas@python.org> wrote:
... some thoughts on how to build a scalable, resilient trust network based on user ratings; I can't guarantee that it'll change your opinion, though!
This still has the fundamental problems of any sort of user rating system: popular packages are inherently favoured. And we can already get a list of popular packages, because download stats are available. So how would this scheme help? ChrisA
If popular packages weren't favored that would be a problem. Popularity should be correlated with "trustworthiness" or whatever the metric this curated repo seeks to maximize. I think the important thing is that the packages are both popular and have passed some sort of vetting procedure.

For instance, for a very long time Python2 was far more popular than Python3, but any expert in the field would encourage users to move to Python3 sooner rather than later. Python2 is popular, but it wouldn't have made the cut on some expert-curated list. So it helps in that it reranks popular packages (and also excludes some) for those who want to adopt a more strict security / reliability posture.

By no means do I think this would replace pypi as the de-facto packaging repository. Its low barrier to entry is extremely important for a thriving community, but I also wouldn't mind having something a bit more robust. I also think this project would have to be careful not to become yet another "awesome-python-package" collection. Those certainly have value, but based on the initial proposal, I'm interested in something a tad more robust.

On Mon, Jul 24, 2023 at 8:55 AM Chris Angelico <rosuav@gmail.com> wrote:
On Mon, 24 Jul 2023 at 21:02, James Addison via Python-ideas <python-ideas@python.org> wrote:
... some thoughts on how to build a scalable, resilient trust network based on user ratings; I can't guarantee that it'll change your opinion, though!
This still has the fundamental problems of any sort of user rating system: popular packages are inherently favoured. And we can already get a list of popular packages, because download stats are available. So how would this scheme help?
ChrisA
-- -Dr. Jon Crall (him)
On Mon, Jul 24, 2023 at 15:29, Jonathan Crall <erotemic@gmail.com> wrote:
If popular packages weren't favored that would be a problem. Popularity should be correlated with "trustworthiness" or whatever the metric this curated repo seeks to maximize. I think the important thing is that the packages are both popular and have passed some sort of vetting procedure.
For instance, for a very long time Python2 was far more popular than Python3, but any expert in the field would encourage users to move to Python3 sooner rather than later. Python2 is popular, but it wouldn't have made the cut on some expert-curated list.
So it helps in that it reranks popular packages (and also excludes some) for those who want to adopt a more strict security / reliability posture.
By no means do I think this would replace pypi as the de-facto packaging repository. Its low barrier to entry is extremely important for a thriving community, but I also wouldn't mind having something a bit more robust.
I also think this project would have to be careful not to become yet another "awesome-python-package" collection. Those certainly have value, but based on the initial proposal, I'm interested in something a tad more robust.
... some old stuff cut ...

Hi Folks,

It has occurred to me that even just grouping similar / same-goal packages could help the current situation. Unfortunately searching by name or category is not enough, and takes a lot of time. Linking similar packages together would give users the possibility to evaluate all or several of them.

Additionally, perhaps users could give relative valuations: for example, if there are similar packages A, B, C, and D, users could say: I tried out A and B, and found that A is better than B, and could have some valuation categories: simple, easy, powerful, etc. This would show, for example, that package A is simple, but B is more powerful.

BR,
George
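A sketch of the data George's suggestion implies -- package groups, pairwise preferences, and tags; the schema and names are invented for illustration:

    # A "group" links packages that address the same goal; users record
    # pairwise preferences and free-form tags rather than absolute scores.
    groups = {
        "json-parsing": ["pkg_a", "pkg_b", "pkg_c", "pkg_d"],
    }

    # (user, preferred, other): "I tried A and B, and found A better than B"
    preferences = [
        ("user1", "pkg_a", "pkg_b"),
        ("user2", "pkg_a", "pkg_b"),
        ("user3", "pkg_b", "pkg_a"),
    ]

    tags = {
        "pkg_a": ["simple", "easy"],
        "pkg_b": ["powerful"],
    }

    def wins(package, preferences):
        # How often a package won a head-to-head comparison.
        return sum(1 for _, preferred, _ in preferences if preferred == package)

    print(wins("pkg_a", preferences))  # 2
    print(wins("pkg_b", preferences))  # 1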
George Fischhof writes: [For heaven's sake, trim! You expressed your ideas very clearly, the quote adds little to them.]
It has occurred to me that even just grouping similar / same-goal packages could help the current situation.
This is a good idea. I doubt it reduces the problem compared to the review site or the curation very much: some poor rodent(s) still gotta put the dinger on the feline. However, in designing those pages, we could explicitly ask for names of similar packages and recommendations for use cases where an alternative package might be preferred, and provide links to the review pages for those packages that are mentioned in the response. We can also provide suggestions based on comparisons other users have made. (Hopefully there won't be too many comparisons like "this package is the numpy of its category" -- that's hard to parse!)
Additionally, perhaps users could give relative valuations,
Not sure asking for rankings is a great idea; globally valid rankings are rare -- ask any heavy numpy user who occasionally uses the sum builtin on lists.
for example, if there are similar packages A, B, C, and D, users could say: I tried out A and B, and found that A is better than B, and could have some valuation categories: simple, easy, powerful, etc. This would show, for example, that package A is simple, but B is more powerful.
These tags would be useful. I think the explanation needs to be considered carefully, because absolutes don't really exist, and if you're comparing to the class, you want to know which packages the reviewer is comparing to. I'm not sure many users would go to the trouble of providing full rankings, even for the packages they've mentioned. Worth a try though! Steve
On Mon, Jul 24, 2023 at 10:04 AM Chris Angelico <rosuav@gmail.com> wrote:
can you tell me what this vetting procedure proves that isn't already proven by mere popularity itself?
I think that's what this thread is trying to discuss. Do I have the exact perfect implementation? No. But I imagine it to be akin to peer-review. I can't prove this, but I think it adds signal that complements popularity.
And are you also saying that packages should be *removed* from this curated list?
Yes, absolutely. If packages fall out of maintenance, are deprecated, or end-of-life, they should no longer be on this curated list. I imagine the mechanism would be a series of snapshots of what the state of the list is at a particular point in time. What is the mechanism for determining the trigger to remove a package? No clue right now.

There are a lot of problems - most of them social - that need to be sorted out to make a *useful* curated package list a reality. I don't claim to have the answers, but I'm willing to participate in discussion to find them.
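One possible shape for such a snapshot, sketched as plain data; the field names are invented and nothing here is a settled design:

    import json
    from datetime import date

    # Each snapshot freezes the curated list at a point in time; diffing two
    # snapshots shows which packages were added or dropped (e.g. after a
    # package falls out of maintenance).
    snapshot = {
        "date": date(2023, 7, 25).isoformat(),
        "packages": {
            "examplepkg": {"version": "1.4.2", "status": "vetted"},
            "legacypkg": {"version": "0.9.0", "status": "deprecated"},
        },
    }

    print(json.dumps(snapshot, indent=2))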
More robust in what way?
It's curated by people I trust more than the average bear.
If not, how is it different from "yet another collection"?
We are discussing details about it here instead of just posting what we think should be there on github right now. I'm putting a bit of trust in this group to find a way to do that. I do think that separating the opinions of experts (however we choose to define that group - but I hope you agree that it should be possible to find *some* reasonable definition) as an auxiliary re-ranking on top of popularity is a good differentiator.

On Tue, Jul 25, 2023 at 1:31 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
George Fischhof writes:
[For heaven's sake, trim! You expressed your ideas very clearly, the quote adds little to them.]
It has occurred to me that even just grouping similar / same-goal packages could help the current situation.
This is a good idea. I doubt it reduces the problem compared to the review site or the curation very much: some poor rodent(s) still gotta put the dinger on the feline.
However, in designing those pages, we could explicitly ask for names of similar packages and recommendations for use cases where an alternative package might be preferred, and provide links to the review pages for those packages that are mentioned in the response. We can also provide suggestions based on comparisons other users have made. (Hopefully there won't be too many comparisons like "this package is the numpy of its category" -- that's hard to parse!)
Additionally, perhaps users could give relative valuations,
Not sure asking for rankings is a great idea; globally valid rankings are rare -- ask any heavy numpy user who occasionally uses the sum builtin on lists.
for example, if there are similar packages A, B, C, and D, users could say: I tried out A and B, and found that A is better than B, and could have some valuation categories: simple, easy, powerful, etc. This would show, for example, that package A is simple, but B is more powerful.
These tags would be useful. I think the explanation needs to be considered carefully, because absolutes don't really exist, and if you're comparing to the class, you want to know which packages the reviewer is comparing to. I'm not sure many users would go to the trouble of providing full rankings, even for the packages they've mentioned. Worth a try though!
Steve
-- -Dr. Jon Crall (him)
On Wed, 26 Jul 2023 at 01:20, Jonathan Crall <erotemic@gmail.com> wrote:
On Mon, Jul 24, 2023 at 10:04 AM Chris Angelico <rosuav@gmail.com> wrote:
can you tell me what this vetting procedure proves that isn't already proven by mere popularity itself?
I think that's what this thread is trying to discuss. Do I have the exact perfect implementation? No. But I imagine it to be akin to peer-review. I can't prove this, but I think it adds signal that complements popularity.
So you're suggesting nothing, only that there might be a possible suggestion to be made. Do you have an actual concrete proposal?
And are you also saying that packages should be *removed* from this curated list?
Yes, absolutely. If packages fall out of maintenance, are deprecated, or end-of-life, they should no longer be on this curated list. I imagine the mechanism would be a series of snapshots of what the state of the list is at a particular point in time. What is the mechanism for determining the trigger to remove a package? No clue right now. There are a lot of problems - most of them social - that need to be sorted out to make a useful curated package list a reality. I don't claim to have the answers, but I'm willing to participate in discussion to find them.
So.... who's going to actually do the work? You?
More robust in what way?
It's curated by people I trust more than the average bear.
I trust the average bear a lot more than most people. They are pretty awesome. Did you see the one that tried to eat Tom Scott's gopro camera? I'd trust that bear (and stay out of its way).
If not, how is it different from "yet another collection"?
We are discussing details about it here instead of just posting what we think should be there on github right now. I'm putting a bit of trust in this group to find a way to do that. I do think that separating the opinions of experts (however we choose to define that group - but I hope you agree that it should be possible to find some reasonable definition) as an auxiliary re-ranking on top of popularity is a good differentiator.
So.... yeah, you have no concrete proposal. Okay then. ChrisA
On Mon, 24 Jul 2023 at 23:28, Jonathan Crall <erotemic@gmail.com> wrote:
If popular packages weren't favored that would be a problem. Popularity should be correlated with "trustworthiness" or whatever the metric this curated repo seeks to maximize. I think the important thing is that the packages are both popular and have passed some sort of vetting procedure.
Okay, but can you tell me what this vetting procedure proves that isn't already proven by mere popularity itself?
For instance, for a very long time Python2 was far more popular than Python3, but any expert in the field would encourage users to move to Python3 sooner rather than later. Python2 is popular, but it wouldn't have made the cut on some expert-curated list.
Experts were divided for a very long time. I'm not sure what your point is here. And are you also saying that packages should be *removed* from this curated list? Because if so, what's the mechanic for this? (Python 2 absolutely WOULD have made the cut on any expert-curated list prior to Python 3's inception.)
So it helps in that it reranks popular packages (and also excludes some) for those who want to adopt a more strict security / reliability posture.
By no means do I think this would replace pypi as the de-facto packaging repository. Its low barrier to entry is extremely important for a thriving community, but I also wouldn't mind having something a bit more robust.
I also think this project would have to be careful not to become yet another "awesome-python-package" collection. Those certainly have value, but based on the initial proposal, I'm interested in something a tad more robust.
More robust in what way? What exactly are the requirements to be part of this list? Will all experts agree? If not, how is it different from "yet another collection"? (Also, PLEASE don't top-post. There's no value in it. Show what you're responding to.) ChrisA
participants (12)

- Brendan Barnwell
- Cameron Simpson
- Chris Angelico
- Christopher Barker
- David Mertz, Ph.D.
- Dom Grigonis
- George Fischhof
- Gregory Disney
- James Addison
- Jonathan Crall
- Paul Moore
- Stephen J. Turnbull