On Tue, Jul 4, 2023 at 5:49 PM Chris Angelico <rosuav@gmail.com> wrote:
On Wed, 5 Jul 2023 at 10:26, Christopher Barker <pythonchb@gmail.com> wrote:
> The "problem", as I see it:
>  - The Python standard library is not, and will never be, fully comprehensive -- most projects require *some* third party packages.
>  - There are a LOT of packages available on PyPI -- with a very wide range of usefulness, quality and maintenance -- everything from widely used packages with a huge community (e.g. numpy) to packages that are at release 0.0.1, have never seen an update, and may not even work.

Remember though that this problem is also Python's strength here. The
standard library does not NEED to be comprehensive, and publishing on
PyPI is deliberately easy. The barrier to entry for the stdlib is
high; the barrier to entry for PyPI is low.

Absolutely -- and I'm not making any comment about the stdlib -- it's not the point here.

But yes, the low (zero) barrier to entry to PyPI was probably the right way to go when it got started -- now, I'm not so sure. *Some* barrier to entry would be helpful.


Thanks, I had forgotten about the Wiki! It foundered for a while (a lot of years ago -- I've been around...), but at a glance, it's looking good now.

Imagine a page like
That way, it's decentralized for editing, but has a central "hub" that
people can easily find.

yup -- great idea.
I suspect this would end up being broadly equivalent to the first
option, but with more effort by a core group of people (or a single
maintainer), and in return, would have a more consistent look and
> 3) A rating system built into PyPI -- This could be a combination of two things:
>   A - Automated analysis -- download stats, dependency stats, release frequency, etc, etc, etc.
>   B - Community ratings -- upvotes, stars, whatever.
> If done well, that could be very useful -- search on PyPI listed by rating. However -- "done well" is a huge challenge -- I don't think there's a way to do the automated system right, and community scoring can be abused pretty easily. But maybe folks smarter than me could make it work with one or both of these approaches.

Neither of them adequately answers questions like
"which is right *for this use-case*",

I'm noting this because I think it's part of the problem to be solved, but maybe not the main one (to me anyway). I've been focused more on "these packages are worthwhile, by some definition of worthwhile", while I think Chris A is more focused on "which of these seemingly similar packages should I use?" -- not unrelated, but not the same question either.

Which makes me realize that having a centralized package review site is complementary to a curated package index -- they are not replacements for one another.

> 4)  A self contained repository of packages that you could point pip to --

Definitely possible; how would this compare to Conda?

Technically, conda is similar to pip -- it has a default "channel" (a channel is an indexed repository of packages) it points to, and you can point it to a different one, or any number of others, or install a single package from a particular channel.
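
To make "indexed repository" a bit more concrete: as far as I know, a channel is laid out as one directory per platform (`linux-64`, `noarch`, ...), each with a `repodata.json` index whose `packages` and `packages.conda` sections map file names to metadata. Here's a rough sketch of reading such an index -- the helper names and the sample data are made up for illustration, and the sample is a tiny stand-in, not real channel data:

```python
import json


def repodata_url(channel_base, subdir):
    """Build the index URL for one platform directory of a channel."""
    return f"{channel_base.rstrip('/')}/{subdir}/repodata.json"


def distinct_package_names(repodata):
    """Return the set of package names listed in a parsed repodata.json."""
    names = set()
    # .tar.bz2 artifacts are listed under "packages",
    # .conda artifacts under "packages.conda".
    for section in ("packages", "packages.conda"):
        for meta in repodata.get(section, {}).values():
            names.add(meta["name"])
    return names


# Hand-written sample for illustration only -- NOT real channel data.
SAMPLE = json.loads("""
{
  "packages": {
    "foo-1.0-py_0.tar.bz2": {"name": "foo", "version": "1.0"},
    "foo-1.1-py_0.tar.bz2": {"name": "foo", "version": "1.1"}
  },
  "packages.conda": {
    "bar-2.0-py_0.conda": {"name": "bar", "version": "2.0"}
  }
}
""")

if __name__ == "__main__":
    print(repodata_url("https://conda.anaconda.org/conda-forge", "noarch"))
    print(sorted(distinct_package_names(SAMPLE)))
```

To try it against a real channel you'd fetch the URL (e.g. with `urllib.request.urlopen`) and `json.load` the response -- be warned that the real index files are large, and you'd have to merge the names across all the platform directories to get a package count.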

Socially, it's pretty different:
- There is no channel like PyPI that anyone can put anything on willy-nilly.
- The default channel is operated by Anaconda.com -- and no one else can put anything on there. (They take suggestions, but it's a pretty big lift to get them to add a package.)
- The protocol for a channel is pretty simple -- all you really need is an HTTP server, but in practice, most folks host their channels on the Anaconda.org server -- it's a free service that anyone can create a channel on -- there are a LOT -- folks use them for their personal projects, etc.

- Then there is conda-forge:
It grew out of an effort to collaborate among a number of folks operating channels focused on particular fields -- met/ocean science, astronomy, computational biology, ... We all had different needs, but they overlapped -- why not share resources? Thanks to the heroic efforts of a few folks, it grew to what it is now: a GitHub- and CI-based conda package build system that publishes a conda channel on anaconda.org with over 22,000 (wow! I think I'm reading that right) packages.


They are curated -- anyone can propose a new package (via PR) -- but it only gets added once it's been reviewed and approved by the core team. Curation wasn't the goal, but it's necessary in order to have any hope that they will all work together. The review process is really of the package, not the code in the package (is it built correctly? is it compatible with the rest of conda-forge? Does it include the license file? Is there a maintainer? ...) But the end result is a fair bit of curation -- users can be assured that:
1 - The package works
2 - The package is useful enough that someone took the time to get it up there.
3 - It's very unlikely to be malware (I don't think the conda-forge policy really looks hard for that, but typosquatting and that sort of thing are pretty much impossible).

What about OS package managers like the Debian repositories?
I have no idea, other than that the majors, at least, put a LOT of work into having a pretty comprehensive base repository of "vetted" packages.

> (1) and (2) have the advantage that they could be useful even without being comprehensive -- they'd need to have some critical mass to get any traction, but maybe not THAT much.

Right, and notably, they can be useful without covering every topic.
You could write a blog post about database ORM packages, and that's
useful right there, without worrying about whether there's any review
of internet protocol packages.

Not many forces needed, but if you want to go with the Python Wiki, it
might require a wiki admin to create the initial page, in which case
I'd be happy to do that.

conda-forge has 22,121 packages -- that's enough to be very useful, but a lot of use-cases are not well covered, and I know I still have to contribute one once in a while.

Looking now -- PyPI has 465,295 projects -- more than 20 times as many -- I wonder how many of those are "useful"?

As discussed, you can only get so much with stats -- but if someone is familiar with working with the PyPI stats, I'd love some help exploring that question.
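
As a possible starting point, here's a minimal sketch, assuming the per-project JSON that PyPI serves at `https://pypi.org/pypi/<name>/json` (a `releases` mapping from version string to a list of uploaded files, each with an `upload_time_iso_8601` field). It pulls out two crude signals: how many releases actually have files, and when the latest upload happened. The function names, the one-release / two-year thresholds, and the sample payload are all made up for illustration -- they're certainly not a definition of "useful":

```python
from datetime import datetime, timezone


def release_stats(payload):
    """Return (number of releases with files, latest upload time or None)."""
    uploads = []
    n_releases = 0
    for files in payload.get("releases", {}).values():
        if not files:
            continue  # a version with no uploaded files
        n_releases += 1
        for f in files:
            # Normalize a trailing "Z" so fromisoformat() accepts it
            # on older Python versions too.
            stamp = f["upload_time_iso_8601"].replace("Z", "+00:00")
            uploads.append(datetime.fromisoformat(stamp))
    return n_releases, (max(uploads) if uploads else None)


def looks_abandoned(payload, now, max_releases=1, max_age_days=730):
    """Crude heuristic: very few releases AND nothing uploaded recently."""
    n_releases, latest = release_stats(payload)
    if latest is None:
        return True  # nothing was ever uploaded
    return n_releases <= max_releases and (now - latest).days > max_age_days


# Hand-written sample payload for illustration only -- NOT real PyPI data.
SAMPLE = {
    "releases": {
        "0.0.1": [{"upload_time_iso_8601": "2019-01-01T00:00:00Z"}],
    }
}

if __name__ == "__main__":
    now = datetime(2023, 7, 5, tzinfo=timezone.utc)
    print(release_stats(SAMPLE))
    print(looks_abandoned(SAMPLE, now))
```

This only catches the "release 0.0.1 and never seen an update" end of the range -- a stable, finished package that hasn't needed a release in years would need other signals (downloads, dependents) to tell apart.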


Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython