[Python-ideas] Re: "Curated" package repo?

6 Jul 2023

It is possible that the issues being discussed now are less relevant than they seem, given that this idea is still at stage 0.
(Unless someone here is looking to make a very serious commitment.)

If some sort of “light” starting point were agreed on, the process could be readjusted as (and if) it progresses. Perhaps there is no need to put a “stamp” on a package; it may be enough to provide comparison statistics within some initial structure.

I think a lot of packages can be filtered on objective criteria, without even reaching the stage of subjective opinions.

———————————————

General info - fairly easy to inspect without the need for subjective opinions:
1. License
2. Maintenance - hard stats from Stack Overflow & the repo
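
A minimal sketch of how the license part could be collected automatically, assuming the package is installed locally and exposes standard metadata. The function name `general_info` and the returned keys are hypothetical; maintenance stats would need network access to the repo host and are not covered here.

```python
from importlib import metadata


def general_info(dist_name: str) -> dict:
    """Collect license-related fields from an installed distribution's metadata."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        # Nothing to inspect if the distribution is not installed.
        return {"name": dist_name, "installed": False}
    classifiers = meta.get_all("Classifier") or []
    return {
        "name": dist_name,
        "installed": True,
        "version": meta.get("Version"),
        "license": meta.get("License"),  # free-text field, may be None
        "license_classifiers": [
            c for c in classifiers if c.startswith("License ::")
        ],
    }


if __name__ == "__main__":
    print(general_info("simplejson"))
```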

Performance - hard stats:
1. There will be lower-level language extensions. Even if one is not up to standard in other aspects, it is worth attention: if this is explicitly indicated, someone else might pick it up and rejuvenate it.
2. There will be pure Python packages:
  a) those with good coding standards and good knowledge of efficient programming in pure Python
  b) those that take ages to execute

In many areas, this will filter out many libraries, although there are some where it wouldn’t, e.g. schema-based low-level serialisation, where benchmarks can be quite tight.
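
A rough sketch of what gathering such hard performance stats could look like, using only `timeit` from the stdlib. The candidate list, the function names and the payload are assumptions for illustration; libraries that are not installed are simply skipped, so only `json` is guaranteed to be measured.

```python
import functools
import importlib
import timeit

# Import names of the candidates; anything not installed is skipped.
CANDIDATES = ["json", "simplejson", "ujson", "orjson"]


def bench_dumps(payload, number=200):
    """Return best-of-3 seconds per dumps() call for each importable candidate."""
    results = {}
    for name in CANDIDATES:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            continue
        timer = timeit.Timer(functools.partial(mod.dumps, payload))
        results[name] = min(timer.repeat(repeat=3, number=number)) / number
    return results


def relative_to(results, baseline="json"):
    """Express timings as a speed-up factor over the baseline (higher is faster)."""
    base = results[baseline]
    return {name: base / t for name, t in results.items()}


if __name__ == "__main__":
    payload = {"ids": list(range(1000)), "name": "example", "nested": {"a": [1.5, None, True]}}
    for name, factor in relative_to(bench_dumps(payload)).items():
        print(f"{name}: {factor:.2f}x")
```

A single payload says little, of course; a real script would need several payload shapes and would also cover loading, not just dumping.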

The remaining evaluation can be subjective opinions, where preferences of curators can be taken into account:
1. Coding standards
2. Integration
3. Flexibility/functionality
4. …

IMO, all of this can be done while being on the safe side - if unsure, leave the plain statistics for users and developers to see.

———————————————

An example. (I am not the developer of any of these)
JSON serialisers:
1. json - stdlib, average performance, well maintained, flexible, very safe to depend on
2. simplejson - 3rd party, pure Python with optional C speedups, performance in line with 1), drop-in replacement for json, been around for a while, safe to depend on
3. ultrajson - 3rd party, written in C, >3x performance boost on 1) & 2), drop-in replacement for json, been around for a while, safe to depend on
4. ijson - 3rd party, C & Python, average performance, its own interface relying heavily on the iterator protocol, status <TBC>
5. orjson - 3rd party, highly optimised Rust, performance on par with the fastest serialisers on the market, not a drop-in replacement for json due to sacrifices made for performance, rich in functionality, well maintained, safe to depend on
6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a drop-in replacement for json, extends json with json5 features such as comments, well maintained, safe to depend on

(THIS IS JUST AN EXAMPLE OF COMPARISON, NOT TO BE RELIED ON)

So there is still a bit of opinion here, but all of this can be standardised and put into numbers, and a comparison of this type can be done with little to no personal opinion.
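
To illustrate what "standardised" could mean, here is a sketch of a fixed record per package rendered as a plain-text table. The `PackageStats` fields are hypothetical, the rows only restate claims from the example list above, and `speed_vs_json` is deliberately left unfilled rather than guessed.

```python
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PackageStats:
    name: str
    third_party: bool
    drop_in_for_json: bool
    speed_vs_json: Optional[float] = None  # measured ratio; None until benchmarked


def render_table(rows):
    """Render records as an aligned plain-text table, showing None as 'n/a'."""
    header = [f.name for f in fields(PackageStats)]
    body = [["n/a" if v is None else str(v) for v in astuple(r)] for r in rows]
    lines = [header] + body
    widths = [max(len(line[i]) for line in lines) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip()
        for line in lines
    )


ROWS = [
    PackageStats("json", third_party=False, drop_in_for_json=True),
    PackageStats("ultrajson", third_party=True, drop_in_for_json=True),
    PackageStats("orjson", third_party=True, drop_in_for_json=False),
]

print(render_table(ROWS))
```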

———————————————

Once a structure for this is in place, it would be easier to discuss whether more serious curation is needed/worthwhile/makes sense.

It could allow queries from users and package developers, provide places to gather opinions, and maybe let people volunteer to do a deeper analysis…

And once there is enough input, maybe curated guidance can be added to the review. But this is the next stage, which does not necessarily need to be thoroughly thought out before putting in place something simple, objective & risk-free.

———————————————

Maybe stage 1 is all that users need - a reliable place to check hard stats, where users and developers can update them for the benefit of all. With enough popularity, package developers should be motivated to submit stat updates (e.g. add a column to the benchmarking script), and users would submit similar updates (e.g. add a column to the benchmarking script for a case where the library is extremely slow).

It is possible that the project would naturally turn in the direction of hard stat coverage instead of “deep” curation. E.g.
json serialisers become a sub-branch of schema-less serialisers,
which in turn become a branch of serialisers.

The user can then view comparable stats for the whole branch, sub-branch or sub-sub-branch to get the information he needs to make decisions, and apply different filters along the way to arrive at the final list of packages on which he will have to do his final subjective analysis anyway.
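
The branch / sub-branch idea can be sketched as a nested mapping. The layout, the `TAXONOMY` name and the helper functions are assumptions; the schema-based branch is left empty because no schema-based packages are named here.

```python
# Nested dicts are branches; lists are the packages at a leaf.
TAXONOMY = {
    "serialisers": {
        "schema-less": {
            "json": ["json", "simplejson", "ultrajson", "ijson", "orjson", "pyjson5"],
        },
        "schema-based": {},
    },
}


def _collect(node):
    """Yield every package name found at or below this node."""
    if isinstance(node, list):
        yield from node
    else:
        for child in node.values():
            yield from _collect(child)


def packages_under(tree, path):
    """Return the sorted package names under the branch addressed by `path`."""
    node = tree
    for key in path:
        node = node[key]
    return sorted(_collect(node))


if __name__ == "__main__":
    print(packages_under(TAXONOMY, ["serialisers"]))
    print(packages_under(TAXONOMY, ["serialisers", "schema-less", "json"]))
```

Stats for a whole branch would then just be the per-package stats of everything `packages_under` returns for that path.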

———————————————

E.g. a user needs a serialiser. He prefers schema-less, but is willing to go schema-based given large increases in performance. He does not mind low maintenance status, given that he aims to maintain his own proprietary serialisation library in the long run. Naturally, clean & simple code with a permissive license is preferred.
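
Such a preference could be written down as a filter plus a sort order. This is only a sketch: the candidates `pkg_a` to `pkg_d`, their numbers, the 5x threshold and the `PERMISSIVE` set are all made up for illustration, and maintenance status is ignored because this user does not care about it.

```python
from dataclasses import dataclass

PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}  # illustrative, not exhaustive


@dataclass(frozen=True)
class Candidate:
    name: str
    schema_based: bool
    speedup: float  # relative to some agreed baseline
    license: str


def shortlist(candidates, min_speedup_for_schema=5.0):
    """Keep schema-less packages, plus schema-based ones that are fast enough.

    Permissive licenses sort first, then schema-less, then faster packages.
    """
    kept = [
        c for c in candidates
        if not c.schema_based or c.speedup >= min_speedup_for_schema
    ]
    return sorted(
        kept,
        key=lambda c: (c.license not in PERMISSIVE, c.schema_based, -c.speedup),
    )


# Hypothetical data, not real measurements.
CANDIDATES = [
    Candidate("pkg_a", schema_based=False, speedup=1.0, license="MIT"),
    Candidate("pkg_b", schema_based=True, speedup=2.0, license="MIT"),
    Candidate("pkg_c", schema_based=True, speedup=8.0, license="GPL-3.0-only"),
    Candidate("pkg_d", schema_based=False, speedup=3.0, license="MIT"),
]

print([c.name for c in shortlist(CANDIDATES)])
```

The remaining list is what the user would still have to judge subjectively, e.g. for code cleanliness.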

Just a portal with up-to-date stats, where a user could interactively navigate such decisions, would be a good start and potentially a “safe” route to begin with.

The initial work on such a thing would then be heavier on automation than on politics, which in turn would be easier to tackle later, once there is something more tangible to discuss.
...
On 5 Jul 2023, at 21:34, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2023-07-05 00:00, Christopher Barker wrote:
...
I'm noting this because I think it's part of the problem to be solved, but maybe not the main one (to me anyway). I've been focused more on "these packages are worthwhile" (by some definition of worthwhile), while I think Chris A is more focused on "which of these seemingly similar packages should I use?" -- not unrelated, but not the same question either.
I noticed this in the discussion and I think it's an important difference in how people approach this question.  Basically what some people want from a curated index is "this package is not junk" while others want "this package is actually good" or even "you should use this package for this purpose".
I think that providing "not-junk level" curation is somewhat more tractable, because this form of curation is closer to a logical OR on different people's opinions.  It may be that many people tried a package and didn't find it useful, but if at least one person did find it useful, then we can probably say it's not junk.
Providing "actually-good level" curation or "recommendations" is harder, because it means you actually have to address differences of opinion among curators.
Personally I tend to think a not-junk type curation is the better one to aim at, for a few reasons.  First, it's easier.  Second, it eliminates one of the main problems with trying to search for packages on pypi, namely the huge number of "mytestpackage1"-type packages. Third, this is what conda-forge does and it seems to be working pretty well there.
-- 
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no path, and leave a trail."
  --author unknown

Dom Grigonis