Thanks David,

Not keen to take it on solo.

Ideally, IMO, this could be a joint project of this whole group. Someone more senior from the group creates a repo and oversees progress, a couple of the more experienced members make the initial decisions to get things going, someone issues a review PR, and a couple of recent OPs are invited to write reviews on their queries.

Would be good to know if:
1. Someone has their own little benchmarks/reviews and would be willing to spend a little time issuing a PR for some initial content.
2. People from this group see themselves visiting such a place and adding a great new package they have just found to the existing reviews (or, even more importantly, an awful one).
3. Someone sees an opportunity to contribute if such a project were taken on, e.g.:
  a) someone is very excited about benchmarking automation
  b) someone has working scripts to fetch GitHub stats / Stack Overflow trends that are waiting to be used
  c) someone wants to take their DevOps to the next level and sees this as a good opportunity
  d) someone is very keen on the high-level view and would like to contribute by working on categorisation (partially relying on the Python stdlib/library reference structure could be intuitive, although then it is a dependency)
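For (b), such a script can start as nothing more than a parser for the GitHub /repos/{owner}/{repo} REST response (the field names below are real API fields; the sample numbers are invented):

```python
import json

def extract_repo_stats(payload: dict) -> dict:
    """Pull the fields a curation site might track out of a GitHub
    /repos/{owner}/{repo} REST API response."""
    return {
        "stars": payload.get("stargazers_count", 0),
        "open_issues": payload.get("open_issues_count", 0),
        "license": (payload.get("license") or {}).get("spdx_id"),
        "last_push": payload.get("pushed_at"),
        "archived": payload.get("archived", False),
    }

# A real fetcher would GET https://api.github.com/repos/{owner}/{repo}
# (e.g. with urllib.request) and pass the decoded JSON here.
sample = json.loads("""{
    "stargazers_count": 6200, "open_issues_count": 40,
    "license": {"spdx_id": "BSD-3-Clause"},
    "pushed_at": "2023-07-01T12:00:00Z", "archived": false
}""")
print(extract_repo_stats(sample))
```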

A lot of “someones” in this e-mail...

On 6 Jul 2023, at 16:55, David Mertz, Ph.D. <> wrote:


I'd recommend you simply start a GitHub project for "Curated PyPI", find a catchy domain name, and publish that via GH Pages.  That's a few hours of work to get a skeleton.  But no, I'm not quite volunteering to create and maintain it myself today.

After there is a concrete site existing, you can refine the presentation and governance procedure iteratively.  As a start, it can basically just be a web page with evaluations like yours of the JSON libraries.  At a first pass, there's no need for anything dynamic on the page, just some tables (or maybe accordions, or side-bar navigation, or whatever).

I'd be very likely to make some PRs to such a repository myself.  At some point, with enough recommendations, you might add some automation. E.g. some script that checks all the submitted "package reviews" and creates an aggregation ("10 reviews with average rating of 8").  Even there, running that thing offline every once in a while is plenty to start (you could do GH Actions or something too, if you like).
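That aggregation step could be tiny to begin with. A sketch, assuming each submitted review is a mapping with a numeric "rating" field (the review schema itself is one of the open decisions):

```python
from statistics import mean

def aggregate(reviews):
    # Each review is assumed to be a mapping with a numeric "rating";
    # the actual review format is still undecided.
    ratings = [r["rating"] for r in reviews]
    return f"{len(ratings)} reviews with average rating of {mean(ratings):g}"

print(aggregate([{"rating": 8}, {"rating": 9}, {"rating": 7}]))
# 3 reviews with average rating of 8
```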

There are a few decisions to make, but none that difficult.  For example, what format are reviews? Markdown? YAML? TOML? JSON? Python with conventions? Whatever it is, it should have a gentle learning curve and be human readable IMO.

On Thu, Jul 6, 2023 at 9:26 AM Dom Grigonis <> wrote:
It is possible that the issues being discussed at this stage are not as relevant as they seem at stage 0, which is where this idea stands.
(Unless someone here is looking for a very serious commitment.)

If some sort of starting point that is “light” in approach were decided on, the process could be readjusted as/if it progresses. Maybe there is no need to put a “stamp” on a package; simply provide comparison statistics given some initial structure.

I think a lot of packages can be filtered on objective criteria, without even reaching the stage of subjective opinions.


General info - fairly easy to inspect without the need for subjective opinions.
1. License
2. Maintenance - hard Stack Overflow & repo stats

Performance - hard stats:
1. There will be lower-level language extensions which, even if not up to standard in other aspects, are worth attention; someone else might pick one up and rejuvenate it if that is explicitly indicated.
2. There will be pure Python packages:
  a) those written to good coding standards, with good knowledge of efficient programming in pure Python
  b) those that take ages to execute

In many areas this will filter out many libraries, although there are some where it wouldn't, e.g. schema-based low-level serialisation, where benchmarks can be quite tight.
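A sketch of where the hard performance stats could come from: timing dumps() across whichever of the candidate serialisers happens to be installed (the document, repetition count, and output format are arbitrary choices):

```python
import importlib
import timeit

# Rough benchmark sketch: the third-party serialisers are optional,
# so only time the ones that import cleanly.
doc = {"id": 1, "tags": ["a", "b"], "nested": {"x": [1.5] * 50}}
for name in ("json", "simplejson", "ujson", "orjson"):
    try:
        mod = importlib.import_module(name)
    except ImportError:
        continue
    t = timeit.timeit(lambda: mod.dumps(doc), number=10_000)
    print(f"{name}: {t:.3f}s for 10k dumps")
```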

The remaining evaluation can be subjective opinions, where preferences of curators can be taken into account:
1. Coding standards
2. Integration
3. Flexibility/functionality
4. …

IMO, all of this can be done while being on the safe side - if unsure, leave the plain statistics for users and developers to see.


An example. (I am not the developer of any of these)
Json serialisers:
1. json - stdlib, average performance, well maintained, flexible, very safe to depend on
2. simplejson - 3rd party, pure Python, performance in line with 1), drop-in replacement for json, been around for a while, safe to depend on
3. ultrajson - 3rd party, written in C, >3x performance boost over 1) & 2), drop-in replacement for json, been around for a while, safe to depend on
4. ijson - 3rd party, C & Python, average performance, proprietary interface relying heavily on the iterator protocol, status <TBC>
5. orjson - 3rd party, highly optimised Rust, performance on par with the fastest serialisers on the market, not a drop-in replacement for json due to sacrifices for performance, rich in functionality, well maintained, safe to depend on
6. pyjson5 - 3rd party, C++, performance similar to ultrajson, can be a drop-in replacement for json, extends json with JSON5 features such as comments, well maintained, safe to depend on


So there is still a bit of opinion here, but all of this can be standardised and put into numbers, and a comparison of this type can be done with little to no personal opinion.
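To make that concrete, the prose entries above could be standardised into records along these lines (a sketch; the fields and the example numbers are illustrative, not measurements):

```python
from dataclasses import dataclass, asdict

# Hypothetical structured form of the prose comparison above; the
# field choices are illustrative, not a finalised schema.
@dataclass
class SerialiserReview:
    name: str
    origin: str            # "stdlib" or "3rd party"
    implementation: str    # "pure Python", "C", "C++", ...
    relative_speed: float  # measured against stdlib json == 1.0
    drop_in: bool          # drop-in replacement for json?
    maintained: bool

row = SerialiserReview("ultrajson", "3rd party", "C", 3.0, True, True)
print(asdict(row))
```

Once entries look like this, sorting, filtering, and rendering tables becomes mechanical.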


After structure for this is in place, it would be easier to discuss further whether more serious curation is needed/worthwhile/makes sense.

Allow queries from users and package developers, places to gather opinions, maybe volunteers to do deeper analysis…

And once there is enough input, maybe a curated guidance can be added to the review. But this is the next stage, which is not necessarily needed to be thoroughly thought out before putting in place something simple, objective & risk-free.


Maybe stage 1 is all that users need - a reliable place to check hard stats, where users and developers can update them for the benefit of all. With enough popularity, package developers should be motivated to issue stat updates (e.g. add their library as a column to the benchmarking script), and users would issue similar updates (e.g. add a column for a library that is extremely slow).

It is possible that the project would naturally turn in the direction of hard-stat coverage instead of “deep” curation. E.g.:
json serialisers become a sub-branch of schema-less serialisers,
which in turn become a branch of serialisers.

The user can then view comparable stats for the whole branch, sub-branch, or sub-sub-branch to get the information he needs to make decisions, and apply different filters in the process to arrive at the final list of packages on which he will have to do his final subjective analysis anyway.


E.g. a user needs a serialiser. He prefers schema-less, but is willing to go schema-based given a large enough increase in performance. He does not mind a low maintenance status, given that he aims to maintain his own proprietary serialisation library in the long run. Naturally, clean & simple code with a permissive license is preferred.
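That kind of navigation is just filtering over the stat records. A minimal sketch, with entirely made-up package records, of the "prefer schema-less unless schema-based is much faster" trade-off:

```python
# Sketch of the interactive filtering described above; the package
# records and every field name/value are hypothetical examples.
packages = [
    {"name": "json", "kind": "schema-less", "speed": 1.0, "license": "PSF"},
    {"name": "orjson", "kind": "schema-less", "speed": 10.0, "license": "Apache-2.0"},
    {"name": "flatbuffers", "kind": "schema-based", "speed": 25.0, "license": "Apache-2.0"},
]

def shortlist(pkgs, prefer="schema-less", min_speedup_to_switch=2.0):
    """Keep packages of the preferred kind, plus other kinds only when
    they beat the best preferred option by a large enough factor."""
    best_preferred = max(p["speed"] for p in pkgs if p["kind"] == prefer)
    return [p["name"] for p in pkgs
            if p["kind"] == prefer
            or p["speed"] >= best_preferred * min_speedup_to_switch]

print(shortlist(packages))  # ['json', 'orjson', 'flatbuffers']
```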

Just a portal with up-to-date stats, where the user could interactively navigate such decisions, would be a good start and potentially a “safe” route to begin with.

The starting work on such a thing would then be heavier on automation than on politics, which in turn will be easier to tackle later once there is something more tangible to discuss.

> On 5 Jul 2023, at 21:34, Brendan Barnwell <> wrote:
> On 2023-07-05 00:00, Christopher Barker wrote:
>> I'm noting this, because I think it's part of the problem to be solved, but maybe not the main one (to me anyway). I've been focused more on "these packages are worthwhile" (by some definition of worthwhile). While I think Chris A is more focused on "which of these seemingly similar packages should I use?" -- not unrelated, but not the same question either.
>       I noticed this in the discussion and I think it's an important difference in how people approach this question.  Basically what some people want from a curated index is "this package is not junk" while others want "this package is actually good" or even "you should use this package for this purpose".
>       I think that providing "not-junk level" curation is somewhat more tractable, because this form of curation is closer to a logical OR on different people's opinions.  It may be that many people tried a package and didn't find it useful, but if at least one person did find it useful, then we can probably say it's not junk.
>       Providing "actually-good level" curation or "recommendations" is harder, because it means you actually have to address differences of opinion among curators.
>       Personally I tend to think a not-junk type curation is the better one to aim at, for a few reasons.  First, it's easier.  Second, it eliminates one of the main problems with trying to search for packages on pypi, namely the huge number of "mytestpackage1"-type packages. Third, this is what conda-forge does and it seems to be working pretty well there.
> --
> Brendan Barnwell
> "Do not follow where the path may lead.  Go, instead, where there is no path, and leave a trail."
>   --author unknown
> _______________________________________________
> Python-ideas mailing list --
> To unsubscribe send an email to
> Message archived at
> Code of Conduct:


The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.