[Python-ideas] Re: adding support for a "raw output" in JSON serializer

26 Aug 2019

      On Mon, 26 Aug 2019 at 09:47, Richard Musil <risa2000x@gmail.com> wrote:
...
I gave it some thought over the weekend and came to the conclusion that I am not going to go further with it (where "it" means my or anyone else's idea). The reason is that I totally lost any motivation. I however feel some more elaborate answer might be due and I will try to give one.
The other day (actually before I posted my last reply), I went to the core-mentorship list to get some ideas about how to continue. There was a thread about how people got their first contribution in, and while most posts were positive, one stood out because it described an unsuccessful attempt which finally led to parting ways. I realized there is no shame in that.
I came here with a rough idea about the JSON module's features, but had no clue what the "real" use cases are, what people's expectations are, etc. This thread actually helped me gain more understanding and insight. I thought I had a nice feature in mind, and was wondering what it would take to get it into Python. On the other hand, I did not have any other particular ambitions, like becoming a Python contributor.
Thanks for the feedback, it's both interesting and valuable. Maybe you
would be willing to add a pointer to this comment to that discussion
on core-mentorship? I'm sure it would be useful information for the
people looking to try to remove barriers to entry. And the point that
you made, that you weren't coming here with an ambition to become a
contributor in any more general sense, is also very relevant, as it's
quite possible that people coming here with nothing more than an idea
that they'd like to propose may well be put off by the feeling that
they have to implement their idea or no-one is willing to listen.
(There *is* in reality a problem in that many ideas are fine but
pointless unless someone implements them, but that doesn't mean we
should block people from just discussing things in the abstract).
...
During the discussion I realized that there were 3 aspects of a potentially acceptable solution, proposed by 3 different people, each of whom was quite insistent about theirs:
1) It must use Decimal (Paul)
2) It must check validity of serialized output (Christopher)
3) It must avoid unconditional import of Decimal (Andrew)
A summary like this is immensely helpful in clarifying both where the
discussion has got to, and what the sticking points are. I don't think
we do enough on this list to encourage or offer such summaries, or to
help new contributors to think in terms of checkpointing the
discussion like this. We get so stuck in the technical discussion that
we forget that people may *also* need help in the softer skills, like
managing the discussion. So we keep throwing ideas into the mix until
the contributor is overwhelmed and doesn't know how to proceed.
...
Originally, I thought that I could fulfill 2) and 3) without jeopardizing 1) (my opinion on 1) I already expressed), so I implemented the Python part and ran some performance tests, only to find out that my solution cannot compete in performance with the Decimal solution because of the additional validity check, and I could not promote it anymore. I am not particularly convinced that the validity check is really needed, but I understand why others are requesting it.
So the only way to continue seemed to be implementing 1+2+3, and I realized I really did not want to do it. One reason was that I did not particularly "like" it. That is not meant to say I thought it was wrong to do it this way; I just did not feel invested in those ideas anymore. The other was that I was no longer able to argue about it, because I had basically no idea whether users really need a full validity check, or whether the cost of a one-time import of Decimal really outweighs the performance hit of the heuristic for a lazy import, and I had to rely on what someone claimed on some mailing list (no disrespect meant).
I agree, there's a real risk here that proposals get overwhelmed by
additional requirements suggested by other people. And when those
other people are long-time contributors, or even more so core devs,
it's extremely hard for a newcomer to say that they don't think that
such a requirement is necessary. But as you have demonstrated here,
those requirements are sometimes mutually incompatible, and at some
point, someone has to make a judgement call on the trade-offs. Again,
subjecting a newcomer to the need to do that right up front isn't
exactly fair or helpful.
...
I also realized that implementing this would not give me any advantage over using simplejson, neither in performance nor in features, so it also lost the practical aspect of needing it.
Fundamentally, this is where a disconnect between people's
expectations occurs. Ideas discussed on this list are intended to be
for implementation in future versions of Python - so it's pretty much
always the case that anything agreed here will be of no immediate use
to the individual proposing it. Historically, Python releases have
been every 18 months, so even if we assume that the proposer is an
extremely early adopter of new releases, and has no need to support
older versions of Python, we're talking about 12-18 months before they
can use a new feature. Compare that to *right now* for a PyPI package
or a workaround in code.

But people *think* of ideas because they hit a problem of their own.
And they come here out of a sense that sharing a possible solution
would help the community. It's not very encouraging if they get
treated as if they are simply saying "solve my problem for me". We
probably need to get better at helping such people to polish their
ideas, *without* focusing too heavily on their original problem (or
worse still, on criticising their original solution of that problem).
The original problem is the initial use case for a new feature, of
course, but focusing on "how does your original problem generalise, so
we can see what common features a solution should have" rather than on
"why do you think your problem is important enough to need solving in
the core language/stdlib" (I exaggerate somewhat, but sadly I suspect
not a lot :-() would be a much more welcoming approach.
...
So I guess I am going to leave my patch on github for a while, in case anyone decides to go ahead with 1+2+3. It is not exactly rocket science, but it could save some typing, or help if you want to run a quick benchmark. If you supply it with dump_as_number=Decimal, it behaves exactly like the version with hardcoded Decimal (sans lazy import). One thing to note: if you choose to use Decimal for validating JSON numbers, you will need to handle the case where allow_nan is False, and check that Decimal does not serialize NaN and infinity values (it does in simplejson, as there is no check). It should not have a big impact though, as allow_nan is True by default.
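To illustrate the allow_nan caveat, here is a rough sketch of what such a check might look like. It is not taken from the patch: decimal_to_json_number is a hypothetical helper name, and the regular expression is just one reading of the JSON number grammar in RFC 8259. Only the Decimal methods used (is_finite, is_nan) and the re module are real library calls.

```python
import re
from decimal import Decimal

# JSON number grammar (RFC 8259): optional minus, integer part without
# leading zeros, optional fraction, optional exponent.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def decimal_to_json_number(value, allow_nan=True):
    """Hypothetical helper: render a Decimal as a JSON number token."""
    if not value.is_finite():
        # NaN and the infinities are not valid JSON numbers, so they are
        # only emitted when the caller opts in, as json does for floats.
        if not allow_nan:
            raise ValueError(f"out of range Decimal value: {value!r}")
        if value.is_nan():
            return "NaN"
        return "-Infinity" if value < 0 else "Infinity"
    text = str(value)
    # Validity check on the serialized form before it is written out.
    if not _JSON_NUMBER.fullmatch(text):
        raise ValueError(f"not a valid JSON number: {text!r}")
    return text


print(decimal_to_json_number(Decimal("0.10")))
print(decimal_to_json_number(Decimal("NaN")))
```

With allow_nan=False the same call raises ValueError for NaN and the infinities, which is the case that needs explicit handling if Decimal is used as the validator.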
Thanks. Even if your PR doesn't ultimately get accepted, the
discussion was useful, and highlighted the fact that we can't
currently write full-precision Decimal values using native JSON (we
can round-trip them using custom encoders and object_hook, but that's
a non-standard layer on top of base JSON). Sometimes these things take
a few rounds of discussion before getting accepted (again, the
long-term view is important here).
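
For anyone reading this later, the round-trip workaround mentioned above can be sketched roughly as follows. This is only an illustration: the "__decimal__" tag, the DecimalEncoder class and the decimal_hook function are names made up for the example, not an established convention. The json.JSONEncoder.default and object_hook mechanisms are the standard ones.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Encode Decimal as a tagged object (the tag name is an assumption)."""

    def default(self, o):
        if isinstance(o, Decimal):
            # The value travels as a string inside a wrapper object, so it
            # is no longer a native JSON number.
            return {"__decimal__": str(o)}
        return super().default(o)


def decimal_hook(obj):
    """object_hook that turns the tagged wrapper back into a Decimal."""
    if set(obj) == {"__decimal__"}:
        return Decimal(obj["__decimal__"])
    return obj


value = Decimal("0.12345678901234567890123456789")

# Round trip through the non-standard layer keeps every digit.
text = json.dumps({"price": value}, cls=DecimalEncoder)
restored = json.loads(text, object_hook=decimal_hook)
assert restored["price"] == value

# Going through float instead writes a native number but loses precision.
lossy = json.loads(json.dumps({"price": float(value)}), parse_float=Decimal)
assert lossy["price"] != value
```

The wrapper object is exactly the "non-standard layer" in question: another JSON consumer sees an object with a string in it rather than a number, unless it knows the same convention.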

Thanks for both the proposal and subsequent thread, and for the
helpful and thought-provoking summary post. Please don't be *too* put
off from coming back with any future ideas you may have!

Paul

PS Your discussion of the 3 constraints people were asking for and how
you viewed them and tried to address them, made me think of some other
possible approaches that might be productive. But as you've said you
don't want to take the proposal further, and I think that's an
entirely reasonable position for you to take, I won't push you by
re-opening the debate right now. But I'll keep the thoughts in mind
for if someone else wants to take the proposal further.

Paul Moore