configparser: should optionxform be idempotent?
Hi, all.
I came from https://bugs.python.org/issue35838
Since there is no "expert" for configparser in the
Expert Index, I am asking here for a design decision.
The default behavior of ConfigParser.optionxform
is str.lower(). This is used to canonicalize
option key names.
The documentation of optionxform shows an example that
overrides it with the identity function `lambda option: option`.
https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...
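A minimal sketch of the two behaviours described above, assuming a made-up section and option name (`section`, `MixedCase`) that are not taken from the bug report. It compares the option names stored by the default optionxform with those stored after overriding it with the identity function.

```python
import configparser

# Hypothetical input used only for illustration.
SOURCE = "[section]\nMixedCase = 1\n"

# Default optionxform: option names are lowercased when they are read.
lowered = configparser.ConfigParser()
lowered.read_string(SOURCE)
lowered_names = lowered.options("section")

# Identity override, as in the documentation example: names keep their case.
preserved = configparser.ConfigParser()
preserved.optionxform = lambda option: option
preserved.read_string(SOURCE)
preserved_names = preserved.options("section")

print(lowered_names)
print(preserved_names)
```

Both functions return an already-transformed name unchanged, so applying either of them a second time is harmless.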
BPO-35838 is an issue about optionxform possibly being called twice
during ConfigParser.read_dict().
If optionxform is not idempotent, this creates an unexpected option
name.
https://bugs.python.org/issue35838#msg334439
But even if all APIs call optionxform exactly once, a user may
read an option name and value, and write an updated value under the same name.
In this case, the option name the user read is already optionxform-ed
(canonicalized). So a non-idempotent optionxform will break
the option name.
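The read-then-write scenario above can be sketched as follows. The `wrap` function is a hypothetical non-idempotent optionxform in the spirit of the bug report (it wraps names in parentheses), and the section and option names are made up for illustration.

```python
import configparser


def wrap(option):
    # Hypothetical non-idempotent transform: wrap(wrap(x)) != wrap(x).
    return "(" + option + ")"


cfg = configparser.ConfigParser()
cfg.optionxform = wrap
cfg.read_string("[section]\nname = 1\n")

# Names returned here have already been passed through optionxform once.
stored = cfg.options("section")

# Writing back under the same name applies optionxform to it again.
for option in stored:
    cfg.set("section", option, "2")

after = cfg.options("section")
print(stored)
print(after)
```

In this sketch the update lands on a new, doubly wrapped option instead of the one that was read, so the original value is left untouched.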
So what should we do about optionxform?
a) Document "optionxform must be idempotent".
b) Ensure all APIs call optionxform exactly once, and document
"When you get an option name from section objects, it is already
optionxform-ed. You cannot reuse the option name if
optionxform is not idempotent, because optionxform will be
applied to the name again."
I prefer (a) to (b) because it's a simple and easy solution.
But for some use cases (e.g. read only, write only, or using only
predefined option names and only reading their values), (b) works.
At least the issue reporter tried such a use case and was trapped by
this behavior.
What do you think?
--
Inada Naoki
On Thu, 7 Mar 2019 at 09:21, Inada Naoki
The document of the optionxform shows example overrides it to identity function `lambda option: option`. https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...
BPO-35838 is issue about optionxform can be called twice while ConfigParser.read_dict(). If optionxform is not idempotent, it creates unexpected option name. https://bugs.python.org/issue35838#msg334439
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".
But even if all APIs calls optionxform exactly once, user may read option name and value, and write updated value with same name. In this case, user read option name already optionxform-ed (canonicalized). So non-idempotent optionxform will break option name.
So what should we do about optionxform?
a) Document "optionxform must be idempotent".
b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."
I prefer (a) to (b) because it's simple and easy solution.
I strongly prefer (b). I think the example given in the bug report is a reasonable thing to expect to work. I think that disallowing this usage is an arbitrary restriction that honestly doesn't have a good justification *other* than "it's easier for the implementation". It's obviously not a *common* requirement, otherwise the issue would have come up more often, but it's a reasonable one (after all, we don't require similar functions like the key argument to sorted() to conform to this restriction).
I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.
If, however, the consensus is that we choose (a), can I ask that we *don't* use the term "idempotent" when documenting the restriction? I think it will cause too much confusion - we should explain the restriction without using obscure terms (and if it's hard to explain the restriction like that, maybe that demonstrates that it's an unreasonable restriction to impose? ;-))
Paul
On Thu, Mar 7, 2019 at 6:57 PM Paul Moore
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".
You're right. "idempotent" is technical (or mathematical) jargon. When f(x) satisfies "f(x) == f(f(x)) for all x" restriction, f(x) is idempotent.
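As a small illustration of the definition above, the following sketch checks `f(f(x)) == f(x)` on a handful of sample names. The helper `is_idempotent_on` and the sample list are assumptions made for this example; passing the check on a few samples is evidence, not a proof, that a function is idempotent. `wrap_guarded` is the guarded lambda quoted in this thread, written as a named function.

```python
def is_idempotent_on(func, samples):
    """Return True if func(func(x)) == func(x) for every sample x."""
    return all(func(func(x)) == func(x) for x in samples)


def wrap(option):
    # Always adds another pair of parentheses, so it is not idempotent.
    return "(" + option + ")"


def wrap_guarded(option):
    # Leaves names that are already wrapped unchanged.
    if option.startswith("(") and option.endswith(")"):
        return option
    return "(" + option + ")"


samples = ["name", "Name", "(name)", ""]

lower_ok = is_idempotent_on(str.lower, samples)
wrap_ok = is_idempotent_on(wrap, samples)
guarded_ok = is_idempotent_on(wrap_guarded, samples)
print(lower_ok, wrap_ok, guarded_ok)
```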
I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.
In this case, we recommend not using optionxform to wrap name with
"()" implicitly. Use wrapped name explicitly instead.
e.g. cfg["section"]["(name)"] = "value"
It's very simple.
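A sketch of the explicit approach suggested here, assuming the default optionxform is left in place and using made-up section and option names. Because the parenthesised name is spelled out by the caller and lowercasing leaves it unchanged, reading the name back and writing to it again hits the same option.

```python
import configparser

cfg = configparser.ConfigParser()  # default optionxform (lowercasing)
cfg.add_section("section")

# The wrapped name is written explicitly instead of via optionxform.
cfg["section"]["(name)"] = "value"

# Read each stored name and write an updated value back under it.
for option in list(cfg["section"]):
    cfg["section"][option] = cfg["section"][option].upper()

print(cfg.options("section"))
print(cfg["section"]["(name)"])
```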
--
Inada Naoki
On Thu, Mar 7, 2019 at 2:51 PM Inada Naoki
Hi, all.
I came from https://bugs.python.org/issue35838
Since there are no "expert" for configparser in Expert Index, I ask here to make design decision.
There is lukasz.langa in the expert index for configparser at https://devguide.python.org/experts/#stdlib and that's why I deferred to them.
The default behavior of ConfigParser.optionxform is str.lower(). This is used to canonicalize option key names.
The document of the optionxform shows example overrides it to identity function `lambda option: option`.
https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...
BPO-35838 is issue about optionxform can be called twice while ConfigParser.read_dict(). If optionxform is not idempotent, it creates unexpected option name. https://bugs.python.org/issue35838#msg334439
But even if all APIs calls optionxform exactly once, user may read option name and value, and write updated value with same name. In this case, user read option name already optionxform-ed (canonicalized). So non-idempotent optionxform will break option name.
So what should we do about optionxform?
a) Document "optionxform must be idempotent".
I also feel this is restrictive since wrapping keys with () looks like a valid use case to me.
b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."
I prefer (a) to (b) because it's simple and easy solution.
I initially preferred (b) while read_dict is one case. As you have mentioned in the tracker there are various scenarios where the transform is done and stored in the underlying internal dict and then while setting one section key to another it might apply it again. Also I am afraid there is less test coverage for optionxform itself so there could be more scenarios to cover increasing the complexity.
--
Regards,
Karthikeyan S
On Thu, 7 Mar 2019 at 10:06, Inada Naoki
On Thu, Mar 7, 2019 at 6:57 PM Paul Moore wrote:
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".
You're right. "idempotent" is technical (or mathematical) jargon. When f(x) satisfies "f(x) == f(f(x)) for all x" restriction, f(x) is idempotent.
Thanks. I know what the term means, at least in a mathematical sense - the computing sense is slightly different (in a subtle way that may not be relevant here - see https://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation).
I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.
In this case, we recommend not using optionxform to wrap name with "()" implicitly. Use wrapped name explicitly instead.
e.g. cfg["section"]["(name)"] = "value"
It's very simple.
That argument could be used for any use of optionxform, though - instead of using the default optionxform, use explicitly-lowercased values everywhere instead.
I still prefer option (b), allowing general functions for optionxform. However, I will say (and I should have said in my first mail) that this is a view based purely on theoretical considerations. I've never explicitly used optionxform myself, and none of my code would be impacted in any way regardless of the outcome of this discussion.
Paul
That argument could be used for any use of optionxform, though - instead of using the default optionxform, use explicitly-lowercased values everywhere instead.
That isn't usable if the config format is case-insensitive:
value = (cfg.get("section", "name") or cfg.get("section", "Name") or
         cfg.get("section", "nAme") or cfg.get("section", "naMe")...)
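To make the point about case-insensitive formats concrete, here is a small sketch with made-up section and option names. With the default optionxform every spelling of the name resolves to the same option. With a case-preserving optionxform (`str`, one of the overrides shown in the configparser documentation) only the exact spelling is found, so the caller would have to try each variant by hand.

```python
import configparser

# Hypothetical input used only for illustration.
SOURCE = "[section]\nName = value\n"

# Default optionxform lowercases both the stored name and the lookup key.
default_cfg = configparser.ConfigParser()
default_cfg.read_string(SOURCE)
hits = [default_cfg.get("section", key) for key in ("name", "Name", "NAME")]

# Case-preserving parser: lookups must match the stored spelling exactly.
exact_cfg = configparser.ConfigParser()
exact_cfg.optionxform = str
exact_cfg.read_string(SOURCE)
found = exact_cfg.get("section", "Name", fallback=None)
missing = exact_cfg.get("section", "name", fallback=None)

print(hits, found, missing)
```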
I still prefer option (b), allowing general functions for optionxform. However, I will say (and I should have said in my first mail) that this is a view based purely on theoretical considerations. I've never explicitly used optionxform myself, and none of my code would be impacted in any way regardless of the outcome of this discussion.
Paul
If we choose (b), I think a core developer must check the test coverage for
optionxform before we explicitly document that a non-idempotent optionxform
is allowed.
I don't have the motivation for that because I have never used configparser in such a way.
The PR looks good to me for the particular case the issue describes.
So I will merge the PR without updating the documentation if we choose (b).
But let's wait a few days for other comments.
Regards,
--
Inada Naoki
On Thu, Mar 07, 2019 at 09:56:49AM +0000, Paul Moore wrote:
On Thu, 7 Mar 2019 at 09:21, Inada Naoki wrote:
The document of the optionxform shows example overrides it to identity function `lambda option: option`. https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...
BPO-35838 is issue about optionxform can be called twice while ConfigParser.read_dict(). If optionxform is not idempotent, it creates unexpected option name. https://bugs.python.org/issue35838#msg334439
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".
That's what "idempotent" means :-) [...]
So what should we do about optionxform?
a) Document "optionxform must be idempotent".
b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."
I prefer (a) to (b) because it's simple and easy solution.
I strongly prefer (b).
I don't have a strong opinion, but I have a mild preference for taking responsibility for idempotency out of the user's hands if practical. [...]
I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.
Writing idempotent functions often is.
If, however, the consensus is that we choose (a), can I ask that we *don't* use the term "idempotent" when documenting the restriction?
Why use one word when twenty-four will do? *wink*
I think it will cause too much confusion - we should explain the restriction without using obscure terms (and if it's hard to explain the restriction like that, maybe that demonstrates that it's an unreasonable restriction to impose? ;-))
Please, idempotent is a standard term of art, especially for those working with RESTful interfaces.
http://restcookbook.com/HTTP%20Methods/idempotency/
It might be obscure to you, but then nearly every jargon term will be obscure to somebody. Nobody is born knowing what terms like multiprocessing, threading, metaclass, decorator, comprehension, futures, etc. mean. They're all "obscure jargon" terms to someone.
The first time I came across "tuple", I had no idea what it meant (and in fact it took me many years to stop misspelling it "turple").
By all means include a definition of idempotent (perhaps a link to the glossary). But we shouldn't avoid useful, precise terminology because some people haven't come across it yet.
-- Steven
On Thu, 7 Mar 2019 at 12:42, Steven D'Aprano
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".
That's what "idempotent" means :-)
Sigh. I'm not interested in an extended debate on fiddly details here. I know that's what "idempotent" means. I said I "wasn't clear", and the context clarified it for me (as would going and looking it up, which I *also* did).
There's a subtle difference in the mathematical and computing meanings (around functions with side-effects, which aren't a thing in maths), which IMO makes the term *very slightly* ambiguous - but more to the point, it's uncommon jargon in many areas (searching for "idempotent" in Google shows enough examples of people asking what it means to bear this out, IMO - feel free to disagree).
If, however, the consensus is that we choose (a), can I ask that we *don't* use the term "idempotent" when documenting the restriction?
Why use one word when twenty-four will do? *wink*
Why use possibly-misunderstood jargon when a clear description will do? Hmm, let me think. I guess it depends on which carefully-worded-to-make-your-point description of the situation you choose to use. *wink*
I think it will cause too much confusion - we should explain the restriction without using obscure terms (and if it's hard to explain the restriction like that, maybe that demonstrates that it's an unreasonable restriction to impose? ;-))
Please, idempotent is a standard term of art, especially for those working with RESTful interfaces.
http://restcookbook.com/HTTP%20Methods/idempotency/
It might be obscure to you, but then nearly every jargon term will be obscure to somebody.
I didn't say otherwise. I said "I think it will cause too much confusion". I stand by that. I value clear, non-technical, terms over jargon when it's possible to express an idea that way without sacrificing clarity. I believe that in this case this is possible (although as I've already said, I think it's better to avoid the whole question and *not* impose the restriction at all).
I have no idea what proportion of readers of the configparser docs will be familiar with REST (or with maths, or with any other context). I doubt you do either, but if you do I'll defer to your knowledge.
The first time I came across "tuple", I had no idea what it meant (and in fact it took me many years to stop misspelling it "turple").
I love that, I may start using it deliberately :-)
By all means include a definition of idempotent (perhaps a link to the glossary). But we shouldn't avoid useful, precise terminology because some people haven't come across it yet.
Agreed, but we should also express ideas in a way that is as accessible to the general reader as possible. Sometimes that means using (and explaining) precise technical terms, other times it means using simple language that gets the job done, without using *unnecessary* jargon. It's a judgement call as to where the dividing line lies.
So I feel that "idempotent" is on one side of the line, and you think it's on the other. We've expressed our opinions, let's leave it at that - I don't want to get into an extended debate over "my experience of what the average user thinks is wider than yours"...
Paul
07.03.19 11:18, Inada Naoki wrote:
So what should we do about optionxform?
a) Document "optionxform must be idempotent".
b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."
I prefer (a) to (b) because it's simple and easy solution.
I am not expert of configparser, but I prefer (a). The purpose of introducing optionxform was to make lowercasing optional. https://github.com/python/cpython/commit/9e480adf9b3520ea3deb322fd1214f53a22...
Inada Naoki wrote:
a) Document "optionxform must be idempotent".
b) Ensure all APIs calls optionxform exactly once, and document [that it must be idempotent in certain special situations].
I think the question that needs to be asked is whether optionxform is meant to be a canonicalising operation, or a transformation from an external to an internal form.
Although the docs don't say so explicitly, it seems clear to me that the author of the module was thinking of it as a transformation to a canonical form. If that's what is intended, then idempotency is pretty much implied, from the definition of what canonical means. If something is already in canonical form, nothing needs to be done.
The behaviour with regard to initialising from a dict that was raised in https://bugs.python.org/issue35838#msg334439 is probably an oversight, but there are at least two other ways that the module assumes idempotency:
1) The API does not provide any way of accessing an option *without* applying the transformation. This is a problem if e.g. you want to iterate over the keys and write values back.
2) The write() method writes out the transformed versions of option names. If this output is read back in, the transformation will necessarily be applied a second time. There doesn't seem to be any workaround for this.
There may also be other ways in which idempotency is assumed that I haven't thought of. So, rather than try to have the docs list all the things that non-idempotency could break, it would be easier to simply document that idempotency is assumed.
-- Greg
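The write()/read round trip in point 2 can be sketched as below. The `wrap` function is a hypothetical non-idempotent optionxform and the section and option names are made up; the sketch writes a parser to an in-memory buffer and reads the output back with a second parser that uses the same optionxform.

```python
import configparser
import io


def wrap(option):
    # Hypothetical non-idempotent transform used only for illustration.
    return "(" + option + ")"


first = configparser.ConfigParser()
first.optionxform = wrap
first.read_string("[section]\nname = 1\n")

# write() emits the transformed option names.
buffer = io.StringIO()
first.write(buffer)
written = buffer.getvalue()

# Reading the output back applies optionxform to those names again.
second = configparser.ConfigParser()
second.optionxform = wrap
second.read_string(written)

print(first.options("section"))
print(second.options("section"))
```

With an idempotent optionxform, such as the default lowercasing, the two parsers would end up with the same option names.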
Paul Moore wrote:
There's a subtle difference in the mathematical and computing meanings [of idempotent] (around functions with side-effects, which aren't a thing in maths)
Not really an issue here, since optionxform shouldn't be having side effects if it's sane.
In any case, the word is easy enough to avoid in this case. We could say something like:
"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."
-- Greg
On Fri, Mar 08, 2019 at 12:56:13PM +1300, Greg Ewing wrote:
In any case, the word is easy enough to avoid in this case.
I don't think we should avoid using standard terminology even if we can. Knowledge of standard terminology is useful, and we don't generally make a practice of talking about (e.g.) "simultaneously running subtasks" when we can say "threads" instead.
You are happy to use the jargon terms "function" and "canonical form" without explanation, which I think proves that one person's jargon is another's obvious, clear, precise technical terminology.
We could say something like:
"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."
How about:
"The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged."
requires six extra words, but it uses the correct technical term which will be familiar to some proportion of users, while also explaining the term for those who aren't familiar with it. We all win!
-- Steven
We could say something like:
"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."
How about:
"The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged."
requires six extra words, but it uses the correct technical term which will be familiar to some proportion of users, while also explaining the term for those who aren't familiar with it. We all win!
Thank you for suggestions.
Personally speaking, I think technical jargon is much easier than
normal English idioms or complex English syntax.
I learned "idempotent" long ago while learning HTTP. On the other hand,
I don't know normal English idioms that even 5-year-old children who speak
English at home know.
Technical jargon is a good tool for communicating with people who use English
only for programming. It shows the meaning very clearly in few words.
So I agree with you. If readers may not know widely used tech jargon,
teach it instead of avoiding it.
Regards,
--
Inada Naoki
On Thu, 7 Mar 2019 at 23:58, Greg Ewing
Paul Moore wrote:
There's a subtle difference in the mathematical and computing meanings [of idempotent] (around functions with side-effects, which aren't a thing in maths)
Not really an issue here, since optionxform shouldn't be having side effects if it's sane.
In any case, the word is easy enough to avoid in this case. We could say something like:
"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."
Precisely. +1 on this wording if we choose to go this way.
If someone *really* wants to link the idea to the term "idempotent" then a simple "(i.e., the function must be idempotent)" would be sufficient to confirm for people who know the term, avoid making it unclear for people who don't, and teach people who don't the meaning of the term.
Paul
On Fri, 8 Mar 2019 at 02:54, Inada Naoki
Personally speaking, I think technical jargon is much easier than normal English idioms or complex English syntax.
I learned "idempotent" long ago while learning HTTP. On the other hand, I don't know normal English idioms even 5-year children who speaks English at home knows.
Technical jargon is good tool to communicate with people uses English only for programming. It shows the meaning very clearly with few words.
Thanks for the reminder that the trade-off is different for non-English speakers - and my apologies for not taking that into account.
So I agree with you. If reader may not know tech jargon widely used, teach it instead of avoid it.
Your point is taken, let's include the term in the explanation, but let's also spell out the behaviour for people who don't know the term (or who simply find it less familiar because it's not something commonly used in areas they work in).
Paul
On Mar 8, 2019, at 12:12 AM, Steven D'Aprano wrote:
On Fri, Mar 08, 2019 at 12:56:13PM +1300, Greg Ewing wrote:
In any case, the word is easy enough to avoid in this case.
I don't think we should avoid using standard terminology even if we can. Knowledge of standard terminology is useful, and we don't generally make a practice of talking about (e.g.) "simultaneously running subtasks" when we can say "threads" instead.
You are happy to use the jargon terms "function" and "canonical form" without explanation, which I think proves that one person's jargon is another's obvious, clear, precise technical terminology.
We could say something like:
"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."
How about:
"The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged."
I’d prefer something less passive than “it should remain unchanged” (as my high school English teacher would say: “by whom?”). Something like “If optionxform is called on a name that is already in canonical form, then it should return that name unchanged”. Then add something like “That is, optionxform should be idempotent”.
Eric
requires six extra words, but it uses the correct technical term which will be familiar to some proportion of users, while also explaining the term for those who aren't familiar with it. We all win!
-- Steven
I just have one thing to add to the discussion, which is this: https://youtu.be/hAnCiTpxXPg?t=6339
Yes, people actually read and interpret our documentation :) So discussions like this are probably a good thing in terms of getting the best descriptions in there, but if we use a specific technical term *not quite right* then we will be found out.
(A few of us core devs chatted with Al afterwards and there's no bad blood, so don't worry about that.)
Cheers,
Steve
participants (8)
- Eric V. Smith
- Greg Ewing
- Inada Naoki
- Karthikeyan
- Paul Moore
- Serhiy Storchaka
- Steve Dower
- Steven D'Aprano