Mailman 3 configparser: should optionxform be idempotent? - Python-Dev

configparser: should optionxform be idempotent?

Inada Naoki

7 Mar 2019

1:18 a.m.

Hi, all.

I came from https://bugs.python.org/issue35838. Since there is no "expert" for configparser in the Expert Index, I am asking here so we can make a design decision.

The default behavior of ConfigParser.optionxform is str.lower(). It is used to canonicalize option key names. The documentation of optionxform shows an example that overrides it with the identity function `lambda option: option`. https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...

BPO-35838 is an issue about optionxform being called twice during ConfigParser.read_dict(). If optionxform is not idempotent, it creates an unexpected option name. https://bugs.python.org/issue35838#msg334439

But even if all APIs call optionxform exactly once, a user may read an option name and value, and then write an updated value under the same name. In this case, the option name the user read has already been optionxform-ed (canonicalized). So a non-idempotent optionxform will break the option name.

So what should we do about optionxform?

a) Document "optionxform must be idempotent".

b) Ensure all APIs call optionxform exactly once, and document "When you get an option name from section objects, it is already optionxform-ed. You cannot reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."

I prefer (a) to (b) because it's a simple and easy solution. But for some use cases (e.g. read only, write only, or using only predefined option names and reading only their values), (b) works. At least the issue reporter tried such a use case and was trapped by this behavior.

What do you think?

-- Inada Naoki
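The read-then-write problem described in this message can be sketched with a deliberately non-idempotent optionxform. This is only an illustration under stated assumptions: the `wrap` function and the section and option names are invented for the example, and it relies only on set() applying optionxform once and options() returning the already transformed names.

```python
import configparser

def wrap(option):
    # Hypothetical non-idempotent transform: wraps every name in parentheses.
    return "(" + option + ")"

cfg = configparser.ConfigParser()
cfg.optionxform = wrap
cfg.add_section("section")
cfg.set("section", "name", "old")       # stored under "(name)"

# Read the (already transformed) names back and write updated values.
for name in cfg.options("section"):
    cfg.set("section", name, "new")     # wrap() is applied to "(name)" again

names = cfg.options("section")
print(names)
```

Instead of updating the existing option, the loop should leave a second key, "((name))", next to the original "(name)", which still holds the old value.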

Paul Moore

7 Mar 2019

1:56 a.m.

On Thu, 7 Mar 2019 at 09:21, Inada Naoki wrote:

...

The document of the optionxform shows example overrides it to identity function `lambda option: option`. https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...

BPO-35838 is an issue about optionxform being called twice during ConfigParser.read_dict(). If optionxform is not idempotent, it creates an unexpected option name. https://bugs.python.org/issue35838#msg334439

I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".

...

But even if all APIs calls optionxform exactly once, user may read option name and value, and write updated value with same name. In this case, user read option name already optionxform-ed (canonicalized). So non-idempotent optionxform will break option name.

So what should we do about optionxform?

a) Document "optionxform must be idempotent".

b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."

I prefer (a) to (b) because it's simple and easy solution.

I strongly prefer (b). I think the example given in the bug report is a reasonable thing to expect to work. I think that disallowing this usage is an arbitrary restriction that honestly doesn't have a good justification *other* than "it's easier for the implementation". It's obviously not a *common* requirement, otherwise the issue would have come up more often, but it's a reasonable one (after all, we don't require similar functions like the key argument to sorted() to conform to this restriction).

I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.

If, however, the consensus is that we choose (a), can I ask that we *don't* use the term "idempotent" when documenting the restriction? I think it will cause too much confusion - we should explain the restriction without using obscure terms (and if it's hard to explain the restriction like that, maybe that demonstrates that it's an unreasonable restriction to impose? ;-))

Paul
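For reference, the conditional lambda in this message can be written out as a named function and checked by hand. This is a sketch only; the name `wrap_once` is made up here, and the logic is just the expression quoted above.

```python
def wrap_once(option):
    # Same logic as the lambda in the message: leave names that already
    # start with "(" and end with ")" alone, wrap everything else.
    if option.startswith("(") and option.endswith(")"):
        return option
    return "(" + option + ")"

once = wrap_once("name")
twice = wrap_once(once)
print(once, twice)
```

Applying it twice gives the same result as applying it once, which is the property under discussion. Part of the fiddliness is that a name such as "(a)(b)" also passes the check and is returned unwrapped, which may or may not be what the user wants.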

Inada Naoki

2:06 a.m.

On Thu, Mar 7, 2019 at 6:57 PM Paul Moore wrote:

...

I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".

You're right. "idempotent" is technical (or mathematical) jargon. When f(x) satisfies "f(x) == f(f(x)) for all x" restriction, f(x) is idempotent.
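The definition above can be spot-checked in code. The helper below is hypothetical (it is not part of configparser) and only tests the sample inputs it is given, so it can show that a function is not idempotent but cannot prove that one is.

```python
def is_idempotent_on(f, samples):
    """Return True if f(f(x)) == f(x) for every x in samples."""
    return all(f(f(x)) == f(x) for x in samples)

samples = ["name", "Name", "(name)", ""]

# Lowercasing an already lowercased string changes nothing.
lower_ok = is_idempotent_on(str.lower, samples)

# Wrapping in parentheses adds another pair on every call.
wrap_ok = is_idempotent_on(lambda x: "(" + x + ")", samples)

print(lower_ok, wrap_ok)
```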

...

I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.

In this case, we recommend not using optionxform to wrap name with "()" implicitly. Use wrapped name explicitly instead. e.g. cfg["section"]["(name)"] = "value" It's very simple. -- Inada Naoki
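A minimal sketch of this recommendation, assuming the default optionxform and a section that is created by hand for the example:

```python
import configparser

cfg = configparser.ConfigParser()       # default optionxform (lowercasing)
cfg.add_section("section")
cfg["section"]["(name)"] = "value"      # the parentheses are part of the name

# Writing back under the name we read does not create a second key,
# because lowercasing "(name)" again leaves it unchanged.
for name in list(cfg["section"]):
    cfg["section"][name] = "updated"

names = list(cfg["section"])
print(names, cfg["section"]["(name)"])
```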

Karthikeyan

2:30 a.m.

On Thu, Mar 7, 2019 at 2:51 PM Inada Naoki wrote:

...

Hi, all.

I came from https://bugs.python.org/issue35838 Since there are no "expert" for configparser in Expert Index, I ask here to make design decision.

There is lukasz.langa in the expert index for configparser at https://devguide.python.org/experts/#stdlib and that's why I deferred to them.

...

The default behavior of ConfigParser.optionxform is str.lower(). This is used to canonicalize option key names.

The document of the optionxform shows example overrides it to identity function `lambda option: option`.

https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...

BPO-35838 is an issue about optionxform being called twice during ConfigParser.read_dict(). If optionxform is not idempotent, it creates an unexpected option name. https://bugs.python.org/issue35838#msg334439

But even if all APIs calls optionxform exactly once, user may read option name and value, and write updated value with same name. In this case, user read option name already optionxform-ed (canonicalized). So non-idempotent optionxform will break option name.

So what should we do about optionxform?

a) Document "optionxform must be idempotent".

I also feel this is restrictive since wrapping keys with () looks like a valid use case to me.

...

b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."

I prefer (a) to (b) because it's simple and easy solution.

I initially preferred (b) while read_dict is one case. As you have mentioned in the tracker there are various scenarios where the transform is done and stored in the underlying internal dict and then while setting one section key to another it might apply it again. Also I am afraid there is less test coverage for optionxform itself so there could be more scenarios to cover increasing the complexity. -- Regards, Karthikeyan S

Paul Moore

2:30 a.m.

On Thu, 7 Mar 2019 at 10:06, Inada Naoki wrote:

...

On Thu, Mar 7, 2019 at 6:57 PM Paul Moore wrote:

...
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".

You're right. "idempotent" is technical (or mathematical) jargon. When f(x) satisfies "f(x) == f(f(x)) for all x" restriction, f(x) is idempotent.

Thanks. I know what the term means, at least in a mathematical sense - the computing sense is slightly different (in a subtle way that may not be relevant here - see https://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation).

...

...
I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.

In this case, we recommend not using optionxform to wrap name with "()" implicitly. Use wrapped name explicitly instead.

e.g. cfg["section"]["(name)"] = "value"

It's very simple.

That argument could be used for any use of optionxform, though - instead of using the default optionxform, use explicitly-lowercased values everywhere instead. I still prefer option (b), allowing general functions for optionxform. However, I will say (and I should have said in my first mail) that this is a view based purely on theoretical considerations. I've never explicitly used optionxform myself, and none of my code would be impacted in any way regardless of the outcome of this discussion. Paul

Inada Naoki

2:41 a.m.

...

That argument could be used for any use of optionxform, though - instead of using the default optionxform, use explicitly-lowercased values everywhere instead.

That isn't usable if the config format is case-insensitive.

value = (cfg.get("section", "name") or cfg.get("section", "Name") or cfg.get("section", "nAme") or cfg.get("section", "naMe")...)
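By contrast, with the default optionxform a single lookup covers every spelling, because both the stored name and the requested name are lowercased. A small sketch, with the section and option names chosen only for the example:

```python
import configparser

cfg = configparser.ConfigParser()       # default optionxform lowercases names
cfg.read_string("[section]\nName = value\n")

# Every spelling is canonicalized to "name" before the lookup.
lookups = [cfg.get("section", spelling)
           for spelling in ("name", "Name", "nAme", "NAME")]
stored = cfg.options("section")
print(lookups, stored)
```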

...

I still prefer option (b), allowing general functions for optionxform. However, I will say (and I should have said in my first mail) that this is a view based purely on theoretical considerations. I've never explicitly used optionxform myself, and none of my code would be impacted in any way regardless of the outcome of this discussion.

Paul

If we choose (b), I think a core developer must check the test coverage for optionxform before documenting explicitly that a non-idempotent optionxform is allowed. I don't have the motivation for that because I have never used configparser in such a way.

The PR looks good to me for the particular case the issue describes. So I will merge the PR without updating the documentation if we choose (b). But let's wait a few days for other comments.

Regards,
-- Inada Naoki

Steven D'Aprano

4:39 a.m.

On Thu, Mar 07, 2019 at 09:56:49AM +0000, Paul Moore wrote:

...

On Thu, 7 Mar 2019 at 09:21, Inada Naoki wrote:

...
The document of the optionxform shows example overrides it to identity function `lambda option: option`. https://docs.python.org/3/library/configparser.html#configparser.ConfigParse...

BPO-35838 is an issue about optionxform being called twice during ConfigParser.read_dict(). If optionxform is not idempotent, it creates an unexpected option name. https://bugs.python.org/issue35838#msg334439

I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".

That's what "idempotent" means :-) [...]

...

...
So what should we do about optionxform?

a) Document "optionxform must be idempotent".

b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."

I prefer (a) to (b) because it's simple and easy solution.

I strongly prefer (b).

I don't have a strong opinion, but I have a mild preference for taking responsibility for idempotency out of the user's hands if practical. [...]

...

I'd look at the question the other way round. If we *did* insist that optionxform has to be "idempotent", how would we recommend that the person who reported the bug achieved the result he's trying to get? lambda x: x if x.startswith("(") and x.endswith(")") else "(" + x + ")"? That seems a bit fiddly.

Writing idempotent functions often is.

...

If, however, the consensus is that we choose (a), can I ask that we *don't* use the term "idempotent" when documenting the restriction?

Why use one word when twenty-four will do? *wink*

...

I think it will cause too much confusion - we should explain the restriction without using obscure terms (and if it's hard to explain the restriction like that, maybe that demonstrates that it's an unreasonable restriction to impose? ;-))

Please, idempotent is a standard term of art, especially for those working with RESTful interfaces. http://restcookbook.com/HTTP%20Methods/idempotency/ It might be obscure to you, but then nearly every jargon term will be obscure to somebody. Nobody is born knowing what terms like multiprocessing threading metaclass decorator comprehension futures etc mean. They're all "obscure jargon" terms to someone. The first time I came across "tuple", I had no idea what it meant (and in fact it took me many years to stop misspelling it "turple"). By all means include a definition of idempotent (perhaps a link to the glossary). But we shouldn't avoid useful, precise terminology because some people haven't come across it yet. -- Steven

Paul Moore

6:39 a.m.

On Thu, 7 Mar 2019 at 12:42, Steven D'Aprano wrote:

...

...
I'm not keen on the term "idempotent" here - I wasn't at all clear what it was intended to convey. But from looking at the bug report, I see that it basically means "optionxform should be a function which, when applied more than one time to a value, returns the same result as if it had been applied once only".

That's what "idempotent" means :-)

Sigh. I'm not interested in an extended debate on fiddly details here. I know that's what "idempotent" means. I said I "wasn't clear", and the context clarified it for me (as would going and looking it up, which I *also* did). There's a subtle difference in the mathematical and computing meanings (around functions with side-effects, which aren't a thing in maths), which IMO makes the term *very slightly* ambiguous - but more to the point, it's uncommon jargon in many areas (searching for "idempotent" in Google shows enough examples of people asking what it means to bear this out, IMO - feel free to disagree).

...

...
If, however, the consensus is that we choose (a), can I ask that we *don't* use the term "idempotent" when documenting the restriction?

Why use one word when twenty-four will do? *wink*

Why use possibly-misunderstood jargon when a clear description will do? Hmm, let me think. I guess it depends on which carefully-worded-to-make-your-point description of the situation you choose to use. *wink*

...

...
I think it will cause too much confusion - we should explain the restriction without using obscure terms (and if it's hard to explain the restriction like that, maybe that demonstrates that it's an unreasonable restriction to impose? ;-))

Please, idempotent is a standard term of art, especially for those working with RESTful interfaces.

http://restcookbook.com/HTTP%20Methods/idempotency/

It might be obscure to you, but then nearly every jargon term will be obscure to somebody.

I didn't say otherwise. I said "I think it will cause too much confusion". I stand by that. I value clear, non-technical, terms over jargon when it's possible to express an idea that way without sacrificing clarity. I believe that in this case this is possible (although as I've already said, I think it's better to avoid the whole question and *not* impose the restriction at all). I have no idea what proportion of readers of the configparser docs will be familiar with REST (or with maths, or with any other context). I doubt you do either, but if you do I'll defer to your knowledge.

...

The first time I came across "tuple", I had no idea what it meant (and in fact it took me many years to stop misspelling it "turple").

I love that, I may start using it deliberately :-)

...

By all means include a definition of idempotent (perhaps a link to the glossary). But we shouldn't avoid useful, precise terminology because some people haven't come across it yet.

Agreed, but we should also express ideas in a way that is as accessible to the general reader as possible. Sometimes that means using (and explaining) precise technical terms, other times it means using simple language that gets the job done, without using *unnecessary* jargon. It's a judgement call as to where the dividing line lies. So I feel that "idempotent" is on one side of the line, and you think it's on the other. We've expressed our opinions, let's leave it at that - I don't want to get into an extended debate over "my experience of what the average user thinks is wider than yours"... Paul

Serhiy Storchaka

8:40 a.m.

07.03.19 11:18, Inada Naoki wrote:

...

So what should we do about optionxform?

a) Document "optionxform must be idempotent".

b) Ensure all APIs calls optionxform exactly once, and document "When you get option name from section objects, it is already optionxform-ed. You can not reuse the option name if optionxform is not idempotent, because optionxform will be applied to the name again."

I prefer (a) to (b) because it's simple and easy solution.

I am not expert of configparser, but I prefer (a). The purpose of introducing optionxform was to make lowercasing optional. https://github.com/python/cpython/commit/9e480adf9b3520ea3deb322fd1214f53a22...

Greg Ewing

3:34 p.m.

Inada Naoki wrote:

...

a) Document "optionxform must be idempotent".

...

b) Ensure all APIs calls optionxform exactly once, and document [that it must be idempotent in certain special situations].

I think the question that needs to be asked is whether optionxform is meant to be a canonicalising operation, or a transformation from an external to an internal form.

Although the docs don't say so explicitly, it seems clear to me that the author of the module was thinking of it as a transformation to a canonical form. If that's what is intended, then idempotency is pretty much implied, from the definition of what canonical means. If something is already in canonical form, nothing needs to be done.

The behaviour with regard to initialising from a dict that was raised in https://bugs.python.org/issue35838#msg334439 is probably an oversight, but there are at least two other ways that the module assumes idempotency:

1) The API does not provide any way of accessing an option *without* applying the transformation. This is a problem if e.g. you want to iterate over the keys and write values back.

2) The write() method writes out the transformed versions of option names. If this output is read back in, the transformation will necessarily be applied a second time. There doesn't seem to be any workaround for this.

There may also be other ways in which idempotency is assumed that I haven't thought of. So, rather than try to have the docs list all the things that non-idempotency could break, it would be easier to simply document that idempotency is assumed.

-- Greg
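Point 2 can be sketched as a write-then-read round trip. This is an illustration under stated assumptions: `wrap` is a made-up non-idempotent transform, the section and option names are invented, and an in-memory buffer stands in for a file.

```python
import configparser
import io

def wrap(option):
    # Hypothetical non-idempotent transform used only for illustration.
    return "(" + option + ")"

first = configparser.ConfigParser()
first.optionxform = wrap
first.read_string("[section]\nname = value\n")    # stored as "(name)"

buffer = io.StringIO()
first.write(buffer)                               # writes the transformed name

second = configparser.ConfigParser()
second.optionxform = wrap
second.read_string(buffer.getvalue())             # wrap() runs on "(name)" again

first_names = first.options("section")
second_names = second.options("section")
print(first_names, second_names)
```

The second parser should end up with "((name))" where the first one had "(name)", so a file written by the module does not read back to the same option names.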

Greg Ewing

3:56 p.m.

Paul Moore wrote:

...

There's a subtle difference in the mathematical and computing meanings [of idempotent] (around functions with side-effects, which aren't a thing in maths)

Not really an issue here, since optionxform shouldn't be having side effects if it's sane. In any case, the word is easy enough to avoid in this case. We could say something like: "The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged." -- Greg

Steven D'Aprano

4:12 p.m.

On Fri, Mar 08, 2019 at 12:56:13PM +1300, Greg Ewing wrote:

...

In any case, the word is easy enough to avoid in this case.

I don't think we should avoid using standard terminology even if we can. Knowledge of standard terminology is useful, and we don't generally make a practice of talking about (e.g.) "simultaneously running subtasks" when we can say "threads" instead. You are happy to use the jargon terms "function" and "canonical form" without explanation, which I think proves that one person's jargon is another's obvious, clear, precise technical terminology.

...

We could say something like:

"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."

How about: "The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged." requires six extra words, but it uses the correct technical term which will be familiar to some proportion of users, while also explaining the term for those who aren't familiar with it. We all win! -- Steven

Inada Naoki

6:51 p.m.

...

...
We could say something like:

"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."

How about:

"The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged."

requires six extra words, but it uses the correct technical term which will be familiar to some proportion of users, while also explaining the term for those who aren't familiar with it. We all win!

Thank you for suggestions. Personally speaking, I think technical jargon is much easier than normal English idioms or complex English syntax. I learned "idempotent" long ago while learning HTTP. On the other hand, I don't know normal English idioms even 5-year children who speaks English at home knows. Technical jargon is good tool to communicate with people uses English only for programming. It shows the meaning very clearly with few words. So I agree with you. If reader may not know tech jargon widely used, teach it instead of avoid it. Regards, -- Inada Naoki

Paul Moore

11:30 p.m.

On Thu, 7 Mar 2019 at 23:58, Greg Ewing wrote:

...

Paul Moore wrote:

...
There's a subtle difference in the mathematical and computing meanings [of idempotent] (around functions with side-effects, which aren't a thing in maths)

Not really an issue here, since optionxform shouldn't be having side effects if it's sane.

In any case, the word is easy enough to avoid in this case. We could say something like:

"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."

Precisely. +1 on this wording if we choose to go this way. If someone *really* wants to link the idea to the term "idempotent" then a simple "(i.e., the function must be idempotent)" would be sufficient to confirm for people who know the term, avoid making it unclear for people who don't, and teach people who don't the meaning of the term. Paul

Paul Moore

11:33 p.m.

On Fri, 8 Mar 2019 at 02:54, Inada Naoki wrote:

...

Personally speaking, I think technical jargon is much easier than normal English idioms or complex English syntax.

I learned "idempotent" long ago while learning HTTP. On the other hand, I don't know normal English idioms even 5-year children who speaks English at home knows.

Technical jargon is good tool to communicate with people uses English only for programming. It shows the meaning very clearly with few words.

Thanks for the reminder that the trade-off is different for non-English speakers - and my apologies for not taking that into account.

...

So I agree with you. If reader may not know tech jargon widely used, teach it instead of avoid it.

Your point is taken, let's include the term in the explanation, but let's also spell out the behaviour for people who don't know the term (or who simply find it less familiar because it's not something commonly used in areas they work in). Paul

Eric V. Smith

8 Mar 2019

12:26 a.m.

...

On Mar 8, 2019, at 12:12 AM, Steven D'Aprano wrote:

...
On Fri, Mar 08, 2019 at 12:56:13PM +1300, Greg Ewing wrote:

In any case, the word is easy enough to avoid in this case.

I don't think we should avoid using standard terminology even if we can. Knowledge of standard terminology is useful, and we don't generally make a practice of talking about (e.g.) "simultaneously running subtasks" when we can say "threads" instead.

You are happy to use the jargon terms "function" and "canonical form" without explanation, which I think proves that one person's jargon is another's obvious, clear, precise technical terminology.

...
We could say something like:

"The optionxform function transforms option names to a canonical form. If the name is already in canonical form, it should be returned unchanged."

How about:

"The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged."

I’d prefer something less passive than “it should remain unchanged” (as my high school English teacher would say: “by whom?”). Something like “If optionxform is called on a name that is already in canonical form, then it should return that name unchanged”. Then add something like “That is, optionxform should be idempotent”. Eric

...

requires six extra words, but it uses the correct technical term which will be familiar to some proportion of users, while also explaining the term for those who aren't familiar with it. We all win!

-- Steven _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40tru...

Steve Dower

7:58 a.m.

I just have one thing to add to the discussion, which is this: https://youtu.be/hAnCiTpxXPg?t=6339 Yes, people actually read and interpret our documentation :) So discussions like this are probably a good thing in terms of getting the best descriptions in there, but if we use a specific technical term *not quite right* then we will be found out. (A few of us core devs chatted with Al afterwards and there's no bad blood, so don't worry about that.) Cheers, Steve
