ConfigParser mangles keys with special chars

Hi, I noticed configparser does behave in a surprising way when a key has a special meaning in ini-format. Consider this example: >>> cp = configparser.ConfigParser() >>> cp.read_dict({'DEFAULT': {';foo': 'bar'}}) >>> cp.write(sys.stdout) [DEFAULT] ;foo = bar Now when reading this again, ";foo = bar" will be a comment and ignored. There's probably similiar behaviour in other corner cases (like if you'd use "[foo]" as key for example). While it seems ConfigParser doesn't do any escaping as all, I'm thinking it should at least raise some exception when such a value is trying to be set. I'd expect writing something and then reading it back via the same configparser to *always* result in the same data, as long as writing worked without error. Thoughts? Should I submit a bug report? Florian -- () ascii ribbon campaign - stop html mail www.asciiribbon.org /\ www.the-compiler.org | I love long mails http://email.is-not-s.ms/ To give happiness is to deserve happiness.

On Fri, Apr 25, 2014 at 10:22 AM, Florian Bruhin <me@the-compiler.org> wrote:
I believe you should, if only to provide a place to record why no change gets made. Had ConfigParser been more careful from the beginning, that would have been really good. At this point, it would be a backward-incompatible change, so it's unlikely such a change could be allowed to affect existing code. -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On 4/25/2014 12:46 PM, Fred Drake wrote:
When you do, add lukasz.langa as nosy https://docs.python.org/devguide/experts.html#experts You might also take a look in test/test_configparser.py to see if any edge cases are tested for.
-- Terry Jan Reedy

On 04/25/2014 09:46 AM, Fred Drake wrote:
At this point, it would be a backward-incompatible change, so it's unlikely such a change could be allowed to affect existing code.
All bug-fixes are backwards-incompatible, yet we fix them anyway. ;) It seems to me the real question is do we fix it in 3.5 only, or can we fix it in 3.4 and previous? And the answer depends on whether this behavior can be reasonably relied on. -- ~Ethan~

On 4/25/2014 1:41 PM, Ethan Furman wrote:
And that depends on whether the current thought-to-be-buggy behavior is specified by the doc, compatible with an ambiguous or under-specified doc, or prohibited by the doc (because the doc specified something else). I leave it to someone to carefully read the doc, but a brief glance indicates "There are nearly as many INI format variants as there are applications using it. configparser goes a long way to provide support for the largest sensible set of INI styles available." So I wonder whether the thought-to-be-buggy behavior is actually buggy with respect to *all* the supported styles or just some of them. -- Terry Jan Reedy

On Fri, Apr 25, 2014 at 2:45 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Given that most variants are completely undocumented, answering this is sufficiently intractable for me to call it intractable. We've also run into compatibility issues because some applications treated the "parser" object as an in-memory configuration database throughout the process lifetime, updating values with non-string values of various sorts. Given the age of the module, the existing number of uses of undocumented accidents of implementation is very large. Fixing "bugs" like this has an excellent track record of uncovering these uses, so... we've grown a bit wary. It's not unheard of for projects to fork their own copies of configparser because of changes to the stdlib version. (No, I haven't done a census of such cases, but I do know of at least one in a popular package.) -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On Fri, Apr 25, 2014 at 3:12 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
You can subclass the parser class you're using and override the ``optionxform`` method to perform the checks you want for option names. I don't know if current versions provide a similar hook to validate section names; providing *that* would be an excellent enhancement. -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On Apr 25, 2014, at 11:47 AM, Terry Reedy <tjreedy@udel.edu> wrote:
So I wonder whether the thought-to-be-buggy behavior is actually buggy with respect to *all* the supported styles or just some of them.
I can't how being able to write a file that isn't read back with the same meaning could be anything but a bug (or missing feature). Regardless of what ( lack of ) standard it is or isn't conforming to. Also, if it is often used just to read ini files, then those uses wouldn't be effected by more strictness on writing ... But the way, last I checked elmentree had a similar problem -- it would quite happily write invalid XML that it wouldn't read back in... -Chris

On Fri, Apr 25, 2014 at 12:46:42PM -0400, Fred Drake wrote:
It seems to me that the current behaviour is a bug and should be fixed. I think the relevant part of the docs is the part following the "comment_prefixes" and "inline_comment_prefixes" section: Please note that config parsers don’t support escaping of comment prefixes so using inline_comment_prefixes may prevent users from specifying option values with characters used as comment prefixes. When in doubt, avoid setting inline_comment_prefixes. In any circumstances, the only way of storing comment prefix characters at the beginning of a line in multiline values is to interpolate the prefix, for example: ... This is *seriously* underspecified, which is a bug in itself: option *values* don't support including inline_comment_prefixes, but do option *keys* support them? How about keys beginning with comment_prefixes? The specification doesn't say, but nor does it say that it's undefined behaviour. I think that a line beginning with "#spam" is ambiguous, it isn't clear if it is intended as a comment "spam" or a key starting with #, so by the Zen, configparser should refuse to guess. -- Steven

On Sat, Apr 26, 2014 at 1:34 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Seriously? Perhaps the second paragraph here could be strengthened a little: https://docs.python.org/3/library/configparser.html#supported-ini-file-struc... But that seems clear to me. Lines starting with the comment prefix are comments. -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On Sat, Apr 26, 2014 at 01:59:27AM -0400, Fred Drake wrote:
Perhaps I could have worded my post more clearly, but yes. In context, we were talking about a situation where the user creates a key value pair. Copying from the OP's initial post: >>> cp = configparser.ConfigParser() >>> cp.read_dict({'DEFAULT': {';foo': 'bar'}}) >>> cp.write(sys.stdout) [DEFAULT] ;foo = bar The intent of the creator of the ConfigParser was for a key ";foo" with value "bar", and that is the way it is treated while still in memory. It's only when reading it back from a file does it become treated as a comment. There's no way to tell whether the line ";foo = bar" *in a file* came from a key ";foo" with value "bar", or from a genuine comment that merely happens to look like key=value. The obvious way to avoid the ambiguity is to disallow keys beginning with a comment prefix. Then ";foo = bar" cannot represent the key ";foo", because such a key is illegal.
But the entry in question wasn't a line, it was a key=value pair in a dict. Here's that line again: cp.read_dict({'DEFAULT': {';foo': 'bar'}}) or it could have been: cp['DEFAULT'][';foo'] = 'bar' Either way, if there's no way to escape comment characters, I think it is a bug that you can insert illegal keys into the cp object in the first place. -- Steven

On Sat, Apr 26, 2014 at 7:23 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Fair enough. I'm not trying to argue that it isn't a bug, but that risk of breaking currently-working applications with data they consider acceptable is high. Many use configparser for input only, and make no use of the serialization feature. Those applications can be broken by adding a constraint on the data that's allowed, regardless of what we think of their keys. Chaning this in an application for which it's safe (easier to know at the application level) is easy enough: import configparser import re class ProtectionistParser(configparser.RawConfigParser): _rx = re.compile(r"[a-z]([-a-z]*[a-z])?$") # or whatever makes sense for your app def optionxform(self, opt): if self._rx.match(opt) is None: raise ValueError("don't like this: %r" % opt) return opt Maybe the "strict" mode is considered new enough that this isn't as significant a risk, and it could be allowed when that's enabled; I'm sure Łukasz will have a carefully considered opinion (and I'll defer to him in the end). -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

* Florian Bruhin <me@the-compiler.org> [2014-04-25 16:22:06 +0200]:
I noticed configparser does behave in a surprising way when a key has a special meaning in ini-format.
I've now submitted an issue here: http://bugs.python.org/issue21498 Florian -- () ascii ribbon campaign - stop html mail www.asciiribbon.org /\ www.the-compiler.org | I love long mails http://email.is-not-s.ms/ Blessed are the forgetful: for they get the better even of their blunders. -- Nietzsche

On Fri, Apr 25, 2014 at 10:22 AM, Florian Bruhin <me@the-compiler.org> wrote:
I believe you should, if only to provide a place to record why no change gets made. Had ConfigParser been more careful from the beginning, that would have been really good. At this point, it would be a backward-incompatible change, so it's unlikely such a change could be allowed to affect existing code. -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On 4/25/2014 12:46 PM, Fred Drake wrote:
When you do, add lukasz.langa as nosy https://docs.python.org/devguide/experts.html#experts You might also take a look in test/test_configparser.py to see if any edge cases are tested for.
-- Terry Jan Reedy

On 04/25/2014 09:46 AM, Fred Drake wrote:
At this point, it would be a backward-incompatible change, so it's unlikely such a change could be allowed to affect existing code.
All bug-fixes are backwards-incompatible, yet we fix them anyway. ;) It seems to me the real question is do we fix it in 3.5 only, or can we fix it in 3.4 and previous? And the answer depends on whether this behavior can be reasonably relied on. -- ~Ethan~

On 4/25/2014 1:41 PM, Ethan Furman wrote:
And that depends on whether the current thought-to-be-buggy behavior is specified by the doc, compatible with an ambiguous or under-specified doc, or prohibited by the doc (because the doc specified something else). I leave it to someone to carefully read the doc, but a brief glance indicates "There are nearly as many INI format variants as there are applications using it. configparser goes a long way to provide support for the largest sensible set of INI styles available." So I wonder whether the thought-to-be-buggy behavior is actually buggy with respect to *all* the supported styles or just some of them. -- Terry Jan Reedy

On Fri, Apr 25, 2014 at 2:45 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Given that most variants are completely undocumented, answering this is sufficiently intractable for me to call it intractable. We've also run into compatibility issues because some applications treated the "parser" object as an in-memory configuration database throughout the process lifetime, updating values with non-string values of various sorts. Given the age of the module, the existing number of uses of undocumented accidents of implementation is very large. Fixing "bugs" like this has an excellent track record of uncovering these uses, so... we've grown a bit wary. It's not unheard of for projects to fork their own copies of configparser because of changes to the stdlib version. (No, I haven't done a census of such cases, but I do know of at least one in a popular package.) -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On Fri, Apr 25, 2014 at 3:12 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
You can subclass the parser class you're using and override the ``optionxform`` method to perform the checks you want for option names. I don't know if current versions provide a similar hook to validate section names; providing *that* would be an excellent enhancement. -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On Apr 25, 2014, at 11:47 AM, Terry Reedy <tjreedy@udel.edu> wrote:
So I wonder whether the thought-to-be-buggy behavior is actually buggy with respect to *all* the supported styles or just some of them.
I can't how being able to write a file that isn't read back with the same meaning could be anything but a bug (or missing feature). Regardless of what ( lack of ) standard it is or isn't conforming to. Also, if it is often used just to read ini files, then those uses wouldn't be effected by more strictness on writing ... But the way, last I checked elmentree had a similar problem -- it would quite happily write invalid XML that it wouldn't read back in... -Chris

On Fri, Apr 25, 2014 at 12:46:42PM -0400, Fred Drake wrote:
It seems to me that the current behaviour is a bug and should be fixed. I think the relevant part of the docs is the part following the "comment_prefixes" and "inline_comment_prefixes" section: Please note that config parsers don’t support escaping of comment prefixes so using inline_comment_prefixes may prevent users from specifying option values with characters used as comment prefixes. When in doubt, avoid setting inline_comment_prefixes. In any circumstances, the only way of storing comment prefix characters at the beginning of a line in multiline values is to interpolate the prefix, for example: ... This is *seriously* underspecified, which is a bug in itself: option *values* don't support including inline_comment_prefixes, but do option *keys* support them? How about keys beginning with comment_prefixes? The specification doesn't say, but nor does it say that it's undefined behaviour. I think that a line beginning with "#spam" is ambiguous, it isn't clear if it is intended as a comment "spam" or a key starting with #, so by the Zen, configparser should refuse to guess. -- Steven

On Sat, Apr 26, 2014 at 1:34 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Seriously? Perhaps the second paragraph here could be strengthened a little: https://docs.python.org/3/library/configparser.html#supported-ini-file-struc... But that seems clear to me. Lines starting with the comment prefix are comments. -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

On Sat, Apr 26, 2014 at 01:59:27AM -0400, Fred Drake wrote:
Perhaps I could have worded my post more clearly, but yes. In context, we were talking about a situation where the user creates a key value pair. Copying from the OP's initial post: >>> cp = configparser.ConfigParser() >>> cp.read_dict({'DEFAULT': {';foo': 'bar'}}) >>> cp.write(sys.stdout) [DEFAULT] ;foo = bar The intent of the creator of the ConfigParser was for a key ";foo" with value "bar", and that is the way it is treated while still in memory. It's only when reading it back from a file does it become treated as a comment. There's no way to tell whether the line ";foo = bar" *in a file* came from a key ";foo" with value "bar", or from a genuine comment that merely happens to look like key=value. The obvious way to avoid the ambiguity is to disallow keys beginning with a comment prefix. Then ";foo = bar" cannot represent the key ";foo", because such a key is illegal.
But the entry in question wasn't a line, it was a key=value pair in a dict. Here's that line again: cp.read_dict({'DEFAULT': {';foo': 'bar'}}) or it could have been: cp['DEFAULT'][';foo'] = 'bar' Either way, if there's no way to escape comment characters, I think it is a bug that you can insert illegal keys into the cp object in the first place. -- Steven

On Sat, Apr 26, 2014 at 7:23 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Fair enough. I'm not trying to argue that it isn't a bug, but that risk of breaking currently-working applications with data they consider acceptable is high. Many use configparser for input only, and make no use of the serialization feature. Those applications can be broken by adding a constraint on the data that's allowed, regardless of what we think of their keys. Chaning this in an application for which it's safe (easier to know at the application level) is easy enough: import configparser import re class ProtectionistParser(configparser.RawConfigParser): _rx = re.compile(r"[a-z]([-a-z]*[a-z])?$") # or whatever makes sense for your app def optionxform(self, opt): if self._rx.match(opt) is None: raise ValueError("don't like this: %r" % opt) return opt Maybe the "strict" mode is considered new enough that this isn't as significant a risk, and it could be allowed when that's enabled; I'm sure Łukasz will have a carefully considered opinion (and I'll defer to him in the end). -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

* Florian Bruhin <me@the-compiler.org> [2014-04-25 16:22:06 +0200]:
I noticed configparser does behave in a surprising way when a key has a special meaning in ini-format.
I've now submitted an issue here: http://bugs.python.org/issue21498 Florian -- () ascii ribbon campaign - stop html mail www.asciiribbon.org /\ www.the-compiler.org | I love long mails http://email.is-not-s.ms/ Blessed are the forgetful: for they get the better even of their blunders. -- Nietzsche
participants (6)
-
Chris Barker - NOAA Federal
-
Ethan Furman
-
Florian Bruhin
-
Fred Drake
-
Steven D'Aprano
-
Terry Reedy