Preserving **kwargs order (was: Re: OrderedDict literals)

On Wed, Mar 19, 2014 at 4:01 PM, <random832@fastmail.us> wrote:
I'm not talking about the ** unpacking syntax, but about the function definition syntax for **kwargs: def spam(a, b, **kwargs): ...
Hopefully I'll have time to write a proto-PEP on this in the next couple weeks, but the gist is that I see 2 options: 1. Change kwargs to an OrderedDict for functions decorated a special decorator: @preserve_kwargs_order def spam(a, b, **kwargs): ... # kwargs will be an OrderedDict 2. Store the ordered keys in a list and bind that to a special local variable, but only for functions that have **kwargs: def spam(a, b, **kwargs): kwargs = OrderedDict((k, kwargs[k]) for k in __kwargs_order__) ... My gut feeling is that option 2 is the better choice. And before you ask, Guido has already vetoed making **kwargs always be an OrderedDict. There are a few other details to work out, but the above is the main thing. Also, an OrderedDict implemented in C will likely be a prerequisite, but luckily I've had a patch up since July(?) (#16991). It doesn't apply cleanly at present, but I expect that's only barely and I should be able to get a clean patch up when I have a chance. There wasn't a lot of interest in reviewing such a large patch so I'd back-burnered it until Python 3.4 got released (which just happened). Once I get a couple reviews I should be able to get that in (wink wink nudge nudge). -eric

On Thu, Mar 20, 2014 at 10:32 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Strongly against any proposal that makes argument passing behave differently based on the target - it means the process of building up the arguments has to check some attribute on the function. Order of kwargs is a problem for any function that passes args through to another function. def wrap_spam(*args, **kwargs): try: return spam(*args, **kwargs) except EggsError: return False Does wrap_spam have to be aware that spam cares about keyword argument order? Should it preserve order "just in case"? Should the __kwargs_order__ list be automatically propagated? And what happens if the args aren't passed through perfectly, but some are added/removed? ChrisA

On Mar 19, 2014, at 16:32, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Hopefully I'll have time to write a proto-PEP on this in the next couple weeks, but the gist is that I see 2 options:
Last time around, someone suggested that ***kwargs could get you the args as a list of pairs, or maybe an OrderedDict. While that would require a bit of complexity in the CALL_FUNCTION code, it seems like it would be simpler than your option 2. But it doesn't solve the main problem. Right now, you can forward any arguments perfectly by doing this: def wrapper(*args, **kwargs): return wrappee(*args, **kwargs) Your option 2 would require much more verbose code to forward perfectly. The triple-star idea makes it a lot nicer (assuming that ** passing respects the order, or a new *** is added there as well), but it would still break the thousands of wrapper functions out there written with **kwargs.

On 2014-03-20 01:58, Andrew Barnert wrote:> On Mar 19, 2014, at 16:32, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Wouldn't it be a problem only if the dict were unpacked and then repacked? In the code above, it's merely passing the input kwargs on to 'wrappee'.

Somehow it feels wrong to add a fairly complex new feature to the language (namely, ordered keyword arguments) to handle the very specific case of ordered dict constructors (or is there some other (clearly) different use intended?). If anything, I prefer the abuse of "slicing" syntax (so that one can write "od['a': 'b', 'c': 'd']" which is not much worse than "{'a': 'b', 'c': 'd'}"). Let's say we start with def wrapper(**kwargs): <some code that manipulates kwargs> return g(**kwargs) def g(**kwargs): pass wrapper(a=1, b=2) Nothing requires ordered keyword arguments, wrapper doesn't bother saving the argument order, everything is fine (but because there is some code that manipulates kwargs, an actual dict must be created, not just some sort of proxy)... now, just after wrapper is executed and before g gets called, globals()["g"] gets changed (by another thread...) to a function that does require ordered keyword arguments. What now? Antony 2014-03-19 19:42 GMT-07:00 MRAB <python@mrabarnett.plus.com>:

From: MRAB <python@mrabarnett.plus.com> Sent: Wednesday, March 19, 2014 7:42 PM
No, a **kwargs parameter packs the keyword arguments into a dict, and a **kwargs argument unpacks a dict into the keyword arguments; that's exactly what they do. And dicts have arbitrary order. Besides, the whole point of option 2 is that kwargs is still just a plain dict, and the order is passed via a magic hidden parameter __kwargs_order__. So, even if you _were_ just passing kwargs on without doing anything to it, that still wouldn't help. As long as wrapper (and every other general-purpose wrapper every written) doesn't do anything with __kwargs_order__, the order is not going to get passed to wrapped. The obvious way to fix this is to make a **kwargs parameter pack the keyword arguments into an OrderedDict, but Guido has already rejected that, which is why Eric Snow had to come up with his two other options. But they don't solve the problem.

On Mar 20, 2014, at 5:58, MRAB <python@mrabarnett.plus.com> wrote:
This is basically an optimization, where the unpacking happens inside CALL_FUNCTION_KW instead of in a bunch of separate bytecodes that pushed them onto the stack to be popped off by CALL_FUNCTION. The end result is identical. Inside CALL_FUNCTION_KW, instead of starting with an empty dict and then popping keywords into it, it starts with a copy of kwargs and then pops keywords into it. If that weren't true, you wouldn't be able to do this: def foo(a, **kwargs): pass foo(**{'a': 1, 'b': 2}) Or: def foo(**kwargs): pass foo(b=2, **{'a': 1}) (If you replace a or b with self, this should look more familiar.) And, even if you don't have any missing or extra arguments, you are still going to get a new dict inside the callee. Otherwise this would do the wrong thing: def foo(**kwargs): kwargs['a'] = 3 d = {'a': 1, 'b': 2} foo(**d) And, even if you changed Python so that, in the case where there are no extra or missing keywords and the dict isn't mutated inside foo, you just passed it as-is and skipped the copy, that _still_ wouldn't solve the forwarding problem, because it's still just a dict in arbitrary order. And adding a separate hidden parameter to pass the order along still wouldn't help unless you also forwarded _that_ parameter somehow. And as far as I can see, there is no way to solve this problem. The only way to add keyword order information without breaking all existing forwarding functions is to make the **kwargs parameter always preserve order (and make the **kwargs calling syntax preserve order if present)--that is, to make it always an OrderedDict, which Guido has already rejected.

On Thu, Mar 20, 2014 at 10:28 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
And as far as I can see, there is no way to solve this problem. The only way to add keyword order information without breaking all existing forwarding functions is to make the **kwargs parameter always preserve order (and make the **kwargs calling syntax preserve order if present)--that is, to make it always an OrderedDict, which Guido has already rejected.
It's not all that bleak. Either of the options I outlined should still work, though option #1 (using a decorator) is better for the pass-through case that has come up (which I'm not convinced is such a big deal). I should have a rough PEP up in a day or two. Then I'll get my C OrderedDict applying cleanly and put up an implementation for the PEP, neither of which *should* take a lot of work. For what it's worth, that C implementation isn't all that fancy nor is it very optimized, but it still performs pretty well (last time I checked it was mostly the same as dict with the worst operations taking 4 times longer, which is much better than the pure Python version). This discussion has gotten me thinking about a related point (loosening the restriction on ** unpacking), but I'll start another thread on that one. -eric

MRAB wrote:
It looks like the dict 'kwargs' is being passed straight to 'wrappee'.
But the interpreter doesn't just pass the object given as a ** argument straight to the function's ** parameter; it gets repacked into a new dict:
So even if you pass an OrderedDict or some other custom object as a ** argument, it's a plain dict by the time it comes out the other end. -- Greg

On Thu, Mar 20, 2014, at 2:43, Andrew Barnert wrote:
What about putting a constraint on the dict class that it shall enumerate its keys in the order they were added as long as no key has ever been removed? C#'s Dictionary does this [though it doesn't have a guarantee of doing so].

On 04/03/2014 09:55 AM, random832@fastmail.us wrote:
Here's the latest about it in 3.4: https://docs.python.org/3/whatsnew/3.4.html#pep-456-secure-and-interchangeab... Perhaps always iterating in insertion order would actually make it harder to discover the hash algorithm used? -- ~Ethan~

On Thu, Apr 3, 2014, at 12:58, Ethan Furman wrote:
Iterating in insertion order would make the iteration completely independent of the hashing algorithm. The hashing algorithm determines what buckets each item goes in, but you don't _have_ to iterate in bucket order, depending on your data structure. C#, for example, stores items in an array [which are populated from a freelist of array slots that don't have items in them, or sequentially at the end of the array if there are no slots in the freelist] and the buckets as linked lists within the array. The security purpose of that change is to prevent an attacker from making a bunch of items all go in the same bucket, ruining the performance characteristics of the hash table.

On Fri, Apr 4, 2014 at 6:14 AM, <random832@fastmail.us> wrote:
However that's done, it implies additional storage beyond the straight-forward mapping, right? Python's dict implementation is *highly* optimized (which is necessary, given how much of Python is built on dicts), and any proposal that basically says "Every dict needs to keep track of insertion order, for the sake of the handful of cases where you actually care" is likely to have the aerodynamic qualities of a rock. ChrisA

On Thu, Apr 3, 2014 at 4:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
Raymond has a proposal (back-burnered?) for a compact dict implementation that preserves order in iteration until the first deletion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html https://mail.python.org/pipermail/python-dev/2013-May/126327.html -eric

On Thu, Mar 20, 2014 at 10:32 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Strongly against any proposal that makes argument passing behave differently based on the target - it means the process of building up the arguments has to check some attribute on the function. Order of kwargs is a problem for any function that passes args through to another function. def wrap_spam(*args, **kwargs): try: return spam(*args, **kwargs) except EggsError: return False Does wrap_spam have to be aware that spam cares about keyword argument order? Should it preserve order "just in case"? Should the __kwargs_order__ list be automatically propagated? And what happens if the args aren't passed through perfectly, but some are added/removed? ChrisA

On Mar 19, 2014, at 16:32, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Hopefully I'll have time to write a proto-PEP on this in the next couple weeks, but the gist is that I see 2 options:
Last time around, someone suggested that ***kwargs could get you the args as a list of pairs, or maybe an OrderedDict. While that would require a bit of complexity in the CALL_FUNCTION code, it seems like it would be simpler than your option 2. But it doesn't solve the main problem. Right now, you can forward any arguments perfectly by doing this: def wrapper(*args, **kwargs): return wrappee(*args, **kwargs) Your option 2 would require much more verbose code to forward perfectly. The triple-star idea makes it a lot nicer (assuming that ** passing respects the order, or a new *** is added there as well), but it would still break the thousands of wrapper functions out there written with **kwargs.

On 2014-03-20 01:58, Andrew Barnert wrote:> On Mar 19, 2014, at 16:32, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Wouldn't it be a problem only if the dict were unpacked and then repacked? In the code above, it's merely passing the input kwargs on to 'wrappee'.

Somehow it feels wrong to add a fairly complex new feature to the language (namely, ordered keyword arguments) to handle the very specific case of ordered dict constructors (or is there some other (clearly) different use intended?). If anything, I prefer the abuse of "slicing" syntax (so that one can write "od['a': 'b', 'c': 'd']" which is not much worse than "{'a': 'b', 'c': 'd'}"). Let's say we start with def wrapper(**kwargs): <some code that manipulates kwargs> return g(**kwargs) def g(**kwargs): pass wrapper(a=1, b=2) Nothing requires ordered keyword arguments, wrapper doesn't bother saving the argument order, everything is fine (but because there is some code that manipulates kwargs, an actual dict must be created, not just some sort of proxy)... now, just after wrapper is executed and before g gets called, globals()["g"] gets changed (by another thread...) to a function that does require ordered keyword arguments. What now? Antony 2014-03-19 19:42 GMT-07:00 MRAB <python@mrabarnett.plus.com>:

From: MRAB <python@mrabarnett.plus.com> Sent: Wednesday, March 19, 2014 7:42 PM
No, a **kwargs parameter packs the keyword arguments into a dict, and a **kwargs argument unpacks a dict into the keyword arguments; that's exactly what they do. And dicts have arbitrary order. Besides, the whole point of option 2 is that kwargs is still just a plain dict, and the order is passed via a magic hidden parameter __kwargs_order__. So, even if you _were_ just passing kwargs on without doing anything to it, that still wouldn't help. As long as wrapper (and every other general-purpose wrapper every written) doesn't do anything with __kwargs_order__, the order is not going to get passed to wrapped. The obvious way to fix this is to make a **kwargs parameter pack the keyword arguments into an OrderedDict, but Guido has already rejected that, which is why Eric Snow had to come up with his two other options. But they don't solve the problem.

On Mar 20, 2014, at 5:58, MRAB <python@mrabarnett.plus.com> wrote:
This is basically an optimization, where the unpacking happens inside CALL_FUNCTION_KW instead of in a bunch of separate bytecodes that pushed them onto the stack to be popped off by CALL_FUNCTION. The end result is identical. Inside CALL_FUNCTION_KW, instead of starting with an empty dict and then popping keywords into it, it starts with a copy of kwargs and then pops keywords into it. If that weren't true, you wouldn't be able to do this: def foo(a, **kwargs): pass foo(**{'a': 1, 'b': 2}) Or: def foo(**kwargs): pass foo(b=2, **{'a': 1}) (If you replace a or b with self, this should look more familiar.) And, even if you don't have any missing or extra arguments, you are still going to get a new dict inside the callee. Otherwise this would do the wrong thing: def foo(**kwargs): kwargs['a'] = 3 d = {'a': 1, 'b': 2} foo(**d) And, even if you changed Python so that, in the case where there are no extra or missing keywords and the dict isn't mutated inside foo, you just passed it as-is and skipped the copy, that _still_ wouldn't solve the forwarding problem, because it's still just a dict in arbitrary order. And adding a separate hidden parameter to pass the order along still wouldn't help unless you also forwarded _that_ parameter somehow. And as far as I can see, there is no way to solve this problem. The only way to add keyword order information without breaking all existing forwarding functions is to make the **kwargs parameter always preserve order (and make the **kwargs calling syntax preserve order if present)--that is, to make it always an OrderedDict, which Guido has already rejected.

On Thu, Mar 20, 2014 at 10:28 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
And as far as I can see, there is no way to solve this problem. The only way to add keyword order information without breaking all existing forwarding functions is to make the **kwargs parameter always preserve order (and make the **kwargs calling syntax preserve order if present)--that is, to make it always an OrderedDict, which Guido has already rejected.
It's not all that bleak. Either of the options I outlined should still work, though option #1 (using a decorator) is better for the pass-through case that has come up (which I'm not convinced is such a big deal). I should have a rough PEP up in a day or two. Then I'll get my C OrderedDict applying cleanly and put up an implementation for the PEP, neither of which *should* take a lot of work. For what it's worth, that C implementation isn't all that fancy nor is it very optimized, but it still performs pretty well (last time I checked it was mostly the same as dict with the worst operations taking 4 times longer, which is much better than the pure Python version). This discussion has gotten me thinking about a related point (loosening the restriction on ** unpacking), but I'll start another thread on that one. -eric

MRAB wrote:
It looks like the dict 'kwargs' is being passed straight to 'wrappee'.
But the interpreter doesn't just pass the object given as a ** argument straight to the function's ** parameter; it gets repacked into a new dict:
So even if you pass an OrderedDict or some other custom object as a ** argument, it's a plain dict by the time it comes out the other end. -- Greg

On Thu, Mar 20, 2014, at 2:43, Andrew Barnert wrote:
What about putting a constraint on the dict class that it shall enumerate its keys in the order they were added as long as no key has ever been removed? C#'s Dictionary does this [though it doesn't have a guarantee of doing so].

On 04/03/2014 09:55 AM, random832@fastmail.us wrote:
Here's the latest about it in 3.4: https://docs.python.org/3/whatsnew/3.4.html#pep-456-secure-and-interchangeab... Perhaps always iterating in insertion order would actually make it harder to discover the hash algorithm used? -- ~Ethan~

On Thu, Apr 3, 2014, at 12:58, Ethan Furman wrote:
Iterating in insertion order would make the iteration completely independent of the hashing algorithm. The hashing algorithm determines what buckets each item goes in, but you don't _have_ to iterate in bucket order, depending on your data structure. C#, for example, stores items in an array [which are populated from a freelist of array slots that don't have items in them, or sequentially at the end of the array if there are no slots in the freelist] and the buckets as linked lists within the array. The security purpose of that change is to prevent an attacker from making a bunch of items all go in the same bucket, ruining the performance characteristics of the hash table.

On Fri, Apr 4, 2014 at 6:14 AM, <random832@fastmail.us> wrote:
However that's done, it implies additional storage beyond the straight-forward mapping, right? Python's dict implementation is *highly* optimized (which is necessary, given how much of Python is built on dicts), and any proposal that basically says "Every dict needs to keep track of insertion order, for the sake of the handful of cases where you actually care" is likely to have the aerodynamic qualities of a rock. ChrisA

On Thu, Apr 3, 2014 at 4:24 PM, Chris Angelico <rosuav@gmail.com> wrote:
Raymond has a proposal (back-burnered?) for a compact dict implementation that preserves order in iteration until the first deletion: https://mail.python.org/pipermail/python-dev/2012-December/123028.html https://mail.python.org/pipermail/python-dev/2013-May/126327.html -eric
participants (8)
-
Andrew Barnert
-
Antony Lee
-
Chris Angelico
-
Eric Snow
-
Ethan Furman
-
Greg Ewing
-
MRAB
-
random832@fastmail.us