[Python-ideas] Preserving **kwargs order (was: Re: OrderedDict literals)

Andrew Barnert abarnert at yahoo.com
Wed Apr 16 05:59:37 CEST 2014


On Apr 15, 2014, at 14:32, "Franklin? Lee" <leewangzhong+python at gmail.com> wrote:

>> but obviously wouldn't work if you tried to create the same type for kwargs.
> 
> Sorry, I said that completely wrong. I mean that if it's already a special dict (one that can be optimized, for some definition of "can"), the interpreter can use a special algorithm for copying from such a dict. Otherwise, it'd just have to do regular iteration.
> 
>> More importantly, I think you're still missing the point that keyword unpacking and extra-keyword parameters are not directly connected.
> 
> Rather than missing the point, I don't see any reason why they shouldn't be made connected.

Because they're not conceptually connected. Just because people often get confused and think they're a matched pair doesn't mean the language should change to confuse people even further.

>> For example, how is this going to preserve keyword order in cases where nobody unpacks a mapping along with the keywords?
>  
> (and)
>  
>> Again, kwargs parameters aren't forwarded from anything, they're built from the unmatched keyword arguments, both normal and unpacked.
> 
> An algorithm-ish: At callpoint, the caller would put explicit keywords into a new kwdict, then look for **somethingToUnpack. If it finds it, and it's a special dict, it packs "smartly". Otherwise, it iterates through the pack and adds it to the kwdict.

By 'packs "smartly"' you mean it copies the dict it's already started building into a new special dict so that it can then update that new special dict with the unpacking special dict?

I suspect the cost of building a normal dict just to copy it to a special dict is going to cost a lot more than whatever the savings for your special dict.

Anyway, you really should look at how CPython and other implementations do it before proposing changes. In CPython, the calling side puts keyword args on the stack, and also puts the unpacking dict on the stack, and then passes a count of keyword args and a flag indicating the presence of that dict to the callee. The callee then pops those, builds the new dict (which includes looking up each keyword in the dict to make sure it isn't a duplicate), fills in the param values, and fills the kwargs dict with what's left over. What you're proposing isn't a modification of that, but a completely different design. Which might be worth doing, but you either need to argue that it's better, or work with the existing design.

> The callee gets the kwdict and pops the entries that it makes explicit (allowing deletion isn't the biggest problem). The leftover kwdict is named kwargs or whatever.
> 
>> How would this work?
> 
> "it can pass a *view* of the kwargs dictionary *if* the callee would receive the exact same keys"
> 
> Your counter is that it wouldn't work if the callee doesn't receive the exact same keys. In that case, it would do the more general thing I stated above, whereby it creates a new dict like it does currently, but in a "smarter" way. It's a separate idea.
> 
>> then all this does is trick Guido's code and prevent it from even having a way to test or document its preconditions.
> 
> I didn't understand this. What preconditions?

The precondition that it be something that acts like an actual dict (not just a MutableMapping), which you spell isinstance(it, dict). Taking something that isn't a dict an making it a subclass of dict doesn't fix the problem, it just means you can no longer test for the problem even if you want to.

>> And if it actually is a subtype, I don't see how you're going to accelerate it; it it can assign to keys of any type, then it can't assume all its keys are strings.
> 
> What I suggested was a second array of k-v pairs for newly-added items, still in order, but with more general keys. If done with ALL new keys, this might slow down lookup due to looking up the ID (in case it's interned) and then hashing and looking up the string itself. But maybe this kind of case (modifying incoming kwargs) is uncommon enough (as in, doesn't happen much relative to the general case) that the overall program is faster. I don't know. Again, I'm putting forth the idea for someone who knows better about the internals to say for sure.
> 
>> Also, even if this worked, you're putting the burden of ordering the keywords on the caller, rather than on the function that actually cares about the keyword order. That's the wrong place to put it. You've solved  the case of forwarding delegate methods, partial, etc., but not the everyday case.
> 
> 
> The everyday case only cares if the actual performance goes down. I don't see a philosophical reason to ban the possibility of caller-burden altogether. That's why I'm asking the question: if the ** mechanic is clever by using the restraints about what is legal to pack/unpack, would it improve performance to the point that order can be thrown in and overall performance is STILL better than the status quo?
> 
>> A more complicated alternative proposal that's also probably fast enough and small enough except in those edge cases, but breaks those edge cases and others as well in a backward-incompatible way, doesn't seem like a win.
> 
> It may be confusing, but I'm proposing multiple exclusive possibilities.
> - Break those edge cases, because I'm not convinced they should be allowed.
> - Don't break those edge cases, but it's okay to slow them down because everything else will be faster and make up for them.
> 
> Breaking the edge cases isn't integral, but it might be desirable for the language, far off into the future.
> 
> And I'm not so much fighting for this to happen (instead of the C OrderedDict) as trying to flesh out the idea, because I don't *know enough* to say that it *won't* be better.
> 
>> The post you're referring to doesn't make lookup any faster; it makes most dicts smaller, and it makes iteration faster, but it has no effect on lookup. Also, notice that it explicitly says it has no effect on the  existing optimization for lookup in string-only dicts. So, if you're hoping that it could somehow make string-only dicts faster, you need to read it again.
> 
> No. That line you quoted was about why my proposal WOULDN'T help. What I said was that I thought it could be better than the current use of dict, and it's possible that it only has that potential because dict isn't as good as it can be. In other words, maybe it CAN be better than the current implementation with dict, but instead of putting in the effort to make this, put the effort into making dict better.
> 
> Note, though, that
> 1. Raymond Hettinger's dict changes WOULD make it ordered, giving us ordered kwargs as a byproduct.
> 2. It makes iteration faster, which, as far as I know, should make kwargs packing and unpacking faster.
> 
> But rereading it, I see that it means dicts already have a str-only optimization. Rather than, "This dict can't be used for your suggested kwargs optimization," it's, "The current dict is probably already doing what you're suggesting, so it wouldn't be much of a performance improvement."
> 
>> But you have to hash the keywords in the first place, or do something equivalent, to intern them.
> 
> Yes. It'd be hashed once per keyword per function, and once per explicit keyword in a call (so if a line calls multiple times, it would have to hash that, I suppose, until the interpreter gets smarter, or if it's already smart about it), and once per keyword in an expansion from a regular dict.

>> Also, you have to keep in mind that, on top of any lookups the function body does, the function call machinery also has to effectively look up every parameter and (to catch duplicate-keyword errors) every keyword argument in the unpacked mapping. If those lookups weren't fast enough, then your change doesn't help. If they were, then your change is probably unnecessary.
> 
> I don't see the relevance of needing to look up every parameter. What is the objection?

The fact that each keyword involves multiple lookups and only one iteration, and you're only speeding up iteration, means you're probably optimizing the part that doesn't matter.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140415/4d5ca0e0/attachment-0001.html>


More information about the Python-ideas mailing list