[Patches] [ python-Patches-1615701 ] Creating dicts for dict subclasses

Wed Feb 7 21:17:23 CET 2007

Patches item #1615701, was opened at 2006-12-14 08:08
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Creating dicts for dict subclasses

Initial Comment:
This patch changes dictobject.c so that creating dicts from mapping-like objects only uses the internal dict functions if the argument is a *real* dict, not a subclass. This means that overridden keys() and __getitem__() methods are now honored. In addition, the fallback implementation now tries iterkeys() before trying keys(). It also adds a PyMapping_IterKeys() macro.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2007-02-07 15:17

Message:
Logged In: YES 
user_id=80475
Originator: NO

Added PyDict_CheckExact() in revisions 53655 and 53656.  A side-effect of
this change is to slow down updates with dict subclasses that are not
overriding keys() and __getitem__().  This is especially unfortunate given
good existing alternatives and given a lack of real use cases (dict
subclasses that aspire to hand-off updates but not use their actual keys
and mapped values).

Left out the gratuitous expansion of the API which exposed too much of the
internal implementation and sought to introduce more implicit behaviors
that would better be handled by explicitly passing in an iterable of items
to d.update().  For example: d.update((k(x), g(x)) for x in
myweirdmapping).
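
A minimal sketch of the explicit approach suggested above, written for present-day Python 3. The class name ShoutingDict and its behaviour are hypothetical, chosen only for illustration.

```python
class ShoutingDict(dict):
    """Hypothetical dict subclass whose lookups return upper-cased values."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key).upper()


source = ShoutingDict(a="x", b="y")

# Pass an explicit iterable of (key, value) pairs, so the overridden
# __getitem__ is called for each key regardless of which internal path
# update() would choose for a dict subclass.
target = {}
target.update((key, source[key]) for key in source)
```

Because the pairs are built by the caller, the result does not depend on how update() treats dict subclasses internally.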

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-12-20 07:59

Message:
Logged In: YES 
user_id=89016
Originator: YES

To clear up some apparent misunderstandings: This patch does *not*
advocate that some dict methods should be implemented by calling other dict
methods so that dict subclasses only have to override a few methods to
gain a completely consistent implementation.

This patch only fixes the dict constructor (and update()) and consists of
two parts:

(1) There are three code paths in dictobject.c::dict_update_common(): (a)
if the constructor argument doesn't have a "keys" attribute, treat it as an
iterable of items; (b) if the argument has a "keys" attribute but is not a
dict (and not an instance of a subclass of dict), use keys() and
__getitem__() to make a copy of the mapping-like object; (c) if the
argument has a "keys" attribute and is a dict (or an instance of a subclass
of dict), bypass any overridden methods that the object might
provide and directly use the dict implementation. This patch changes
PyDict_Merge() so that code path (b) is used for dict constructor arguments
that are subclasses of dict, so that any overridden methods are honored.

(2) This means that now if a subclass of dict is passed to the constructor
or update() the code is IMHO more correct (it honors the reimplementation of
the mapping methods), but slower. To reduce the slowdown, instead of using
keys() and __getitem__(), iterkeys() and __getitem__() are used.
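
The three code paths in (1) can be modelled in pure Python as a rough sketch. This is not the C code from dictobject.c; the function update_model, its exact_only flag and the Doubling class are hypothetical names introduced only to make the dispatch visible.

```python
def update_model(target, arg, exact_only=True):
    """Pure-Python model of the dispatch described above (not the C code).

    exact_only=True models the patched behaviour: only real dicts take the
    fast path. exact_only=False models the unpatched behaviour, where dict
    subclasses take it as well.
    """
    if not hasattr(arg, "keys"):
        # Path (a): treat the argument as an iterable of (key, value) items.
        for key, value in arg:
            target[key] = value
        return "a"

    if exact_only:
        is_fast = type(arg) is dict
    else:
        is_fast = isinstance(arg, dict)

    if is_fast:
        # Path (c): read the dict storage directly, bypassing any overrides.
        for key in dict.keys(arg):
            target[key] = dict.__getitem__(arg, key)
        return "c"

    # Path (b): honour the object's own keys() and __getitem__().
    for key in arg.keys():
        target[key] = arg[key]
    return "b"


class Doubling(dict):
    """Hypothetical subclass whose lookups return doubled values."""

    def __getitem__(self, key):
        return 2 * dict.__getitem__(self, key)


patched, unpatched, from_items = {}, {}, {}
path_patched = update_model(patched, Doubling(n=21), exact_only=True)
path_unpatched = update_model(unpatched, Doubling(n=21), exact_only=False)
path_items = update_model(from_items, [("k", 1)])
```

In this model the subclass override is visible only when the fast path is restricted to exact dicts, which is the behavioural difference the patch is about.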

I can't see why the current behaviour should be better: Yes, it is faster,
but it is also wrong: Without the patch the behaviour of dict() and
dict.update() depends on whether the argument happens to subclass
dict or not. If it doesn't, all is well: the argument is treated as a
mapping (i.e. keys() and __getitem__() are used); otherwise the methods are
completely ignored.

So can we agree on the fact that (1) is desirable? (At least Guido said as
much:
http://mail.python.org/pipermail/python-dev/2006-December/070341.html)

BTW, I only added PyMapping_Iterkeys() because it mirrors
PyMapping_Keys().

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2006-12-19 19:13

Message:
Logged In: YES 
user_id=80475
Originator: NO

Since update already supports (key, item) changes, I do not see the
rationale for trying to expand the definition of what is dict-like to include a
try-this, then try-that approach.  This is a little too ad-hoc for too
little benefit.

Also, I do not see the point of adding PyMapping_Iterkeys to the C API. 
It affords no advantage over its macro definition (the current
one-way-to-do-it). 

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2006-12-19 18:00

Message:
Logged In: YES 
user_id=80475
Originator: NO

It is also asking for bugs if someone hooks __getitem__ and starts to make
possibly invalid assumptions about what other changes occur implicitly.

Also, this patch kills the performance of builtin subclasses.  If I
subclass dict to add a new method, it would suck to have the performance of
all of the other methods drop precariously.

This patch is somewhat overzealous.  It encroaches on the territory of
UserDict.DictMixin which was specifically made for propagating new
behaviors.  It unnecessarily exposes implementation details.  It introduces
implicit behaviors that should be done through explicit overrides of
methods whose behavior is supposed to change.  

And, it is on the top of a slippery slope that we don't want to go down
(i.e. do we want to guarantee that list.append is implemented in terms of
list.extend, etc).  Python has no shortage of places where builtin
subclasses make direct calls to the underlying C code -- this patch leads
to a bottomless pit of changes that kill performance and make implicit
side-effects the norm instead of the exception.
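
As a small illustration of the list.append / list.extend point, assuming CPython: the subclass CountingList below is hypothetical and only counts calls to its own extend().

```python
class CountingList(list):
    """Hypothetical list subclass that counts calls to extend()."""

    def __init__(self, *args):
        super().__init__(*args)
        self.extend_calls = 0

    def extend(self, iterable):
        self.extend_calls += 1
        super().extend(iterable)


items = CountingList()
items.append(1)        # handled directly by the builtin; extend() is not called
items.extend([2, 3])   # the override runs here
```

Only the explicit extend() call is counted, since the builtin append makes no call to the overridden method.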

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-12-19 17:29

Message:
Logged In: YES 
user_id=764593
Originator: NO

FWIW, I'm not sure I agree on not specifying which methods share an
implementation.

If someone hooks __getitem__ but not get, that is just asking for bugs. 
(The implementation of get may -- but need not -- make its own call to
__getitem__, and not everyone will make the same decision.)
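
A short sketch of the hazard described here, as it appears in CPython today. The class Logged is a hypothetical example that records each call to its __getitem__.

```python
class Logged(dict):
    """Hypothetical subclass that records every __getitem__ call."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def __getitem__(self, key):
        self.seen.append(key)
        return super().__getitem__(key)


d = Logged(a=1)
via_index = d["a"]     # goes through the override
via_get = d.get("a")   # CPython's dict.get does not call __getitem__
```

Both lookups return the same value, but only the subscript access is recorded, so a hook on __getitem__ alone silently misses get().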

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-12-19 17:26

Message:
Logged In: YES 
user_id=764593
Originator: NO

As I understand it, the problem is that dict.update is assuming any dict
subclass will use the same internal data representation.

Restricting the fast path to exactly builtin dicts (not subclasses) fixes
the bug, but makes the fallback more frequent.

The existing fallback is to call keys(), then iterate over it, retrieving
the value for each key.  (keys is required for a "minimal mapping" as
documented in UserDict, and a few other random places.)

The only potential dependency on other methods is his proposed new
intermediate path that avoids creating a list of all keys, by using
iterkeys if it exists.  (I suggested using iteritems to avoid the lookups.)
If iter* aren't implemented, the only harm is falling back to the existing
fallback of "for k in keys():"
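
The existing fallback can be observed with an object that is not a dict at all. This is a sketch for present-day CPython 3; MinimalMapping is a hypothetical class providing only keys() and __getitem__().

```python
class MinimalMapping:
    """Hypothetical non-dict object offering only keys() and __getitem__()."""

    def __init__(self):
        self.calls = []

    def keys(self):
        self.calls.append("keys")
        return ["a", "b"]

    def __getitem__(self, key):
        self.calls.append(("getitem", key))
        return key.upper()


m = MinimalMapping()
copied = dict(m)   # not a dict, so keys() and __getitem__() are used
```

The recorded calls show one keys() call followed by one lookup per key, which is the "for k in keys():" pattern described above.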

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2006-12-19 16:07

Message:
Logged In: YES 
user_id=80475
Originator: NO

I'm -1 on making ANY guarantees about which methods underlie others --
that would constitute new and everlasting guarantees about how mappings are
implemented.  Subclasses should explicitly override/extend the methods
with changed behavior.  If that proves non-trivial, then it is likely
there should be a has-a relationship instead of an is-a relationship. 
Also, it is likely that the subclass will have Liskov substitutability
violations.  Either way, there is probably a design flaw.
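
A rough sketch of the has-a alternative, using collections.abc.MutableMapping from present-day Python 3 (an assumption here; the thread itself only mentions UserDict.DictMixin). The class LowerKeyMapping is hypothetical.

```python
from collections.abc import MutableMapping


class LowerKeyMapping(MutableMapping):
    """Hypothetical mapping that lower-cases keys and *has* a dict."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # mixin method, routed via __setitem__

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = LowerKeyMapping(Alpha=1)
m["BETA"] = 2
plain = dict(m)   # m is not a dict, so its own keys()/__getitem__() are used
```

With composition, the mixin methods such as get() and update() are derived from the five methods defined here, so the changed behaviour applies consistently without relying on how the builtin dict is implemented.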

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-12-19 14:23

Message:
Logged In: YES 
user_id=89016
Originator: YES

iteritems() has to create a new tuple for each item, so this might be
slower.

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-12-19 12:50

Message:
Logged In: YES 
user_id=764593
Originator: NO

Why are you using iterkeys instead of iteritems?

It seems like if they've filled out the interface enough to have iterkeys,
they've probably filled it out all the way, and you do need the value as
soon as you get the key.

----------------------------------------------------------------------
