Dictionary from list?

Mon Oct 29 02:01:28 EST 2001

[Greg Chapman]
> On the subject of slippery mappings and the dictionary constructor,
> consider a subclass of dictionary which overrides __getitem__ to
> transform the value stored in the (inherited) dictionary structure to
> the "real" value stored in the (logical) mapping defined by the
> dictionary subclass.  (For example: a dictionary subclass which supports
> a defined order of iteration by using nodes in a linked list to store
> the keys and values, and then storing the keys and nodes in the
> dictionary structure).  In 2.2b1, if an instance of such a
> subclass is passed to the constructor (or to the update method)
> of a normal dictionary, the overridden __getitem__ is ignored because
> of the PyDict_Check near the top of PyDict_Merge.

Yes.  Note that this isn't unique to dicts -- if, for example, you subclass
list, list(instance_of_that_list_subclass) doesn't look up the subclass's
__getitem__ either.  The builtin types all work this way.
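
A minimal sketch of the dict case described above, assuming a current
CPython 3 rather than 2.2b1 (where the constructor was still spelled
dictionary()).  The Doubling class is hypothetical, made up only for
illustration, and other implementations or versions may behave differently:

```python
class Doubling(dict):
    """Hypothetical subclass: reads return twice the stored value."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key) * 2


d = Doubling(a=1, b=2)

# Indexing the subclass instance goes through the override.
via_override = d["a"]

# Copying into a plain dict reads the inherited storage directly,
# so the override is bypassed in both the constructor and update().
copied = dict(d)

target = {}
target.update(d)

print(via_override, copied, target)
```

Here only the direct d["a"] lookup sees the doubled value; both copies end
up holding what is physically stored.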

> I'd like to suggest that that check be changed to PyDict_CheckExact
> (which apparently does not exist yet, but would be analogous
> to PyString_CheckExact).  This would shunt dictionary subclasses into the
> generic mapping branch of PyDict_Merge, which would allow an overridden
> __getitem__ to work.

I agree that it would.  Whether it *should* is something you'll have to take
up with Guido.  Subclassing builtins is a tricky business.
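
For contrast, here is a sketch of what the generic mapping route looks like
from the Python side, assuming a current CPython 3.  Squares is a
hypothetical class that is not a dict at all; it only supplies keys() and
__getitem__, and the constructor has no choice but to call both:

```python
class Squares:
    """Hypothetical non-dict mapping: just keys() and __getitem__."""

    def keys(self):
        return [1, 2, 3]

    def __getitem__(self, key):
        return key * key


# With no dict storage to copy from, the constructor asks for the keys
# and then looks each one up through __getitem__.
built = dict(Squares())
print(built)
```

This is the treatment the proposal would extend to dict subclasses.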

Note that you *can* fiddle your subclass to change what's stored, by
overriding __init__.  A list example is clearer than a dict one simply
because it's less involved:

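class L(list):
    def __init__(self, arg):
        list.__init__(self, arg)
        # Overwrite what's actually stored in the underlying list.
        for i in range(len(arg)):
            self[i] = arg[i] * 100

    def __getitem__(self, i):
        # Not consulted by print or by list() below.
        return 666

x = L(range(3))
print x, list(x)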

That prints [0, 100, 200] twice.  The second one isn't what you want today
(or so I predict), but it's clear as mud (to me) what most people will want
most often.  Is list() *defined* in terms of __getitem__?  Not really, not
even under the covers (it's defined more in terms of the iterator protocol
now).  What is dictionary() defined in terms of?  It simply isn't spelled
out yet.  With the dictionary() in current CVS, dictionary(subclass) won't
pay attention to a __getitem__ override, and dictionary(subclass.items())
won't either but will (trivially) pay attention to an .items() override.
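
The difference between those two spellings can be sketched as follows,
again assuming a current CPython 3 (with dict() standing in for
dictionary()).  Shouting is a hypothetical subclass invented for the
example:

```python
class Shouting(dict):
    """Hypothetical subclass whose items() upper-cases the values."""

    def items(self):
        return [(k, str(v).upper()) for k, v in dict.items(self)]


s = Shouting(greeting="hello")

# Passing the instance itself: the stored values are copied as-is.
direct = dict(s)

# Passing s.items(): the override runs because *we* called it, and the
# constructor just consumes the resulting (key, value) pairs.
via_items = dict(s.items())

print(direct, via_items)
```

The override is honored in the second case only in the trivial sense that
the caller invoked it explicitly before the constructor got involved.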

Python usually resolves questions of this nature by picking the answer
that's easier to explain.  Since subclassing builtins in Python can't
override the builtin representation, I bet Guido will say current 2.2b1
behavior is easier to explain.  It's sure debatable, though.

> Alternatively, the PyDict_Check could be supplemented by a check to see
> if __getitem__ has been overridden (and, if so, to use the generic
> code).  However, this would not help a subclass which transforms its
> keys in some way (so that PyMapping_Keys returns a different set of keys
> than that stored in the dictionary structure).  (I suppose the check
> could be extended to look for an overridden keys method.)

That's the problem, isn't it?  You can't guess what's going to happen
without studying the implementation code.  Since subclassing builtins is
brand new, Python takes "shortcuts" *all over the place* internally (the
marvel to me isn't that you discovered this about dict.update(), but that
you didn't stumble over 100 others before it <wink>).

They weren't shortcuts before 2.2 -- they were just the obvious ways to
implement things.  Now they look like shortcuts, "avoiding" thousands of
lines of fiddly new code to worry about "oops -- maybe this isn't a 'real'
str, list, tuple, dict, file, complex, int, long, float, let's do a
long-winded dance to see whether it's a subclass".

I expect it will take years to resolve all that.  In the meantime, exactly
when the assorted magic methods get called for instances of subclasses of
builtins is going to be unclear and sometimes surprising.  I also expect
changes to be driven by compelling use cases (for example, people have been
very vocal over the years about wanting to pass a dictionary substitute to
eval(), so it's no accident that case works for dict subclasses in 2.2b1).
And I also expect that some arguably good changes will never get made (e.g.,
because of the central role dicts play throughout Python's implementation,
anything catering to dict subclasses that slows "real dicts" -- even a
little -- is going to be a difficult sell).
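
As an illustration of the eval() case mentioned above, here is a small
sketch assuming a current CPython 3 and assuming the dict subclass is
passed as the locals argument.  Recording and its seen attribute are
hypothetical names used only for this example:

```python
class Recording(dict):
    """Hypothetical namespace that remembers which names were looked up."""

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.seen = []

    def __getitem__(self, name):
        self.seen.append(name)
        return dict.__getitem__(self, name)


ns = Recording(x=1, y=2)

# Name lookups in the evaluated expression go through the override
# because ns is not an exact dict.
result = eval("x + y", {}, ns)
print(result, ns.seen)
```

Unlike the constructor and update(), this is one place where the
overridden __getitem__ does get called.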

good-thing-python-never-gave-a-rip-about-theoretical-purity<wink>-ly
    y'rs  - tim