[Python-Dev] Toowtdi: Datatype conversions

Raymond Hettinger python at rcn.com
Sat Jan 3 20:37:22 EST 2004


[Raymond Hettinger]
> > Which is the one obvious way of turning a dictionary into a list?

[Martin v. Loewis]
> There is no obvious way to turn a dictionary into a list;
> lists and dictionaries are completely different things.

[Guido]
> Ugh.  I really hope we aren't going to teach people to write list(d) 
> instead of d.keys().

Put another way, which is preferable:

   list(d.iterkeys())    vs.   d.keys()
   list(d.itervalues())  vs.   d.values()
   list(d.iteritems())   vs.   d.items()


[Martin] 
> Dictionaries are similar to sets; in fact, Smalltalk has an
> Association class (Association key: value:), and Dictionary
> is a set of associations.

Your perceptiveness is uncanny!  This issue arose for me while 
writing an extension module that implements Smalltalk bags which 
*do* have meaningful conversions to sets, lists, and dicts:

>>> list(b)
['dog', 'dog', 'cat']
>>> dict(b.iter_with_counts)
{'dog':2, 'cat':1}
>>> set(b)
set(['dog', 'cat'])
>>> list(b.iter_unique)
['dog', 'cat']

The alternative API is:

bag.asList()
bag.asDict()
bag.asSet()
bag.unique()

I'm trying to decide which API is the cleanest and has reasonable
performance.

The former has two fewer methods.  The latter has much better
performance but won't support casts to subclasses of list/set/dict.
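
To make the first API concrete, here is a rough pure-Python sketch.
It is only an illustration: the real type is a C extension, and the
class below (Bag, built on collections.Counter, plus the MyList
subclass) is a hypothetical stand-in, not the actual implementation.
Only the names iter_with_counts and iter_unique come from the session
above.

```python
from collections import Counter


class Bag:
    """Hypothetical pure-Python stand-in for the bag extension type."""

    def __init__(self, iterable=()):
        # Map each element to the number of times it occurs.
        self._counts = Counter(iterable)

    def __iter__(self):
        # Yield every element once per occurrence.
        return self._counts.elements()

    def __len__(self):
        return sum(self._counts.values())

    @property
    def iter_with_counts(self):
        # Iterator over (element, count) pairs, suitable for dict().
        return iter(self._counts.items())

    @property
    def iter_unique(self):
        # Iterator over the distinct elements only.
        return iter(self._counts)


class MyList(list):
    """Hypothetical list subclass, to show conversion to a subclass."""


b = Bag(['dog', 'cat', 'dog'])
as_list = list(b)
as_dict = dict(b.iter_with_counts)
as_set = set(b)
unique = list(b.iter_unique)
as_sub = MyList(b)   # the constructor-based API reaches subclasses too
```

Because every conversion goes through the target type's constructor,
a subclass such as MyList needs no extra support from the bag; an
asList() method would always hand back a plain list.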



> > So, one question is whether set() and frozenset() should grow an
> > analogue to the keys() method:

[Robert Brewer]
> I don't think so, for the reason that .keys() is effectively 
> a disambiguator as I just described. With sets, there is no mapping,
> and therefore no ambiguity.

[Martin]
> But this is a completely different issue! For sets, there is an
> obvious way (if you accept that the list will have the same elements
> in arbitrary order), then
> 
>    list(a_set)
> 
> is the most obvious way, and it should work fastest.

Right!  Unfortunately, it can never be as fast as a set.elements() 
method -- the underlying d.keys() method has too many advantages 
(looping with an in-lined version of PyDict_Next(), knowing the 
size of the dictionary, writing with PyList_SET_ITEM, and having
all steps in-lined with no intervening function calls).



> > Another bright idea is to support faster datatype conversion
> > by adding an optional __len__() method to the iteration
> > protocol so that list(), tuple(), dict(), and set() could
> > allocate sufficient space for loading any iterable that
> > knows its own length.

[Martin]
> That is useful, also for list comprehension.
> 
> > The advantages are faster type conversion (by avoiding resizing),
> > keeping the APIs decoupled, and keeping the visible API thin.
> > The disadvantage is that it clutters the C code with special
> > case handling and that it doesn't work with generators or
> > custom iterators (unless they add support for __len__).
> 
> I see no reason why it should not work for custom iterators.
> For generators, you typically don't know how many results you
> will get in the end, so it is no loss that you cannot specify
> that.

That makes sense.
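
As a sketch of the idea in Python rather than C: the iterator below
adds a __len__() that reports the remaining item count, and a consumer
uses it to allocate once. Both CountedIter and presized_list are
hypothetical names made up for this illustration; the real change
would live inside the C constructors.

```python
class CountedIter:
    """Hypothetical iterator that knows how many items remain."""

    def __init__(self, seq):
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        item = self._seq[self._pos]
        self._pos += 1
        return item

    def __len__(self):
        # Number of items not yet consumed.
        return len(self._seq) - self._pos


def presized_list(iterable):
    """Hypothetical consumer: allocate once when a length is offered."""
    try:
        n = len(iterable)
    except TypeError:
        # No length available (e.g. a generator): grow as usual.
        return list(iterable)
    result = [None] * n
    i = 0
    for item in iterable:
        if i < n:
            result[i] = item
        else:
            result.append(item)   # reported length was too small
        i += 1
    del result[i:]                # reported length was too large
    return result


from_counted = presized_list(CountedIter('abc'))
from_generator = presized_list(x * x for x in range(4))
```

The reported length is treated as a hint only: the loop still copes
with an iterable that yields more or fewer items than it claimed.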

Looking at the code for list_fill, I see that some length checking
is already done, but only if the underlying object fills in the
sq_length slot.  I think that check should be replaced by a call to
PyObject_Size().

That leaves a question as to how to best empower the dictionary
constructor.  If the source has an underlying dictionary (a Bag
is a good example), then nothing beats PyDict_Copy().  For sets,
that only works if you accept the default value of True.
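
A small Python-level illustration of the set case: a set carries no
values, so any dict built from it has to invent one. The sketch below
uses dict.fromkeys() with True as the stand-in value; the variable
names are made up for the example, and it says nothing about how the
C-level copy would actually be done.

```python
animals = {'dog', 'cat'}

# Every key receives the same placeholder value.
as_dict = dict.fromkeys(animals, True)
```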

The set constructor has the same issue when the length of the 
iterable is knowable.  The problem is that there is no analogue 
to PyList_New(n), which returns a presized collection.

On a separate issue, does anyone care that dict.__init__() has 
update behavior instead of replace behavior like list.__init__()?
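
To make the difference concrete, the snippet below re-invokes
__init__() on existing objects. The variable names are arbitrary, and
calling __init__() by hand is done here only to expose the behavior.

```python
lst = [1, 2, 3]
lst.__init__([9])          # list: old contents are discarded first

d = {'a': 1}
d.__init__({'b': 2})       # dict: new items are merged into the old ones
```

After this runs, lst holds only 9, while d holds both 'a' and 'b'.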



Raymond



