[Python-Dev] Toowtdi: Datatype conversions

Raymond Hettinger python at rcn.com
Sat Jan 3 20:50:13 EST 2004


[Raymond Hettinger]
> > Which is the one obvious way of turning a dictionary into a list?

[Martin v. Loewis]
> There is no obvious way to turn a dictionary into a list; lists and 
> dictionaries are completely different things.

[Guido]
> Ugh.  I really hope we aren't going to teach people to write list(d)
> instead of d.keys().

Put another way, which is preferable:

   list(d.iterkeys())    vs.   d.keys()
   list(d.itervalues())  vs.   d.values()
   list(d.iteritems())   vs.   d.items()


[Martin] 
> Dictionaries are similar to sets; in fact, Smalltalk has an Association
> class (Association key: value:), and Dictionary is a set of 
> associations.

Your perceptiveness is uncanny!  This issue arose for me while 
writing an extension module that implements Smalltalk bags which 
*do* have meaningful conversions to sets, lists, and dicts:

>>> list(b)
['dog', 'dog', 'cat']
>>> dict(b.iter_with_counts)
{'dog':2, 'cat':1}
>>> set(b)
set(['dog', 'cat'])
>>> list(b.iter_unique)
['dog', 'cat']

The alternative API is:

bag.asList()
bag.asDict()
bag.asSet()
bag.unique()

I'm trying to decide which API is the cleanest and has reasonable
performance.

The former has two fewer methods.  The latter has much better
performance but won't support casts to subclasses of list/set/dict.
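
To make the comparison concrete, here is a minimal pure-Python sketch of
both candidate APIs on one class. It is only an illustration: the real
code under discussion is a C extension module, and the class name Bag,
the _counts attribute, and the choice to expose iter_with_counts and
iter_unique as properties are assumptions made to match the session
shown above.

```python
class Bag(object):
    """Hypothetical multiset that stores counts in an underlying dict."""

    def __init__(self, iterable=()):
        self._counts = {}
        for elem in iterable:
            self._counts[elem] = self._counts.get(elem, 0) + 1

    # --- First API: plain iteration plus two helper iterators ---
    def __iter__(self):
        # Yield each element once per occurrence.
        for elem, count in self._counts.items():
            for _ in range(count):
                yield elem

    @property
    def iter_with_counts(self):
        # (element, count) pairs, suitable for passing to dict().
        return iter(self._counts.items())

    @property
    def iter_unique(self):
        # Each distinct element exactly once.
        return iter(self._counts)

    # --- Second API: explicit conversion methods ---
    def asList(self):
        return list(self)

    def asDict(self):
        # Copies the underlying dict directly instead of feeding
        # pairs through an iterator.
        return dict(self._counts)

    def asSet(self):
        return set(self._counts)

    def unique(self):
        return list(self._counts)


b = Bag(['dog', 'dog', 'cat'])
```

With the first API, list(b), dict(b.iter_with_counts), and set(b) accept
any subclass as the target type, e.g. MyList(b). The as*() methods
always return the built-in types, but can take shortcuts such as the
direct dict copy in asDict().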



[Raymond]
> > So, one question is whether set() and frozenset() should grow an 
> > analogue to the keys() method:

[Robert Brewer]
> I don't think so, for the reason that .keys() is effectively
> a disambiguator as I just described. With sets, there is no mapping,
> and therefore no ambiguity.

[Martin]
> But this is a completely different issue! For sets, if you accept 
> that the list will have the same elements in arbitrary order, then
> 
>    list(a_set)
> 
> is the most obvious way, and it should work fastest.

Right!  Unfortunately, it can never be as fast as a set.elements() 
method -- the underlying d.keys() method has too many advantages 
(looping with an in-lined version of PyDict_Next(), knowing the 
size of the dictionary, writing with PyList_SET_ITEM, and having all
steps in-lined with no intervening function calls).



> > Another bright idea is to support faster datatype conversion by 
> > adding an optional __len__() method to the iteration protocol so 
> > that list(), tuple(), dict(), and set() could allocate sufficient 
> > space for loading any iterable that knows its own length.

[Martin]
> That is useful, also for list comprehension.
> 
> > The advantages are faster type conversion (by avoiding resizing), 
> > keeping the APIs decoupled, and keeping the visible API thin. The 
> > disadvantage is that it clutters the C code with special case 
> > handling and that it doesn't work with generators or custom 
> > iterators (unless they add support for __len__).
> 
> I see no reason why it should not work for custom iterators. For 
> generators, you typically don't know how many results you will get in 
> the end, so it is no loss that you cannot specify that.

That makes sense.
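
The idea can be sketched in pure Python, assuming a custom iterator
that reports its remaining length through __len__. The names CountDown
and presized_list are hypothetical; the actual proposal concerns the C
implementations of list(), tuple(), dict(), and set(), not a helper
like this one.

```python
class CountDown(object):
    """Hypothetical iterator that knows how many items remain."""

    def __init__(self, n):
        self._remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        self._remaining -= 1
        return self._remaining

    next = __next__  # Python 2 spelling of the same method

    def __len__(self):
        return self._remaining


def presized_list(iterable):
    """Build a list, allocating once when the source reports a length."""
    try:
        n = len(iterable)
    except TypeError:
        # Generators and plain iterators: fall back to ordinary growth.
        return list(iterable)
    result = [None] * n
    i = 0
    for item in iterable:
        if i < n:
            result[i] = item
        else:
            result.append(item)  # the reported length was too small
        i += 1
    del result[i:]  # trim if fewer items arrived than promised
    return result
```

The reported length is treated only as a hint: the helper still copes
with an iterable that yields more or fewer items than it claimed, and a
generator simply takes the fallback path.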

Looking at the code for list_fill, I see that some length checking is
already done but only if the underlying object fills in sq_length. I think
that check should be replaced by a call to PyObject_Size().

That leaves a question as to how to best empower the dictionary
constructor.  If the source has an underlying dictionary (a Bag is a
good example), then nothing beats PyDict_Copy().  For sets, that only
works if you accept the default value of True.

The set constructor has the same issue when the length of the 
iterable is knowable.  The problem is that there is no analogue 
to PyList_New(n), which returns a presized collection.

On a separate issue, does anyone care that dict.__init__() has 
update behavior instead of replace behavior like list.__init__()?
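
The difference shows up when __init__() is called a second time on an
existing object. A short illustration (the variable names are
arbitrary):

```python
lst = [1, 2, 3]
lst.__init__([4, 5])     # list.__init__ discards the old contents

d = {'a': 1}
d.__init__({'b': 2})     # dict.__init__ merges into the old contents
```

Afterwards lst holds only the new items, while d still has its 'a'
entry alongside the new 'b' entry.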



Raymond



