TOOWTDI: Datatype conversions
Choosing between:

    list(d) or d.keys()

Which is the one obvious way of turning a dictionary into a list? IMO, list(d) is it. Which is the fastest? timeit.Timer() says d.keys() is several times faster.

So, one question is whether set() and frozenset() should grow an analogue to the keys() method:

    >>> set('banana').elements()
    ['a', 'b', 'n']

This speeds up the uniquification use case but comes at the expense of fattening the API and adding a second way to do it.

Another question is whether there should be a method for conversion to a dictionary. Given the absence of use cases, the answer is no, but assuming there were, what would be the right way to go? set.asdict() would be several times faster than dict.fromkeys(s).

One bright idea is to make the constructors a little bit smarter so that list(d) would automagically invoke d.keys() whenever d is a dictionary. The problem with this idea is that list creation is separate from list initialization. list.__init__() starts with an existing list and replaces its contents with the initializer. This precludes returning a list built by d.keys(). The situation for dicts is similar, although there is a curious difference: dict.__init__() updates rather than replaces the contents of the underlying dictionary.

Another bright idea is to support faster datatype conversion by adding an optional __len__() method to the iteration protocol so that list(), tuple(), dict(), and set() could allocate sufficient space for loading any iterable that knows its own length.

The advantages are faster type conversion (by avoiding resizing), keeping the APIs decoupled, and keeping the visible API thin. The disadvantage is that it clutters the C code with special case handling and that it doesn't work with generators or custom iterators (unless they add support for __len__).

Raymond Hettinger
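The "curious difference" between the two initializers can be seen by calling __init__() directly on existing objects. This is only a small sketch of the behavior described above, run against current CPython; the variable names are made up for illustration.

```python
# list.__init__() discards the old contents and refills the same list object.
items = [1, 2, 3]
items.__init__("ab")

# dict.__init__() merges the initializer into whatever is already there.
mapping = {"x": 1}
mapping.__init__({"y": 2})

print(items)    # the old elements 1, 2, 3 are gone
print(mapping)  # the old key "x" is still present
```

Because __init__() works on an object that already exists, a constructor cannot simply hand back a different list built elsewhere, which is the obstacle mentioned for list(d).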
Raymond Hettinger wrote:
> Choosing between:
>
>     list(d) or d.keys()
>
> Which is the one obvious way of turning a dictionary into a list? IMO, list(d) is it.

Neither. There is no obvious way to turn a dictionary into a list; lists and dictionaries are completely different things. Dictionaries are similar to sets; in fact, Smalltalk has an Association class (Association key: value:), and a Dictionary is a set of associations. Then, assuming there is an obvious way to convert a set into a list, the most obvious way to convert a dictionary into a list is d.items().

> So, one question is whether set() and frozenset() should grow an analogue to the keys() method:

But this is a completely different issue! For sets, there is an obvious way: if you accept that the list will have the same elements in arbitrary order, then list(a_set) is the most obvious way, and it should work fastest.

> Another question is whether there should be a method for conversion to a dictionary. Given the absence of use cases, the answer is no, but assuming there were, what would be the right way to go?

There is no obvious way to convert a set into a dictionary, as you don't know what the values should be (refuse the temptation to guess). If there were a use case, that use case would indicate what the values should be, and, from the use case, it would be clear what the method name would be. It would not be "asdict".
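To make the "what should the values be" point concrete, here is a small sketch using dict.fromkeys(), which was mentioned earlier in the thread. The set contents and the three candidate values are arbitrary choices for illustration; nothing in the set itself says which one is right.

```python
letters = set("banana")

# Three equally plausible dictionaries built from the same set.
as_none = dict.fromkeys(letters)          # default value is None
as_flags = dict.fromkeys(letters, True)   # membership flags
as_counts = dict.fromkeys(letters, 0)     # counters to be filled in later

print(sorted(as_none.items()))
print(sorted(as_flags.items()))
print(sorted(as_counts.items()))
```

Each result has the same keys, so the choice of value has to come from the use case rather than from the conversion.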
> One bright idea is to make the constructors a little bit smarter so that list(d) would automagically invoke d.keys() whenever d is a dictionary.

But who needs list(d)?

> Another bright idea is to support faster datatype conversion by adding an optional __len__() method to the iteration protocol so that list(), tuple(), dict(), and set() could allocate sufficient space for loading any iterable that knows its own length.

That is useful, also for list comprehensions.

> The advantages are faster type conversion (by avoiding resizing), keeping the APIs decoupled, and keeping the visible API thin. The disadvantage is that it clutters the C code with special case handling and that it doesn't work with generators or custom iterators (unless they add support for __len__).

I see no reason why it should not work for custom iterators. For generators, you typically don't know how many results you will get in the end, so it is no loss that you cannot specify that.

Regards,
Martin
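For readers on a current Python: a closely related mechanism exists today as the optional __length_hint__() method and operator.length_hint() (Python 3.4 and later). It is not the __len__() spelling proposed in this thread. The sketch below assumes such a Python; the Countdown class is a hypothetical custom iterator invented for the example.

```python
import operator


class Countdown:
    """Hypothetical custom iterator that knows how many items remain."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # Optional: lets consumers estimate how much space to allocate.
        return self.remaining


hint_custom = operator.length_hint(Countdown(5))
# A generator offers no hint, so the default (0) is returned.
hint_generator = operator.length_hint(x for x in range(5))
values = list(Countdown(3))
```

This matches the point made above: a custom iterator can opt in, while a generator simply provides no estimate and nothing is lost.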
[Raymond Hettinger]
> Which is the one obvious way of turning a dictionary into a list?

[Martin v. Loewis]
> There is no obvious way to turn a dictionary into a list; lists and dictionaries are completely different things.

[Guido]
> Ugh. I really hope we aren't going to teach people to write list(d) instead of d.keys().

Put another way, which is preferable:

    list(d.iterkeys())    vs.  d.keys()
    list(d.itervalues())  vs.  d.values()
    list(d.iteritems())   vs.  d.items()

[Martin]
> Dictionaries are similar to sets; in fact, Smalltalk has an Association class (Association key: value:), and a Dictionary is a set of associations.

Your perceptiveness is uncanny! This issue arose for me while writing an extension module that implements Smalltalk bags which *do* have meaningful conversions to sets, lists, and dicts:

    >>> list(b)
    ['dog', 'dog', 'cat']
    >>> dict(b.iter_with_counts)
    {'dog':2, 'cat':1}
    >>> set(b)
    set(['dog', 'cat'])
    >>> list(b.iter_unique)
    ['dog', 'cat']

The alternative API is:

    bag.asList()
    bag.asDict()
    bag.asSet()
    bag.unique()

I'm trying to decide which API is the cleanest and has reasonable performance. The former has two fewer methods. The latter has much better performance but won't support casts to subclasses of list/set/dict.
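The two API styles can be compared side by side with a pure-Python stand-in. This is only a sketch: the Bag class below is hypothetical, is built on collections.Counter rather than the extension module mentioned above, and treats iter_with_counts and iter_unique as methods, which is an assumption.

```python
from collections import Counter


class Bag:
    """Hypothetical bag; not the extension module discussed in the thread."""

    def __init__(self, iterable=()):
        self._counts = Counter(iterable)

    # Style 1: expose iterators and let the constructors do the conversion.
    def __iter__(self):
        return self._counts.elements()

    def iter_with_counts(self):
        return iter(self._counts.items())

    def iter_unique(self):
        return iter(self._counts)

    # Style 2: dedicated conversion methods that return built-in types.
    def asList(self):
        return list(self._counts.elements())

    def asDict(self):
        return dict(self._counts)

    def asSet(self):
        return set(self._counts)

    def unique(self):
        return list(self._counts)


class TaggedList(list):
    """Example list subclass used to show the 'cast to subclass' point."""


b = Bag(["dog", "dog", "cat"])
via_constructor = TaggedList(b)   # style 1 builds the subclass directly
via_method = b.asList()           # style 2 always yields a plain list
```

With the iterator style, any list, set, or dict subclass can consume the bag directly; with the as...() style, the caller gets a built-in type and must convert it again if a subclass is wanted.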
[Raymond]
> So, one question is whether set() and frozenset() should grow an analogue to the keys() method:

[Robert Brewer]
> I don't think so, for the reason that .keys() is effectively a disambiguator as I just described. With sets, there is no mapping, and therefore no ambiguity.

[Martin]
> But this is a completely different issue! For sets, there is an obvious way: if you accept that the list will have the same elements in arbitrary order, then
>
>     list(a_set)
>
> is the most obvious way, and it should work fastest.

Right! Unfortunately, it can never be as fast as a set.elements() method -- the underlying d.keys() method has too many advantages (looping with an in-lined version of PyDict_Next(), knowing the size of the dictionary, writing with PyList_SET_ITEM, and having all steps in-lined with no intervening function calls).
[Raymond]
> Another bright idea is to support faster datatype conversion by adding an optional __len__() method to the iteration protocol so that list(), tuple(), dict(), and set() could allocate sufficient space for loading any iterable that knows its own length.

[Martin]
> That is useful, also for list comprehensions.

[Raymond]
> The advantages are faster type conversion (by avoiding resizing), keeping the APIs decoupled, and keeping the visible API thin. The disadvantage is that it clutters the C code with special case handling and that it doesn't work with generators or custom iterators (unless they add support for __len__).

[Martin]
> I see no reason why it should not work for custom iterators. For generators, you typically don't know how many results you will get in the end, so it is no loss that you cannot specify that.

That makes sense. Looking at the code for list_fill, I see that some length checking is already done, but only if the underlying object fills sq_length. I think that check should be replaced by a call to PyObject_Size().

That leaves a question as to how best to empower the dictionary constructor. If the source has an underlying dictionary (a Bag is a good example), then nothing beats PyDict_Copy(). For sets, that only works if you accept the default value of True. The set constructor has the same issue when the length of the iterable is knowable. The problem is that there is no analogue to PyList_New(n), which returns a presized collection.

On a separate issue, does anyone care that dict.__init__() has update behavior instead of replace behavior like list.__init__()?

Raymond
Raymond Hettinger wrote:
> Put another way, which is preferable:
>
>     list(d.iterkeys())    vs.  d.keys()
>     list(d.itervalues())  vs.  d.values()
>     list(d.iteritems())   vs.  d.items()

Clearly, the direct methods. Why do you ask?

> The alternative API is:
>
>     bag.asList()
>     bag.asDict()
>     bag.asSet()
>     bag.unique()

For bags, I would say that this API is most appropriate, from a pure Python point of view. Of course, as you are modelling Smalltalk, you should let yourself be guided by the methods that Smalltalk provides (which are just asSet, asArray, and sortedByCount, AFAICT). It appears that Smalltalk has no way of converting the Bag to a Dictionary. I may be missing something here, since that is an obvious conversion - but then, also an unnecessary one.

> The former has two fewer methods. The latter has much better performance but won't support casts to subclasses of list/set/dict.

Users in need of such conversions could always convert the result of the relevant as...() method.

>>     list(a_set)
>>
>> is the most obvious way, and it should work fastest.
>
> Right! Unfortunately, it can never be as fast as a set.elements() method -- the underlying d.keys() method has too many advantages

Well, list construction could special-case sets.

> That leaves a question as to how best to empower the dictionary constructor.

Why is that an interesting question, again (for the set case)?

Regards,
Martin
> Choosing between:
>
>     list(d) or d.keys()
>
> Which is the one obvious way of turning a dictionary into a list? IMO, list(d) is it.

Ugh. I really hope we aren't going to teach people to write list(d) instead of d.keys(). The latter is totally clear. The former requires one to stop and remember that this uses the keys only.

This is different for sets, where there's no ambiguity in what list(d) could possibly *mean* (except for the ordering, which is a second-order issue that doesn't affect the *type* of the result).

While I *like* being able to write

    for key in d: ...

instead of

    for key in d.keys(): ...

I'm not so sure that having list(d) do anything at all was such a great idea. Not because of TOOWTDI, but because it doesn't tell the reader enough. And the polymorphism properties are really weird: if d could be either a mapping or a sequence, list(d) either loses information or it doesn't.

--Guido van Rossum (home page: http://www.python.org/~guido/)
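The polymorphism concern can be shown in a few lines. This is an illustrative sketch only; to_list() is a hypothetical helper and the data is made up.

```python
def to_list(d):
    """Hypothetical helper that accepts 'either a mapping or a sequence'."""
    return list(d)


pairs = [("dog", 2), ("cat", 1)]
mapping = dict(pairs)

from_sequence = to_list(pairs)    # every (key, value) pair survives
from_mapping = to_list(mapping)   # only the keys survive; the counts are lost

print(from_sequence)
print(from_mapping)
```

The same call silently keeps everything for the sequence and drops the values for the mapping, whereas d.keys() or d.items() states at the call site which one the author meant.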
participants (4)
- Guido van Rossum
- Martin v. Loewis
- Raymond Hettinger
- Raymond Hettinger