Yeah, it is totally doable to refactor the collection ABCs to have something in between `Collection` and `Sequence` that just supports `__getitem__`.
But I would take Marco's research (and Inada's musings) seriously -- we don't actually want to support `__getitem__`, because of the unpredictable performance characteristics.
I'm no longer in favor of adding .ordered() -- I think it's better to add something to itertools, for example first() to get the first item (see Tim Peters' post), and something related to get the first N items.
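As a rough sketch of what such helpers might look like (the names `first` and `take` and their signatures are assumptions for illustration, not an existing itertools API), both can be written on top of plain iteration, so they work on dicts and dict views without any indexing:

```python
from itertools import islice

_MISSING = object()  # sentinel so that None can be used as a real default


def first(iterable, default=_MISSING):
    """Return the first item of iterable, or default if it is empty (hypothetical helper)."""
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() called on an empty iterable")
    return default


def take(n, iterable):
    """Return the first n items of iterable as a list (hypothetical helper)."""
    return list(islice(iterable, n))


d = {"a": 1, "b": 2, "c": 3}
first_key = first(d)                  # iterating a dict yields its keys in insertion order
first_two_items = take(2, d.items())  # no __getitem__ on the view is needed
```

Both helpers only consume as many items as they return, so the cost is proportional to the number of items taken rather than to the size of the dict.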
On Sat, Aug 1, 2020 at 12:28 PM Christopher Barker email@example.com wrote:
On Fri, Jul 31, 2020 at 7:34 AM Guido van Rossum firstname.lastname@example.org wrote:
So maybe we need to add dict.ordered() which returns a view on the items that is a Sequence rather than a set? Or ordereditems(), orderedkeys() and orderedvalues()?
I'm still confused as to when "ordered" became synonymous with "Sequence" -- so wouldn't we want to call these dict.as_sequence() or something like that?
And is there a reason that the regular dict views couldn't be both a Set and a Sequence? Looking at the ABCs, I don't see a conflict -- __getitem__, index() and count() would need to be added, and Sets don't have any of those. (and count could be optimized to always return 0 or 1 for dict.keys() ;-) )
But anyway, naming aside, I'm still wondering whether we necessarily want the entire Sequence protocol. For the use cases at hand, isn't indexing and slicing enough?
Which brings us to the philosophy of duck typing. I wrote an earlier post about that -- so here are some follow-up thoughts. I suggested that I like the "if I only need it to quack, I don't care if it's a duck" approach -- I try to use the quack() method, and I'm happy if it works, and raise an Exception (or let whatever Exception is raised bubble up) if it doesn't.
Guido pointed out that having a quack() method isn't enough -- it also needs to actually behave as you expect -- which is the nice thing about ABCs -- if you know something is a Sequence, you don't just know that you can index it, you know that indexing it will do what you expect.
Which brings us back to the random.choice() function. It's really simple, and uses exactly the approach I outlined above.
    def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        try:
            i = self._randbelow(len(seq))
        except ValueError:
            raise IndexError('Cannot choose from an empty sequence') from None
        return seq[i]
It checks the length of the object, picks a random index within that length, and then tries to use that index to get a random item. So anything with a __len__ and a __getitem__ that accepts integers will work.
And this has worked "fine" for decades. Should it be checking that seq is actually a sequence? I don't think so -- I like that I can pass in any object that's indexable by an integer.
But there is a potential problem here -- all it does is try to pass an integer to __getitem__. So all Sequences should work. Mappings also have a __getitem__, but with slightly different semantics -- a Sequence should accept an integer (or object with an __index__) in the range of its size, but a Mapping can accept any valid key. So for the most part, passing a Mapping to random.choice() fails as it should, with a KeyError. But if you happen to have a key that is an integer, it might succeed, but it would not be doing "the right thing" (unless the Mapping happened to be constructed exactly the right way -- but then it should probably just be a Sequence).
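To make the failure modes concrete, here is a small sketch using three made-up dicts (the names and contents are only for illustration). It relies on nothing beyond random.choice() calling len() and then indexing with an integer:

```python
import random

# Non-integer keys: the integer index is never a valid key, so this fails.
by_name = {"x": 1}
try:
    random.choice(by_name)
    outcome = "returned a value"
except KeyError:
    outcome = "KeyError"

# Keys that happen to be exactly 0..len-1: choice() appears to work.
by_position = {0: "a", 1: "b", 2: "c"}
picked = random.choice(by_position)

# Integer keys that do not line up with positions: index 0 raises KeyError,
# index 1 returns "a", and the value stored under key 5 can never be chosen.
sparse = {1: "a", 5: "b"}
results = set()
for _ in range(200):
    try:
        results.add(random.choice(sparse))
    except KeyError:
        results.add("KeyError")
```

The last case is the troubling one: the call sometimes succeeds, but it is not choosing uniformly from the dict's values.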
So: do we need a solution to this? I don't think so, it's simply the nature of dynamic typing as far as I'm concerned, but if we wanted it to be more robust, we could require (maybe only with a static type declaration) that the object passed in is a Sequence.
But I think that would be a shame -- this function doesn't need a full Sequence, it only needs a Sized and __getitem__.
In fact, the ABCs are designed to accommodate much of this -- for example, the Sized ABC only requires one feature: __len__. And Container only __contains__. As far as I know there are no built-in (or commonly used third-party) objects that are ONLY Sized, or ONLY a Container. In fact, at least in collections.abc, every other ABC that has __contains__ also has __len__. And I can't think of anything that could support "in" that didn't have a size -- which could be a failure of imagination on my part. But you could type check for Container if all you wanted to do was know that you could use it with "in".
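A quick sketch of how the single-method ABCs behave: collections.abc.Sized and collections.abc.Container recognize any class that defines the one required method, with no inheritance or registration. The two classes below are invented for illustration; EvenNumbers is one possible candidate for something that supports "in" but has no meaningful size.

```python
from collections.abc import Container, Sized


class OnlySized:
    """Hypothetical class that defines nothing but __len__."""

    def __len__(self):
        return 3


class EvenNumbers:
    """Hypothetical unbounded 'set': supports `in` but has no length."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


sized_checks = (isinstance(OnlySized(), Sized), isinstance(OnlySized(), Container))
evens_checks = (isinstance(EvenNumbers(), Container), isinstance(EvenNumbers(), Sized))
four_is_even = 4 in EvenNumbers()
```

Here sized_checks is (True, False) and evens_checks is (True, False): each object satisfies exactly the one ABC whose method it implements.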
So there are ABCs there simply to support a single method. Which means that we could solve the "problem" of random.choice with a "Getitemable" ABC.
Ahh -- but here's the rub -- while the ABCs only require certain methods -- in fact, it's implied that they have particular behavior as well. And this is the problem at hand. Both Sequences and Mappings have a __getitem__, but they have somewhat different meanings, and that meaning is embedded in the ABC itself, rather than the method: Sequences will take an integer, and raise an IndexError if it's out of range, and Mappings take any hashable, and will raise a KeyError if it's not there.
So maybe what is needed is an Indexable ABC that implies the Sequence-like indexing behavior.
Then if we added indexing to dict views, they would be an Indexable, but not a Sequence.
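One way this could look, as a sketch only: Indexable and IndexableKeys below are hypothetical names, not anything in collections.abc. The ABC uses explicit registration rather than a __subclasshook__, because a method check alone cannot tell a Sequence-style __getitem__ from a Mapping-style one. The wrapper also shows the performance concern: each lookup walks the dict, so it is O(n).

```python
import random
from abc import ABC, abstractmethod
from itertools import islice


class Indexable(ABC):
    """Hypothetical ABC: sized, and integer-indexable with Sequence-like semantics."""

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def __getitem__(self, index): ...


# Opt in explicitly; dict is deliberately not registered.
for _cls in (list, tuple, str, range):
    Indexable.register(_cls)


class IndexableKeys(Indexable):
    """Hypothetical wrapper giving positional access to a dict's keys."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __len__(self):
        return len(self._mapping)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("indices must be integers")
        n = len(self._mapping)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("index out of range")
        # Walks the dict from the start: O(n) per lookup, not O(1).
        return next(islice(self._mapping, index, None))


d = {"a": 1, "b": 2, "c": 3}
keys = IndexableKeys(d)
chosen = random.choice(keys)  # works: only len() and integer indexing are needed
```

With this arrangement isinstance([], Indexable) is true and isinstance({}, Indexable) is false, so a function could ask for an Indexable without demanding the whole Sequence protocol.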
On Fri, Jul 31, 2020 at 05:29 Ricky Teachey email@example.com wrote:
On Fri, Jul 31, 2020, 2:48 AM Wes Turner firstname.lastname@example.org wrote:
# Dicts and DataFrames
- (interactive Jupyter Notebook hosted by https://mybinder.org/ )
The punchline of Wes Turner's notebook (very well put together, thank you!) seems to partly be that if you find yourself wanting to work with the position of items in a dict, you might want to consider using a pandas.Series (with its .iloc indexer).
A difficulty that immediately came to mind with this advice is type hinting support. I was just googling yesterday for "how to type hint using pandas" and the only thing I found is to use pd.Series and pd.DataFrame directly.
But those don't support type hinting comparable to:
    class Vector(TypedDict):
        i: float
        j: float
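For context, a brief sketch of what that TypedDict buys you. The magnitude() function is a made-up example; the point is that a static type checker can verify the keys and value types, while at runtime the value is still a plain dict.

```python
import math
from typing import TypedDict


class Vector(TypedDict):
    i: float
    j: float


def magnitude(v: Vector) -> float:
    """Hypothetical helper: a type checker can verify that 'i' and 'j' exist and are floats."""
    return math.hypot(v["i"], v["j"])


v: Vector = {"i": 3.0, "j": 4.0}  # an ordinary dict at runtime
length = magnitude(v)
```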
This is a big downside of the advice "just use pandas". Although I love using pandas and use it all the time.
_______________________________________________
Python-ideas mailing list -- email@example.com
To unsubscribe send an email to firstname.lastname@example.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://email@example.com/message/C7HJFK...
Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido (mobile)
-- Christopher Barker, PhD
Python Language Consulting
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython