[Python-ideas] Re: Access (ordered) dict by index; insert slice

2 Aug 2020

Yeah, it is totally doable to refactor the collection ABCs to have
something in between `Collection` and `Sequence` that just supports
`__getitem__`.

But I would take Marco's research (and Inada's musings) seriously -- we
don't actually want to support `__getitem__`, because of the unpredictable
performance characteristics.

I'm no longer in favor of adding .ordered() -- I think it's better to add
something to itertools, for example first() to get the first item (see Tim
Peters' post), and something related to get the first N items.
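For concreteness, here is a minimal sketch of what such helpers might look like. Neither first() nor take() exists in itertools as written here: the names, the signatures, and the choice to raise ValueError on an empty iterable are all assumptions for illustration (take() follows the recipe shown in the itertools documentation).

```python
from itertools import islice

_MISSING = object()  # sentinel so that None can be a legitimate default


def first(iterable, default=_MISSING):
    """Return the first item of iterable (hypothetical helper).

    Raising ValueError on an empty iterable is an assumption, not a
    settled design.
    """
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() called on an empty iterable")
    return default


def take(n, iterable):
    """Return the first n items of iterable as a list."""
    return list(islice(iterable, n))


d = {"a": 1, "b": 2, "c": 3}
print(first(d))            # a
print(take(2, d.items()))  # [('a', 1), ('b', 2)]
```

Both work on any iterable, so they cover dicts and dict views without promising fast indexed access.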

On Sat, Aug 1, 2020 at 12:28 PM Christopher Barker 
wrote:
...
On Fri, Jul 31, 2020 at 7:34 AM Guido van Rossum  wrote:
...
So maybe we need to add dict.ordered() which returns a view on the items
that is a Sequence rather than a set? Or ordereditems(), orderedkeys() and
orderedvalues()?
I'm still confused as to when "ordered" became synonymous with "Sequence"
-- so wouldn't we want to call these dict.as_sequence() or something like
that?
And is there a reason that the regular dict views couldn't be both a Set
and a Sequence? Looking at the ABCs, I don't see a conflict -- __getitem__,
index() and count() would need to be added, and Sets don't have any of
those. (And count() could be optimized to always return 0 or 1 for
dict.keys() ;-) )
But anyway, naming aside, I'm still wondering whether we necessarily want
the entire Sequence protocol. For the use cases at hand, isn't indexing and
slicing enough?
Which brings us to the philosophy of duck typing. I wrote an earlier post
about that -- so here are some follow-up thoughts. I suggested that I like
the "if I only need it to quack, I don't care if it's a duck" approach -- I
try to use the quack() method, and I'm happy if it works, and raise an
Exception (or let whatever Exception is raised bubble up) if it
doesn't.
Guido pointed out that having a quack() method isn't enough -- it also
needs to actually behave as you expect -- which is the nice thing about
ABCs -- if you know something is a Sequence, you don't just know that you
can index it, you know that indexing it will do what you expect.
Which brings us back to the random.choice() function. It's really simple,
and uses exactly the approach I outlined above.
def choice(self, seq):
    """Choose a random element from a non-empty sequence."""
    try:
        i = self._randbelow(len(seq))
    except ValueError:
        raise IndexError('Cannot choose from an empty sequence') from None
    return seq[i]
It checks the length of the object, picks a random index within that
length, and then tries to use that index to get a random item. So anything
with a __len__ and a __getitem__ that accepts integers will work.
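A small illustration of that point. The Squares class is made up for this example; it is not registered as a Sequence and has no index() or count(), yet random.choice() accepts it.

```python
import random
from collections.abc import Sequence


class Squares:
    """Hypothetical object with only __len__ and integer __getitem__."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        return i * i


squares = Squares(5)
picked = random.choice(squares)       # one of 0, 1, 4, 9, 16
print(picked)
print(isinstance(squares, Sequence))  # False
```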
And this has worked "fine" for decades. Should it be checking that seq is
actually a sequence? I don't think so -- I like that I can pass in any
object that's indexable by an integer.
But there is a potential problem here -- all it does is try to pass an
integer to __getitem__. So all Sequences should work. But Mappings also
have a __getitem__, but with slightly different semantics -- a Sequence
should accept an integer (or object with an __index__) in the range of its
size, but a Mapping can accept any valid key. So for the most part, passing
a Mapping to random.choice() fails as it should, with a KeyError. But if
you happen to have a key that is an integer, it might succeed, but it would
not be doing "the right thing" (unless the Mapping happened to be
constructed exactly the right way -- but then it should probably just be a
Sequence).
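To make the failure mode concrete, here is a short sketch with two made-up dicts. The first fails with a KeyError, as it should; the second happens to have the integers 0 and 1 as keys, so the call succeeds even though a dict is not a sequence.

```python
import random

by_name = {"x": 1, "y": 2}
try:
    random.choice(by_name)
    outcome = "no error"
except KeyError as exc:
    # The random integer index is looked up as a key, which is missing.
    outcome = f"KeyError: {exc}"
print(outcome)

# Integer keys that happen to cover range(len(d)): the call "works",
# and returns one of the values rather than one of the keys.
by_int = {0: "a", 1: "b"}
picked = random.choice(by_int)
print(picked)
```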
So: do we need a solution to this? I don't think so, it's simply the
nature of dynamic typing as far as I'm concerned, but if we wanted it to
be more robust, we could require (maybe only with a static type
declaration) that the object passed in is a Sequence.
But I think that would be a shame -- this function doesn't need a full
Sequence, it only needs a Sized and __getitem__.
In fact, the ABCs are designed to accommodate much of this -- for example,
the Sized ABC only requires one feature: __len__. And Container only
__contains__. As far as I know there are no built-in (or commonly used
third-party) objects that are ONLY Sized, or ONLY Container. In fact, at
least in collections.abc, every other ABC that has __contains__ also has
__len__. And I can't think of anything that could support "in" that didn't
have a size -- which could be a failure of imagination on my part. But you
could type check for Container if all you wanted to do was know that you
could use it with "in".
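As a sketch of how those single-method ABCs behave, here is a made-up class that supports "in" but has no meaningful size. EvenNumbers is purely illustrative; the isinstance() checks rely on Container and Sized recognizing classes structurally, by the presence of __contains__ and __len__.

```python
from collections.abc import Container, Sized


class EvenNumbers:
    """Hypothetical unbounded 'collection': membership only, no length."""

    def __contains__(self, value):
        return isinstance(value, int) and value % 2 == 0


evens = EvenNumbers()
print(4 in evens)                    # True
print(3 in evens)                    # False
print(isinstance(evens, Container))  # True
print(isinstance(evens, Sized))      # False
```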
So there are ABCs there simply to support a single method. Which means
that we could solve the "problem" of random.choice with a "Getitemable"
ABC.
Ahh -- but here's the rub -- while the ABCs only require certain methods,
it's implied that they have particular behavior as well. And
this is the problem at hand. Both Sequences and Mappings have a
__getitem__, but they have somewhat different meanings, and that meaning is
embedded in the ABC itself, rather than the method: Sequences will take an
integer, and raise an IndexError if it's out of range, and Mappings take any
hashable, and will raise a KeyError if it's not there.
So maybe what is needed is an Indexable ABC that implies the Sequence-like
indexing behavior.
Then if we added indexing to dict views, they would be an Indexable, but
not a Sequence.
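A rough sketch of that idea, with everything in it hypothetical: neither Indexable nor KeysByIndex exists in the standard library, and real dict views do not support indexing. The wrapper walks the dict to reach position i, so each lookup is O(n), which is the performance concern raised elsewhere in this thread.

```python
import random
from abc import ABC, abstractmethod
from collections.abc import Sequence
from itertools import islice
from operator import index as as_index


class Indexable(ABC):
    """Hypothetical ABC: sized, with Sequence-style integer indexing."""

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def __getitem__(self, i): ...


class KeysByIndex(Indexable):
    """Hypothetical wrapper giving positional access to a dict's keys."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __len__(self):
        return len(self._mapping)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Simple but O(n): copy the keys, then slice the list.
            return list(self._mapping)[i]
        i = as_index(i)  # accept anything with __index__
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("index out of range")
        # O(i): iterate from the start of the dict up to position i.
        return next(islice(self._mapping, i, None))


keys = KeysByIndex({"a": 1, "b": 2, "c": 3})
print(keys[0], keys[-1], keys[1:])  # a c ['b', 'c']
print(random.choice(keys))          # one of 'a', 'b', 'c'
print(isinstance(keys, Sequence))   # False
```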
-CHB
...
On Fri, Jul 31, 2020 at 05:29 Ricky Teachey  wrote:
...
On Fri, Jul 31, 2020, 2:48 AM Wes Turner  wrote:
...
# Dicts and DataFrames
- Src:
https://github.com/westurner/pythondictsanddataframes/blob/master/dicts_and_...
- Binder:
https://mybinder.org/v2/gh/westurner/pythondictsanddataframes/master?filepat...
  - (interactive Jupyter Notebook hosted by https://mybinder.org/ )
The punchline of Wes Turner's notebook (very well put together, thank
you!) seems to partly be that if you find yourself wanting to work with the
position of items in a dict, you might want to consider using a
pandas.Series (with its .iloc indexer).
A difficulty that immediately came to mind with this advice is type
hinting support. I was just googling yesterday for "how to type hint using
pandas" and the only thing I found is to use pd.Series and pd.DataFrame
directly.
But those don't support type hinting comparable to:
Dict[str, float]
Or:
class Vector(TypedDict):
    i: float
    j: float
This is a big downside of the advice "just use pandas". Although I love
using pandas and use it all the time.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/C7HJFK...
Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido (mobile)
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/VIPBHJ...
--
Christopher Barker, PhD
Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...