An ABC representing "Iterable, Sized, Container"

At Dropbox I see a lot of people who want to start using type annotations (PEP 484, mypy) struggle with the collection ABCs. It's pretty common to want to support both sets and lists of values, as these have a lot of useful behavior: they are (re)iterable, have a size, and implement `__contains__` (i.e. x in c, x not in c). It's pretty common for people to think that Iterable is the answer, but it's not. I'm beginning to think that it would be useful to add another ABC to collections.abc (and to typing) that represents the intersection of these three ABCs, and to "upgrade" Set and Sequence to inherit from it. (And Mapping.) Thoughts? (Another useful concept is "reiterable", i.e. an Iterable that can be iterated over multiple times -- there's no ABC to indicate this concept. Sequence, Set and Mapping all support this concept, Iterator does not include it, but Iterable is wishy-washy.) -- --Guido van Rossum (python.org/~guido)
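A minimal runtime sketch of the intersection being described, assuming only the three existing ABCs in `collections.abc`. The helper name `is_sized_iterable_container` is hypothetical. (For reference, Python 3.6 and later also ship `collections.abc.Collection`, which covers this same combination.)

```python
from collections.abc import Container, Iterable, Sized

# Hypothetical helper: true only when obj satisfies all three ABCs at once,
# i.e. the intersection that a plain Iterable annotation does not capture.
def is_sized_iterable_container(obj):
    return all(isinstance(obj, abc) for abc in (Sized, Iterable, Container))

# Lists, sets, dicts and ranges all qualify...
for value in ([1, 2], {1, 2}, {"a": 1}, range(3)):
    assert is_sized_iterable_container(value)

# ...but a generator or a list iterator is only Iterable (no __len__).
assert not is_sized_iterable_container(x for x in [1, 2])
assert not is_sized_iterable_container(iter([1, 2]))
```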

On Thu, Jul 21, 2016 at 4:43 PM Guido van Rossum <guido@python.org> wrote:
Are these folks dissatisfied with a union type: ``Union[Sequence, Set]``? (Another useful concept is "reiterable", i.e. an Iterable that can be
"Reiterable" seems to cause vocabulary confusion regularly on this list. It also has a common case for runtime checking: if not isinstance(iterable, Reiterable): iterable = list(iterable) It's hard to think of a good name to describe ``Union[Iterable, Sized, Container]`` and it'd still have the confusion of whether it's re-iterable. Does the concept of "reiterable" imply that an iteration does not mutate the object? If so, then if something is both Reiterable and Sized, it should also be a Container.

On Thu, Jul 21, 2016 at 2:14 PM, Michael Selik <michael.selik@gmail.com> wrote:
I don't know if they are, but I am. :-) Unions look pretty ugly, and there are obvious implementations that satisfy all three criteria but are neither sequence nor set (e.g. mapping).
What kind of confusion?
It's hard to imagine it having a size and not being reiterable. FWIW IIUC the most common idiom to test for this is: if iter(iterable) is iterable: iterable = list(iterable)
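The idiom above can be wrapped in a small helper. This is a sketch only; the name `ensure_reiterable` is hypothetical. It trusts anything that does not return itself from `iter()`, so an infinite reiterable (like Nick's `ReiterableCounter` later in the thread) passes through unchanged.

```python
from collections.abc import Iterable

# Hypothetical helper wrapping the idiom quoted above: iterators return
# themselves from iter(), so they are materialized into a list; anything else
# (list, set, dict, range, ...) is assumed to be safely re-iterable.
def ensure_reiterable(iterable: Iterable) -> Iterable:
    if iter(iterable) is iterable:
        iterable = list(iterable)
    return iterable

data = [3, 1, 2]
assert ensure_reiterable(data) is data          # left alone
once = ensure_reiterable(x * 2 for x in data)   # generator gets materialized
assert sum(once) == 12 and sum(once) == 12      # and can now be iterated twice
```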
Does the concept of "reiterable" imply that an iteration does not mutate the object?
I think that's generally true, though it's not the key property. The key property is that after iterating until the end, you can start over from the beginning just by calling iter() again. Example types that don't have this property include streams and anything that is itself the result of calling iter() on something more stable.
If so, then if something is both Reiterable and Sized, it should also be a Container.
Agreed. (Container is not implied by Iterable mostly because for a once-iterable, "x in c" consumes the iterable.) -- --Guido van Rossum (python.org/~guido)
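A short illustration of both points: membership testing on a one-shot iterator consumes it, while a list can be searched and iterated again from the beginning. Plain built-ins only; nothing here is proposed API.

```python
# "x in c" on a one-shot iterator consumes elements up to the match,
# while a list can be searched (and iterated) again from the start.
items = [1, 2, 3, 4]

it = iter(items)
assert 2 in it              # consumes 1 and 2
assert list(it) == [3, 4]   # only the rest is left
assert 2 not in it          # now exhausted, so the search finds nothing

assert 2 in items and 2 in items    # reiterable: repeatable membership tests
assert list(items) == list(items)   # and repeatable full iterations
```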

On Thu, Jul 21, 2016 at 5:29 PM Guido van Rossum <guido@python.org> wrote:
Someone accidentally says iterable when they mean reiterable, and then there's a handful of emails pointing that out and arguing about what the right word should be to describe something (like range) that can be iterated over more than once, but isn't necessarily a sequence, set, or mapping.

But there's already Container which means "supports __contains__". Collection might cause confusion with the module name collections. Otherwise either would be a good candidate... On Thu, Jul 21, 2016 at 4:51 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
-- --Guido van Rossum (python.org/~guido)

On Thu, Jul 21, 2016 at 5:05 PM, Thomas Nyberg <tomuxiong@gmail.com> wrote:
Neither "static" nor "const" convey the right meaning.
Personally I think Reiterable is about as clear as it ever will be...
Yeah, I think that's my conclusion as well. -- --Guido van Rossum (python.org/~guido)

On 22 July 2016 at 10:29, Guido van Rossum <guido@python.org> wrote:
As far as the original "Sized Iterable Container" question goes, would FiniteContainer work? - it's countable/iterable - it has a quantifiable size - you can do membership tests on it Countable & quantifiable = finite, membership testing = container, hence FiniteContainer
With the semantics being "iter(x) is not x"? That seems reasonable to me, as I spent some time thinking about whether or not this is a distinct idea from the "Sized Iterable Container" question, and it's pretty easy to demonstrate that it is:

```
>>> class ReiterableCounter:
...     def __iter__(self):
...         x = 0
...         while True:
...             yield x
...             x += 1
...
>>> ctr = ReiterableCounter()
>>> for x in ctr:
...     print(x)
...     if x > 1: break
...
0
1
2
>>> for x in ctr:
...     print(x)
...     if x > 1: break
...
0
1
2
```

(The same example works by returning any non-terminating iterator from __iter__) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Yeah, I know that you can define something that's Iterable and Container but not Sized. But has anyone ever done that outside a test or demonstration? And PEP 484 can't talk about "this type except if it's that type", so the operational definition "Iterable except Iterator" can't be expressed as a type. If PEP 484 had the concept of Intersection type, I'd probably define X = Intersection[Sized, Iterable, Container]. But even then I'd like a name that rolls a little easier off the tongue than FiniteContainer. If it really must be two words, how about SizedIterable? That suggests it's a subclass of Sized and Iterable, which it is. Container doesn't need separate mention, it's pretty obvious that any Iterable can implement __contains__, and more people know that Iterable is an ABC than Container. The new ABC could define a default concrete __contains__ implementation:

```
from collections.abc import Container, Iterable, Sized

class SizedIterable(Sized, Iterable, Container):

    def __contains__(self, value):  # Same as Sequence
        for v in self:
            if v == value:
                return True
        return False
```

TBH if I had known about ABCs and PEP 484 when we defined the collections ABCs, I probably would have defined it like that on Iterable as well, and completely done away with Container as a separate one-trick pony ABC. Set and Mapping would just override the default implementation. (Hm... not sure how the default Set.__contains__ would work though. Can you make a concrete method abstract in a subclass?) On Thu, Jul 21, 2016 at 7:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
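A sketch of how the proposed ABC would be used, assuming it is written as above on top of `collections.abc`. `Bag` and `SetLike` are hypothetical classes. `Bag` only writes `__len__` and `__iter__` and gets `in` for free; `SetLike` shows that re-declaring `__contains__` with `@abstractmethod` in a subclass does make the inherited concrete method abstract again, which is one possible answer to the closing question.

```python
from abc import abstractmethod
from collections.abc import Container, Iterable, Sized

class SizedIterable(Sized, Iterable, Container):
    """Sketch of the proposed ABC with a concrete default __contains__."""
    __slots__ = ()

    def __contains__(self, value):  # same linear scan that Sequence uses
        for v in self:
            if v is value or v == value:
                return True
        return False

class Bag(SizedIterable):
    """Hypothetical concrete class: only __len__ and __iter__ are written."""
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

class SetLike(SizedIterable):
    """Re-declaring __contains__ as abstract makes it abstract again."""
    @abstractmethod
    def __contains__(self, value):
        ...

bag = Bag([1, 2, 3])
assert 2 in bag and 5 not in bag and len(bag) == 3
```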

On 22 July 2016 at 14:04, Guido van Rossum <guido@python.org> wrote:
SizedIterable works for me, as it's the __len__ that drives most of the other interesting properties beyond an arbitrary iterable (like using iteration for the default containment check without worrying about consuming a one-shot iterator or creating an infinite loop). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Jul 21, 2016 at 9:04 PM, Guido van Rossum <guido@python.org> wrote:
In my experiments with type annotations, we recently encountered this exact same issue (variables for which either sets or lists should be valid). My initial solution was almost exactly what you propose here: we wrote a SizedIterable class, simply inheriting from Sized and Iterable. Ultimately, we didn't find this very satisfying, because we realized that strings are SizedIterables (they're sequences, too), and we really didn't want to allow accidentally passing strings that would be interpreted as iterables of characters. Unfortunately, I don't think there's any good way in Python to specify "any sized iterable that isn't a string".

This is a bit of a tangent, hopefully convergent, but hey, this is python-ideas: Would there be interest in some kind of method/API for restarting iterators? If there was an it.restart() (or reset(it)), it would probably be a good idea to name the concept in the same way as the method ("Restartable" or "Resettable"). And having that as an actual protocol would simplify the discussion on both what the type should contain and how it should be named, while having actual applications to simplify code (and reduce memory usage in cases when creating a list is not needed, just making multiple passes). I'm not proposing one of those names in particular; discussing whether this concept makes sense and is useful should come before the naming. On Fri, Jul 22, 2016 at 3:44 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
-- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset

On Sat, Jul 23, 2016 at 12:20 AM, Daniel Moisset <dmoisset@machinalis.com> wrote:
IMO, no. Some iterators can be restarted by going back to the original iterable and requesting another iterator, but with no guarantee that it will result in the exact same sequence (eg dict/set iterators). Does that count as restarting? And what if you start by consuming a header or two, then pass the resulting iterator to another function. Based on type, it would be restartable (say it's a list_iterator), but the called function would be highly surprised to suddenly get back more results. What you might be looking at is a protocol for "bookmarking" or "forking" an iterator. That might be more useful. For example:

```
# Like itertools.cycle()
def cycle(it):
    it = iter(it)
    while True:
        yield from fork(it)  # and then, in effect, "restart" it
```

With list_iterator, forking is straightforward - make another iterator off the same list, at the same point this one is. Compared to the normal implementation of itertools.cycle, this wouldn't need any auxiliary storage, as it'd just reference the base list every time. It's kinda like restarting the iterator, but if you give this a partly-consumed list_iterator, it would cycle through the remaining elements only. (How many iterators would be forkable, I don't know, but I expect it'd be the exact same ones that would be restartable by any other definition.) ChrisA
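A minimal sketch of the forking idea, assuming a hypothetical `ForkableListIterator` that keeps a reference to its list plus an index, and a hypothetical `fork()` hook. Neither exists in the standard library. Unlike `itertools.cycle`, this `cycle` spins forever without yielding if the input is already exhausted.

```python
from itertools import islice

class ForkableListIterator:
    """Hypothetical list iterator exposing the proposed fork() operation:
    it remembers the underlying list and the current index."""
    def __init__(self, seq, index=0):
        self._seq = seq
        self._index = index

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._seq):
            raise StopIteration
        value = self._seq[self._index]
        self._index += 1
        return value

    def fork(self):
        # A new, independent iterator at the same position; nothing buffered.
        return ForkableListIterator(self._seq, self._index)

def fork(it):
    # Hypothetical protocol hook; only ForkableListIterator supports it here.
    return it.fork()

def cycle(it):
    # Chris's cycle(): no auxiliary storage, just re-forking the same position.
    it = iter(it)
    while True:
        yield from fork(it)

it = ForkableListIterator(["header", "a", "b"])
next(it)  # consume a header first
assert list(islice(cycle(it), 5)) == ["a", "b", "a", "b", "a"]
```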

On Fri, Jul 22, 2016 at 8:48 AM, Chris Angelico <rosuav@gmail.com> wrote:
Sequences don't give you this *guarantee* either. A trivial example:

```
from random import random

class MyList(list):
    def __getitem__(self, ndx):
        # In "real world" this might be a meaningful condition and update
        if random() < .333:
            if isinstance(ndx, slice):
                for n in range(*ndx.indices(len(self))):
                    self[n] += 1
            else:
                self[ndx] += 1
        return super().__getitem__(ndx)
```
What you might be looking at is a protocol for "bookmarking" or "forking" an iterator. That might be more useful. For example:
How is this different from itertools.tee()? -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Sat, Jul 23, 2016 at 3:58 AM, David Mertz <mertz@gnosis.cx> wrote:
Functionally? Not at all. In fact, itertools.tee could make use of the fork protocol. The idea would be efficiency; iterators that are "restartable" by your definition would be "forkable", which would effectively permit tee to just ask the iterator to tee itself. The simple and naive implementation of tee() is "save all the elements until they get asked for", which requires extra storage for those elements; if your iterator can fork, that implies that you can get those elements right back again from the original. Imagine a list_iterator consists of two things, a base list and a current index. Tee'ing that iterator means retrieving elements and hanging onto them until needed; forking means creating another iterator with the same index. Which means that a forked iterator would reflect changes to the original list, for better or for worse. Not saying it's necessarily a good thing, but it's a more general version of the concept of "restarting" an iterator, being more compatible with trimmed iterators and such. Ultimately, your "restart" is no different from itertools.tee either. ChrisA
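The storage difference can be observed with the real `itertools.tee`. This sketch only uses the standard library: once one branch is exhausted, tee keeps every element for the other branch, so later changes to the source list are not seen. A fork that re-reads the list by index (as described above) would reflect the change instead.

```python
from itertools import tee

data = [1, 2, 3]
a, b = tee(iter(data))

# Exhausting one branch makes tee hold every element for the other branch:
# this is the auxiliary storage that a fork-capable iterator could avoid.
assert list(a) == [1, 2, 3]

data.clear()
# b is served from tee's internal buffer, so it does not see the change;
# a fork that re-reads the list by index would yield nothing here instead.
assert list(b) == [1, 2, 3]
```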

Can someone create a bug and a patch to add SizedIterable to collections.abc and typing in 3.6?

On Fri, 22 Jul 2016 at 12:09 Guido van Rossum <guido@python.org> wrote:
Can someone create a bug
Sure: http://bugs.python.org/issue27598
and a patch to add SizedIterable to collections.abc and typing in 3.6?
That's something I will leave up to someone else as I have file system path stuff to do. :) -Brett

On Sun, Jul 24, 2016 at 10:15 AM, Guido van Rossum <gvanrossum@gmail.com> wrote:
Does it work? For what categories of iterators? Surely not for streams...
I tried it with list and set iterators and it worked fine. With a generator, it threw an exception (though not something tidy like UncopyableObjectException - it complained about calling object.__new__(generator) instead of generator.__new__). Presumably it'll fail, one way or another, with non-duplicable objects (eg an open file, which is its own iterator, simply raises TypeError "cannot serialize"); those are non-restartable, non-bookmarkable, non-forkable iterators (take your pick of terminology). ChrisA
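A quick check of the behaviour described, using only the `copy` module. List iterators copy cleanly and the copy resumes at the same position; generators refuse to be copied. The exact error message differs between Python versions, so the sketch only relies on the exception type being `TypeError`.

```python
import copy

# List iterators support copy.copy(): the copy shares the list and starts at
# the same position, so it behaves like the "fork" idea discussed above.
it = iter([1, 2, 3])
next(it)
dup = copy.copy(it)
assert list(dup) == [2, 3]
assert list(it) == [2, 3]     # the original was not advanced by the copy

# Generators cannot be copied; CPython raises TypeError (message varies).
gen = (x for x in [1, 2, 3])
try:
    copy.copy(gen)
except TypeError:
    copied = False
else:
    copied = True
assert not copied
```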

On Sun, Jul 24, 2016 at 09:48:44AM +1000, Chris Angelico wrote:
On Sun, Jul 24, 2016 at 2:57 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
There is such a protocol. Use copy.copy().
Time machine strikes again! Restarting of iterators exists.
You can copy *some* iterators, but it doesn't restart them.

```
py> import copy
py> it = iter([1, 2, 3])
py> next(it)
1
py> next(it)
2
py> a = copy.copy(it)
py> next(a)
3
py> next(it)
3
```

I wouldn't expect that copying an iterator would, in general, restart it. In general, there's no way to restart an iterator. Once seen, values are gone and cannot be recreated. -- Steve

On Thu, Jul 21, 2016 at 11:45 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Given how often the str problem comes up in this context, I'm thinking of a common special case in type annotation checkers that explicitly excludes str from consideration for iteration. Its behavior is rarely what the programmer intended and more likely to be a bug. -gps

On Mon, Jul 25, 2016 at 10:19 AM, Gregory P. Smith <greg@krypto.org> wrote:
Should we have IterableButNotString, CollectionExceptString, NonStringSequence, or all three? I suppose the runtime check would be e.g. isinstance(x, Iterable) and not isinstance(x, (str, bytes, bytearray, memoryview))? And the type checker should special-case the snot out of it? -- --Guido van Rossum (python.org/~guido)
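The runtime half of that question can be written directly with `collections.abc`. The function name below is hypothetical, and this is only a runtime check: as noted in the thread, PEP 484 has no way to express "Iterable except str" for a static checker.

```python
from collections.abc import Iterable

# The runtime check sketched in the message above; the name is hypothetical.
_TEXTLIKE = (str, bytes, bytearray, memoryview)

def is_iterable_but_not_string(x):
    return isinstance(x, Iterable) and not isinstance(x, _TEXTLIKE)

assert is_iterable_but_not_string(["a", "b"])
assert is_iterable_but_not_string({"a", "b"})
assert not is_iterable_but_not_string("ab")
assert not is_iterable_but_not_string(b"ab")
assert not is_iterable_but_not_string(42)
```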

Guido, I'm not entirely sure if you're joking or not, but if we start down this road, we will never finish. I believe that although str/bytes truly are special regarding iterables, the combinations are endless. Soon enough we'll have NotBytesSequence, NumericSequenceWithoutFloat and BuiltinNotCustomMadeIterable. We cannot guarantee a "meaning". We can, however, guarantee an interface or API, in this case __iter__, and promote duck typing in order to support even the craziest custom iterables. I think breaking the design or the state of mind in the case of "IterableButNotString" would be a mistake, unless we're planning to remove str's __iter__ altogether (which I don't support either, but who knows). Nevertheless, +1 for the unification of Iterable, Sized and Container or "Reiterables", as it declares a clearer interface. On Mon, Jul 25, 2016 at 9:03 PM Guido van Rossum <guido@python.org> wrote:

I'm sorry, I was using Socratic questions to get people to think more about the name and semantics for this one type. I certainly don't think we should have more of these! At least not until we've added an "except" type to PEP 484. On Mon, Jul 25, 2016 at 2:29 PM, Bar Harel <bzvi7919@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

I can't recall which, but I remember reading about a language with a relative complement operator on types. The idea is, where T is a set of types, then: T \ {A, B, ...} would be all types in T *except* A, B, .... I think maybe there could be a type variable Except that could be used like: Except[T, A, B, ...] where T is a type variable. (Other name ideas: Complement, Difference). -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ On Jul 25, 2016 6:58 PM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:

On 26 July 2016 at 03:58, Guido van Rossum <guido@python.org> wrote:
For cases where str instances are handled differently from other iterables, but still within the same function, a convention of writing "Union[str, Iterable]" as the input parameter type may suffice - while technically the union is redundant, the implication would be that str instances are treated as str objects, rather than as an iterable of length-1 str objects. The other case where this would come up is in signature overloading, where one overload may be typed as "str" and the other as "Iterable". Presumably in those cases the more specific type already wins. Anything more than that would need to be defined in the context of a particular algorithm, such as the oft-requested generic "flatten" operation, where you typically want to consider types like str, bytes, bytearray and memoryview as atomic objects, rather than as containers, but then things get blurry around iterable non-builtin types like array.array, numpy.ndarray, Pandas data frames, image formats with pixel addressing, etc, as well as builtins with multiple iteration behaviours like dict. One possible way to tackle that would be to declare a new collections.abc.AtomicIterable ABC, as well as a collections.flatten operation defined as something like:

```
def flatten(iterable):
    for item in iterable:
        if isinstance(item, AtomicIterable):
            yield item
            continue
        if isinstance(item, Mapping):
            yield from item.items()
            continue
        try:
            subiter = iter(item)
        except TypeError:
            yield item
        else:
            yield from flatten(subiter)
```

Over time, different types would get explicitly registered with AtomicIterable based on what their developers considered the most appropriate behaviour to be when asked to flatten them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
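A runnable version of that proposal, assuming the hypothetical `AtomicIterable` is a plain marker ABC with `str`, `bytes`, `bytearray` and `memoryview` registered by hand (no such ABC or `collections.flatten` exists in the standard library). Mappings contribute their `(key, value)` pairs as-is, following the sketch above.

```python
from abc import ABC
from collections.abc import Mapping

class AtomicIterable(ABC):
    """Hypothetical marker ABC: iterables that flatten() must not descend into."""

for _cls in (str, bytes, bytearray, memoryview):
    AtomicIterable.register(_cls)

def flatten(iterable):
    for item in iterable:
        if isinstance(item, AtomicIterable):
            yield item                  # registered as atomic: keep whole
            continue
        if isinstance(item, Mapping):
            yield from item.items()     # mappings contribute (key, value) pairs
            continue
        try:
            subiter = iter(item)
        except TypeError:
            yield item                  # not iterable at all: a leaf
        else:
            yield from flatten(subiter)

assert list(flatten([1, [2, "ab", [3]], b"xy"])) == [1, 2, "ab", 3, b"xy"]
assert list(flatten([{"k": 1}])) == [("k", 1)]
```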

I like your idea with AtomicIterable, but I still don't think it fully solves the problem. How can one categorize array.array? Consider an array of bytes. Sometimes you'll want to iterate over them as ints sometimes as 1 byte objects. In one case they can be used in order to conserve space as a sequence of ints in which case you'll want them one by one, and in the other case they'll be atomic just like bytes. Giving the register option on the other hand to the end user would pose a different problem - ABC.register is global. If you're developing a module and register array.array as an AtomicIterable, you might affect other parts of the program in unexpected ways. I think it's a tough one to solve :-/ On Tue, Jul 26, 2016, 7:52 AM Nick Coghlan <ncoghlan@gmail.com> wrote:

The problem here is that often folks want to NOT allow single strings, but rather want only an iterable of strings. But, of course, a single string is an iterable of strings... I'm pretty sure this is by far the most common dynamic typing gotcha in Python (at least after true division). Which means it may well be worth special casing all over the place to make it easy to do right. Personally, I'd like a solution that helps folks write correct code even without a type checker. AFAIK, the "solution" at this point is for every function that takes an iterable of strings to have code like: strings = [strings] if type(strings) is str else strings But this is ugly and fragile in so many ways... Perhaps there should be (maybe only in my private lib) an "as_sequence_of_strings()" function that does this, but more completely and robustly. Similar to numpy's asarray(), which passes back the object if it's already an array of the type specified, and tries to convert it otherwise.
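One possible shape for the helper mentioned above, written as a sketch: the name comes from the message, but the behaviour is an assumption. It wraps a lone `str`, materializes any other iterable, and rejects non-str elements. It uses `isinstance` rather than `type(...) is str`, so `str` subclasses are also treated as single strings, and `bytes` is rejected because its elements are ints.

```python
from collections.abc import Iterable

def as_sequence_of_strings(value):
    """Hypothetical helper in the spirit of numpy.asarray(): wrap a lone str,
    materialize any other iterable, and check every element is a str."""
    if isinstance(value, str):
        return [value]
    if not isinstance(value, Iterable):
        raise TypeError(f"expected str or iterable of str, got {type(value).__name__}")
    result = list(value)
    for item in result:
        if not isinstance(item, str):
            raise TypeError(f"expected str elements, got {type(item).__name__}")
    return result

assert as_sequence_of_strings("spam") == ["spam"]
assert as_sequence_of_strings(("spam", "eggs")) == ["spam", "eggs"]
assert as_sequence_of_strings(s for s in "ab") == ["a", "b"]
```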
typically want to consider types like str, bytes, bytearray and memoryview as atomic objects, rather than as containers,
There are all kinds of things that one might want to treat as atomic, rather than a sequence -- but what is special about str is that it is infinitely recursable -- no matter how many times you index it, you always get back a sequence -- never an atomic object. Most (every?) other sequence returns an atomic object (eventually) when you index it. So maybe the solution really is to have a character type -- then a string is a sequence of characters, rather than a sequence of strings, and all these problems go away. If a character object has all the methods of a string except indexing (and iterating) it might even be mostly backward compatible.... Otherwise, special casing in multiple cases may be the only option. Or maybe all we need is a way to spell "any iterable of strings except a string", though that would only help type checking. But either way, it would be great to support this better. -CHB
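The recursion problem can be seen with a naive flattener that has no special case for `str` (a sketch; `naive_flatten` is a hypothetical name). Nested lists bottom out in non-iterable leaves, but a one-character string iterates to itself, so the recursion never ends and Python raises `RecursionError`.

```python
def naive_flatten(obj):
    """Flatten nested iterables with no special case for str."""
    try:
        it = iter(obj)
    except TypeError:
        yield obj          # not iterable: a genuine leaf
        return
    for item in it:
        yield from naive_flatten(item)

assert list(naive_flatten([1, [2, [3]]])) == [1, 2, 3]

# Indexing a str never bottoms out: "a"[0] is again the str "a".
assert "a"[0] == "a" and isinstance("a"[0][0][0], str)

try:
    list(naive_flatten(["ab"]))
except RecursionError:
    hit_recursion_limit = True
else:
    hit_recursion_limit = False
assert hit_recursion_limit
```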

On Fri, Jul 29, 2016, 2:18 PM Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
The problem here is that often folks want to NOT allow single strings, but rather want only an iterable of strings.
The module author might, but sometimes the user might want to use a str as an iterable of 1-len strs. It bugs me that Pandas forces me to wrap a str of indices in a list constructor before passing it into the DataFrame constructor.

I agree, the fact that a string consists of strings has bothered me as well. It makes some things work well but some recursive algorithms requires special checks. There should be a way to iterate over a string as characters. AtomicIterable seems a good idea, but maybe it could be just the conceptual starting point of building up a class hierarchy of "iterability", and such a class could be passed to the 'flatten' method proposed above in the thread by Nick Coghlan. Some conceptual inspiration may be taken from numpy where one can have 0-dimensional arrays which are different from scalars - and different from n-D array with 1 element (n > 0)

```
In [7]: x = np.ndarray(())

In [8]: x
Out[8]: array(21.0)

In [9]: type(x)
Out[9]: numpy.ndarray

In [10]: x[()]
Out[10]: 21.0

In [11]: np.asscalar(x)
Out[11]: 21.0

In [13]: x = np.ndarray((1,))

In [14]: x
Out[14]: array([ 17.5])
```

I like this proposal from a book-keeping perspective... the interface provided by sequences/mappings/sets is definitely different than the interface provided by generic iterables (mostly due to the fact that the former group can be iterated repeatedly). With that said, I'm not really sure how the `Reiterable` (or whatever suitable name the smart people around here come up with) class would be implemented. `class Reiterable(Iterable): pass` This seems a bit weird. All of the other ABCs make some guarantees. If I fill out a set of methods then my object will be a <fill in ABC>. There doesn't seem to be a set of methods that exist to distinguish between an object that is Iterable and an object that is Reiterable. I suppose that the counter argument is that we always assume that the methods on the object are implemented "reasonably". Perhaps the main goal for this class is just book-keeping/annotating/loose "if you say so" typechecking and we're trusting that if someone subclasses `Reiterable`, they're going to implement it in such a way that it indeed can be iterated multiple times and that they didn't just mean `Iterable`. If that's the intent, I think that is also fine with me -- it just seemed like someone ought to mention it once... On Thu, Jul 21, 2016 at 5:29 PM, Guido van Rossum <guido@python.org> wrote:
-- Matt Gilson // SOFTWARE ENGINEER E: matt@getpattern.com // P: 603.892.7736

On Thu, Jul 21, 2016 at 01:43:01PM -0700, Guido van Rossum wrote:
Are you talking about giving a nice name to Union[Set, Sequence, Mapping]? I'm not sure that there is one, unless we make up a word. How do you feel about abbreviations? SSM? MSS?
Now I'm having flashbacks to this thread: https://mail.python.org/pipermail/python-ideas/2013-September/023239.html As I recall from that thread: isinstance(obj, Reiterable) is *almost* equivalent to: isinstance(obj, Iterable) and not isinstance(obj, Iterator) Back at the time, I wasn't convinced that the stdlib needed a special name for this concept, but I wasn't thinking of the typing module. -- Steve

On Thursday, July 21, 2016 at 1:44:11 PM UTC-7, Guido van Rossum wrote:
+1 on this, from the pytype side of things. pytype, during type-inference, always struggles to give names to things, and having additional structural types around helps. Also, I like "Collection" better than Reiterable, since the latter sounds too intricate, possibly scaring users away.

On Thu, Jul 21, 2016 at 4:43 PM Guido van Rossum <guido@python.org> wrote:
Are these folks dissatisfied with a union type: ``Union[Sequence, Set]``? (Another useful concept is "reiterable", i.e. an Iterable that can be
"Reiterable" seems to cause vocabulary confusion regularly on this list. It also has a common case for runtime checking: if not isinstance(iterable, Reiterable): iterable = list(iterable) It's hard to think of a good name to describe ``Union[Iterable, Sized, Container]`` and it'd still have the confusion of whether it's re-iterable. Does the concept of "reiterable" imply that an iteration does not mutate the object? If so, then if something is both Reiterable and Sized, it should also be a Container.

On Thu, Jul 21, 2016 at 2:14 PM, Michael Selik <michael.selik@gmail.com> wrote:
I don't know if they are, but I am. :-) Unions look pretty ugly, and there are obvious implementations that satisfy all three criteria but are neither sequence not set (e.g. mapping).
What kind of confusion?
It's hard to imagine it having a size and not being reiterable. FWIW IIUC the most common idiom to test for this is: if iter(iterable) is iterable: iterable = list(iterable)
Does the concept of "reiterable" imply that an iteration does not mutate the object?
I think that's generally true, though it's not the key property. The key property is that after iterating until the end, you can start over from the beginning just by calling iter() again. Example types that don't have this property include streams and anything that is itself the result of calling iter() on something more stable.
If so, then if something is both Reiterable and Sized, it should also be a Container.
Agreed. (Container is not implied by Iterable mostly because for a once-iterable, "x in c" consumes the iterable.) -- --Guido van Rossum (python.org/~guido)

On Thu, Jul 21, 2016 at 5:29 PM Guido van Rossum <guido@python.org> wrote:
Someone accidentally says iterable when they mean reiterable, and then there's a handful of emails pointing that out and arguing about what the right word should be to describe something (like range) that can be iterated over more than once, but isn't necessarily a sequence, set, or mapping.

But there's already Container which means "supports __contains__". Collection might cause confusing with the module name collections. Otherwise either would be a good candidate... On Thu, Jul 21, 2016 at 4:51 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
-- --Guido van Rossum (python.org/~guido)

On Thu, Jul 21, 2016 at 5:05 PM, Thomas Nyberg <tomuxiong@gmail.com> wrote:
Neither "static" nor "const" convey the right meaning.
Personally I think Reiterable is about as clear as it ever will be...
Yeah, I think that's my conclusion as well. -- --Guido van Rossum (python.org/~guido)

On 22 July 2016 at 10:29, Guido van Rossum <guido@python.org> wrote:
As far as the original "Sized Iterable Container" question goes, would FiniteContainer work? - it's countable/iterable - it has a quantifiable size - you can do membership tests on it Countable & quantifiable = finite, membership testing = container, hence FiniteContainer
With the semantics being "iter(x) is not x"? That seems reasonable to me, as I spent some time think about whether or not this is a distinct idea from the "Sized Iterable Container" question, and it's pretty easy to demonstrate that it is: >>> class ReiterableCounter: ... def __iter__(self): ... x = 0 ... while True: ... yield x ... x += 1 ... >>> ctr = ReiterableCounter() >>> for x in ctr: ... print(x) ... if x > 1: break ... 0 1 2 >>> for x in ctr: ... print(x) ... if x > 1: break ... 0 1 2 (The same example works by returning any non-terminating iterator from __iter__) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Yeah, I know that you can define something that's Iterable and Container but not Sized. But has anyone ever done that outside a test or demonstration? And PEP 484 can't talk about "this type except if it's that type", so the operational definition "Iterable except Iterator" can't be expressed as a type. If PEP 484 had the concept of Intersection type, I'd probably define X = Intersection[Sized, Iterable, Container]. But even then I'd like a name that rolls a little easier off the tongue than FiniteContainer. If it really must be two words, how about SizedIterable? That suggests it's a subclass of Sized and Iterable, which it is. Container doesn't need separate mention, it's pretty obvious that any Iterable can implement __contains__, and more people know that Iterable is an ABC than Container. The new ABC could define a default concrete __contains__ implementation: class SizedIterable(Sized, Iterable, Container): def __contains__(self, value): # Same as Sequence for v in self: if v == value: return True return False TBH if I had known about ABCs and PEP 484 when we defined the collections ABCs, I probably would have defined it like that on Iterable as well, and completely done away with Container as a separate one-trick pony ABC. Set and Mapping would just override the default implementation. (Hm... not sure how the default Set.__contains__ would work though. Can you make a concrete method abstract in a subclass?) On Thu, Jul 21, 2016 at 7:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On 22 July 2016 at 14:04, Guido van Rossum <guido@python.org> wrote:
SizedIterable works for me, as it's the __len__ that drives most of the other interesting properties beyond an arbitrary iterable (like using iteration for the default containment check without worrying about consuming a one-shot iterator or creating an infinite loop). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Jul 21, 2016 at 9:04 PM, Guido van Rossum <guido@python.org> wrote:
In my experiments with type annotations, we recently encountered this exact same issue (variables for which either sets or lists should be valid). My initial solution was almost exactly what you propose here: we wrote a SizedIterable class, simply inheriting from Sized and Iterable. Ultimately, we didn't find this very satisfying, because we realized that strings are SizedIterables (they're sequences, too), and we really didn't want to allow accidentally passing strings that would be interpreted as iterables of characters. Unfortunately, I don't think there's any good way in Python to specify "any sized iterable that isn't a string".

This is a bit of a tangent, hopefully convergent, but hey, this is python-ideas: Would there be interest in some kind of method/API for restarting iterators? If there was a it.restart() (or reset(it)), it would probably be a good idea to name the concept in the same way as the method ( "Restartable" or "Resettable"). And having that as an actual protocol would simplify the discussion on both what the type should contain and how it should be named, while having actal applications to simplify code (and reduce memory usage in cases when creating a list is not needed, just making multiple passes). I'm not proposing one of those names in particular, discussing whether this concept makes sense and is useful should be before the naming On Fri, Jul 22, 2016 at 3:44 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
-- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset

On Sat, Jul 23, 2016 at 12:20 AM, Daniel Moisset <dmoisset@machinalis.com> wrote:
IMO, no. Some iterators can be restarted by going back to the original iterable and requesting another iterator, but with no guarantee that it will result in the exact same sequence (eg dict/set iterators). Does that count as restarting? And what if you start by consuming a header or two, then pass the resulting iterator to another function? Based on type, it would be restartable (say it's a list_iterator), but the called function would be highly surprised to suddenly get back more results. What you might be looking at is a protocol for "bookmarking" or "forking" an iterator. That might be more useful. For example:
```python
# Like itertools.cycle()
def cycle(it):
    it = iter(it)
    while True:
        yield from fork(it)  # and then, in effect, "restart" it
```
With list_iterator, forking is straight-forward: make another iterator off the same list, at the same point this one is. Compared to the normal implementation of itertools.cycle, this wouldn't need any auxiliary storage, as it'd just reference the base list every time. It's kinda like restarting the iterator, but if you give this a partly-consumed list_iterator, it would cycle through the remaining elements only. (How many iterators would be forkable, I don't know, but I expect it'd be the exact same ones that would be restartable by any other definition.) ChrisA
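A minimal sketch of the forking idea for list-backed iterators. `ForkableListIterator` and its `fork()` method are hypothetical; no such protocol exists in the stdlib. The iterator holds only a reference to the base list plus an index, so `fork()` copies those two fields and buffers nothing. This `cycle()` re-reads the base list on every pass and, like itertools.cycle, stops if there is nothing left to repeat.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class ForkableListIterator(Generic[T]):
    """Hypothetical list iterator that supports fork()."""

    def __init__(self, base: List[T], index: int = 0) -> None:
        self._base = base
        self._index = index

    def __iter__(self) -> "ForkableListIterator[T]":
        return self

    def __next__(self) -> T:
        if self._index >= len(self._base):
            raise StopIteration
        item = self._base[self._index]
        self._index += 1
        return item

    def fork(self) -> "ForkableListIterator[T]":
        # Same list, same position; no items are stored.
        return ForkableListIterator(self._base, self._index)


def cycle(it: ForkableListIterator[T]) -> Iterator[T]:
    # Like itertools.cycle(), but without saving the items it has seen
    while True:
        produced = False
        for item in it.fork():
            produced = True
            yield item
        if not produced:
            return  # nothing remaining: avoid looping forever
```

Given a partly-consumed iterator, `cycle()` repeats only the remaining elements, as described above. Because the fork shares the base list, later changes to that list show up in the output.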

On Fri, Jul 22, 2016 at 8:48 AM, Chris Angelico <rosuav@gmail.com> wrote:
Sequences don't give you this *guarantee* either. A trivial example:
```python
from random import random


class MyList(list):
    def __getitem__(self, ndx):
        # In the "real world" this might be a meaningful condition and update
        if random() < .333:
            if isinstance(ndx, slice):
                for n in range(*ndx.indices(len(self))):
                    self[n] += 1
            else:
                self[ndx] += 1
        return super().__getitem__(ndx)
```
What you might be looking at is a protocol for "bookmarking" or "forking" an iterator. That might be more useful. For example:
How is this different from itertools.tee()? -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Sat, Jul 23, 2016 at 3:58 AM, David Mertz <mertz@gnosis.cx> wrote:
Functionally? Not at all. In fact, itertools.tee could make use of the fork protocol. The idea would be efficiency; iterators that are "restartable" by your definition would be "forkable", which would effectively permit tee to just ask the iterator to tee itself. The simple and naive implementation of tee() is "save all the elements until they get asked for", which requires extra storage for those elements; if your iterator can fork, that implies that you can get those elements right back again from the original. Imagine a list_iterator consists of two things, a base list and a current index. Tee'ing that iterator means retrieving elements and hanging onto them until needed; forking means creating another iterator with the same index. Which means that a forked iterator would reflect changes to the original list, for better or for worse. Not saying it's necessarily a good thing, but it's a more general version of the concept of "restarting" an iterator, being more compatible with trimmed iterators and such. Ultimately, your "restart" is no different from itertools.tee either. ChrisA
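For comparison, here is a sketch of the "simple and naive" buffering tee() described above. `naive_tee` is an illustrative name. The real itertools.tee is written in C and is more economical, but it also stores items that one branch has consumed and another has not. Each branch below gets its own deque, and every item pulled from the source is appended to all of them. A lagging branch therefore keeps those items alive, which is the storage a forkable iterator could avoid by re-reading its base list.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def naive_tee(iterable: Iterable[T], n: int = 2) -> Tuple[Iterator[T], ...]:
    """Buffering tee: items are kept until every branch has seen them."""
    source = iter(iterable)
    buffers: List[deque] = [deque() for _ in range(n)]

    def branch(my_buffer: deque) -> Iterator[T]:
        while True:
            if not my_buffer:
                try:
                    item = next(source)
                except StopIteration:
                    return
                # Store the new item for every branch: the extra storage
                for buf in buffers:
                    buf.append(item)
            yield my_buffer.popleft()

    return tuple(branch(buf) for buf in buffers)
```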

Can someone create a bug and a patch to add SizedIterable to collections.abc and typing in 3.6?

On Fri, 22 Jul 2016 at 12:09 Guido van Rossum <guido@python.org> wrote:
Can someone create a bug
Sure: http://bugs.python.org/issue27598
and a patch to add SizedIterable to collections.abc and typing in 3.6?
That's something I will leave up to someone else as I have file system path stuff to do. :) -Brett

On Sun, Jul 24, 2016 at 10:15 AM, Guido van Rossum <gvanrossum@gmail.com> wrote:
Does it work? For what categories of iterators? Surely not for streams...
I tried it with list and set iterators and it worked fine. With a generator, it threw an exception (though not something tidy like UncopyableObjectException - it complained about calling object.__new__(generator) instead of generator.__new__). Presumably it'll fail, one way or another, with non-duplicable objects (eg an open file, which is its own iterator, simply raises TypeError "cannot serialize"); those are non-restartable, non-bookmarkable, non-forkable iterators (take your pick of terminology). ChrisA
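A small sketch of using copy.copy() as a best-effort "fork", following the experiment above. `try_fork` is a hypothetical helper name. For a list iterator, the copy shares the underlying list and starts at the same position. For generators and other non-duplicable iterators, copy.copy() raises TypeError, which the helper turns into None. The exact error message differs between Python versions, so the helper relies only on the exception type.

```python
import copy
from typing import Iterator, Optional, TypeVar

T = TypeVar("T")


def try_fork(it: Iterator[T]) -> Optional[Iterator[T]]:
    """Return an independent iterator at the same position, or None if `it` can't be copied."""
    try:
        return copy.copy(it)
    except TypeError:
        # e.g. generators and open files refuse to be copied
        return None
```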

On Sun, Jul 24, 2016 at 09:48:44AM +1000, Chris Angelico wrote:
On Sun, Jul 24, 2016 at 2:57 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
There is such a protocol. Use copy.copy().
Time machine strikes again! Restarting of iterators exists.
You can copy *some* iterators, but it doesn't restart them.
```
py> import copy
py> it = iter([1, 2, 3])
py> next(it)
1
py> next(it)
2
py> a = copy.copy(it)
py> next(a)
3
py> next(it)
3
```
I wouldn't expect that copying an iterator would, in general, restart it. In general, there's no way to restart an iterator. Once seen, values are gone and cannot be recreated. -- Steve

On Thu, Jul 21, 2016 at 11:45 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Given how often the str problem comes up in this context, I'm thinking of a common special case in type annotation checkers that explicitly excludes str from consideration for iteration. Its behavior is rarely what the programmer intended and is more likely to be a bug. -gps

On Mon, Jul 25, 2016 at 10:19 AM, Gregory P. Smith <greg@krypto.org> wrote:
Should we have IterableButNotString, CollectionExceptString, NonStringSequence, or all three? I suppose the runtime check would be e.g. isinstance(x, Iterable) and not isinstance(x, (str, bytes, bytearray, memoryview))? And the type checker should special-case the snot out of it? -- --Guido van Rossum (python.org/~guido)
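The runtime check from the message above, written out as a function. `is_non_string_iterable` is a hypothetical name, and the excluded types are exactly the tuple in the message. Like any isinstance() test, it does nothing for a static type checker.

```python
from collections.abc import Iterable

# Types whose iteration (characters / ints) is rarely what callers intend
_STRINGLIKE = (str, bytes, bytearray, memoryview)


def is_non_string_iterable(x: object) -> bool:
    return isinstance(x, Iterable) and not isinstance(x, _STRINGLIKE)
```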

Guido, I'm not entirely sure if you're joking or not, but if we start with this, we will never finish. I believe that although str/bytes truly are special regarding iterables, the combinations are endless. Soon enough we'll have NotBytesSequence, NumericSequenceWithoutFloat and BuiltinNotCustomMadeIterable. We cannot guarantee a "meaning". We can, however, guarantee an interface or API, in this case __iter__, and promote duck-typing in order to support even the craziest custom iterables. I think breaking the design or the state of mind in the case of "IterableButNotString" would be a mistake, unless we're planning to remove str's __iter__ altogether (which I don't support either, but who knows). Nevertheless, +1 for the unification of Iterable, Sized and Container or "Reiterables", as it declares a clearer interface. On Mon, Jul 25, 2016 at 9:03 PM Guido van Rossum <guido@python.org> wrote:

I'm sorry, I was using Socratic questions to get people to think more about the name and semantics for this one type. I certainly don't think we should have more of these! At least not until we've added an "except" type to PEP 484. On Mon, Jul 25, 2016 at 2:29 PM, Bar Harel <bzvi7919@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

I can't recall which, but I remember reading about a language with a relative complement operator on types. The idea is, where T is a set of types, then: T \ {A, B, ...} would be all types in T *except* A, B, .... I think maybe there could be a type variable Except that could be used like: Except[T, A, B, ...] where T is a type variable. (Other name ideas: Complement, Difference). -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ On Jul 25, 2016 6:58 PM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:

On 26 July 2016 at 03:58, Guido van Rossum <guido@python.org> wrote:
For cases where str instances are handled differently from other iterables, but still within the same function, a convention of writing "Union[str, Iterable]" as the input parameter type may suffice - while technically the union is redundant, the implication would be that str instances are treated as str objects, rather than as an iterable of length-1 str objects. The other case where this would come up is in signature overloading, where one overload may be typed as "str" and the other as "Iterable". Presumably in those cases the more specific type already wins. Anything more than that would need to be defined in the context of a particular algorithm, such as the oft-requested generic "flatten" operation, where you typically want to consider types like str, bytes, bytearray and memoryview as atomic objects, rather than as containers, but then things get blurry around iterable non-builtin types like array.array, numpy.ndarray, Pandas data frames, image formats with pixel addressing, etc, as well as builtins with multiple iteration behaviours like dict. One possible way to tackle that would be to declare a new collections.abc.AtomicIterable ABC, as well as a collections.flatten operation defined as something like:
```python
from abc import ABC
from collections.abc import Mapping


class AtomicIterable(ABC):
    """Stand-in for the proposed collections.abc.AtomicIterable (not in the stdlib)."""


# Registration decides atomicity; these are the builtins named above
for _cls in (str, bytes, bytearray, memoryview):
    AtomicIterable.register(_cls)


def flatten(iterable):
    for item in iterable:
        if isinstance(item, AtomicIterable):
            yield item
            continue
        if isinstance(item, Mapping):
            yield from item.items()
            continue
        try:
            subiter = iter(item)
        except TypeError:
            yield item  # not iterable: a leaf value
        else:
            yield from flatten(subiter)
```
Over time, different types would get explicitly registered with AtomicIterable based on what their developers considered the most appropriate behaviour to be when asked to flatten them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I like your idea with AtomicIterable, but I still don't think it fully solves the problem. How can one categorize array.array? Consider an array of bytes. Sometimes you'll want to iterate over them as ints sometimes as 1 byte objects. In one case they can be used in order to conserve space as a sequence of ints in which case you'll want them one by one, and in the other case they'll be atomic just like bytes. Giving the register option on the other hand to the end user would pose a different problem - ABC.register is global. If you're developing a module and register array.array as an AtomicIterable, you might affect other parts of the program in unexpected ways. I think it's a tough one to solve :-/ On Tue, Jul 26, 2016, 7:52 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
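A short demonstration of the global-registration concern. It assumes an `AtomicIterable` marker ABC like the one proposed earlier; no such class exists in collections.abc, so it is defined locally here. `library_code_treats_as_atomic` is a hypothetical stand-in for a check made in some other module. Registration is recorded on the ABC itself, so one register() call changes the answer isinstance() gives everywhere else in the process.

```python
from abc import ABC
from array import array


class AtomicIterable(ABC):
    """Hypothetical marker ABC from the flatten() proposal."""


def library_code_treats_as_atomic(obj) -> bool:
    # Stand-in for some other module's check
    return isinstance(obj, AtomicIterable)


data = array("B", b"\x01\x02")
before = library_code_treats_as_atomic(data)

# Somewhere else entirely (e.g. while importing an unrelated module):
AtomicIterable.register(array)

after = library_code_treats_as_atomic(data)
```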

The problem here is that often folks want to NOT allow single strings, but rather want only an iterable of strings. But, of course, a single string is an iterable of strings... I'm pretty sure this is by far the most common dynamic typing gotcha in Python (at least after true division). Which means it may well be worth special-casing all over the place to make it easy to do right. Personally, I'd like a solution that helps folks write correct code even without a type checker. AFAIK, the "solution" at this point is for every function that takes an iterable of strings to have code like: `names = [names] if type(names) is str else names` But this is ugly and fragile in so many ways... Perhaps there should be (maybe only in my private lib) an "as_sequence_of_strings()" function that does this, but more completely and robustly. Similar to numpy's asarray(), which passes back the object if it's already an array of the type specified, and tries to convert it otherwise.
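A sketch of the helper suggested above, loosely modeled on numpy.asarray(). `as_sequence_of_strings` is a hypothetical name. The policy choices below are assumptions about what "more completely and robustly" might mean:

- wrap a bare str;
- return an existing list or tuple of str unchanged;
- materialize other iterables once;
- reject bytes-like values and non-str items with TypeError.

```python
from collections.abc import Iterable
from typing import Sequence, Union


def as_sequence_of_strings(value: Union[str, Iterable]) -> Sequence[str]:
    # A single str is one item, not an iterable of characters.
    if isinstance(value, str):
        return [value]
    # bytes-like input is treated as a caller mistake rather than data.
    if isinstance(value, (bytes, bytearray, memoryview)) or not isinstance(value, Iterable):
        raise TypeError(f"expected str or iterable of str, got {type(value).__name__}")
    # Like asarray(): pass an existing list/tuple of str through without copying.
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        return value
    result = list(value)  # materialize once so one-shot iterators work too
    for item in result:
        if not isinstance(item, str):
            raise TypeError(f"expected str items, got {type(item).__name__}")
    return result
```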
typically want to consider types like str, bytes, bytearray and memoryview as atomic objects, rather than as containers,
There are all kinds of things that one might want to treat as atomic, rather than a sequence -- but what is special about str is that it is infinitely recursive: no matter how many times you index it, you always get back a sequence, never an atomic object. Most (every?) other sequence returns an atomic object (eventually) when you index it. So maybe the solution really is to have a character type -- then a string is a sequence of characters, rather than a sequence of strings, and all these problems go away. If a character object has all the methods of a string except indexing (and iterating) it might even be mostly backward compatible.... Otherwise, special-casing in multiple places may be the only option. Or maybe all we need is a way to spell "any iterable of strings except a string", though that would only help type checking. But either way, it would be great to support this better. -CHB

On Fri, Jul 29, 2016, 2:18 PM Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
The problem here is that often folks want to NOT allow single strings, but rather want only an iterable of strings.
The module author might, but sometimes the user might want to use a str as an iterable of 1-len strs. It bugs me that Pandas forces me to wrap a str of indices in a list constructor before passing it into the DataFrame constructor.

I agree, the fact that a string consists of strings has bothered me as well. It makes some things work well, but some recursive algorithms require special checks. There should be a way to iterate over a string as characters. AtomicIterable seems a good idea, but maybe it could be just the conceptual starting point of building up a class hierarchy of "iterability", and such a class could be passed to the 'flatten' function proposed above in the thread by Nick Coghlan. Some conceptual inspiration may be taken from numpy, where one can have 0-dimensional arrays, which are different from scalars and different from an n-D array with 1 element (n > 0):
```
In [7]: x = np.ndarray(())
In [8]: x
Out[8]: array(21.0)
In [9]: type(x)
Out[9]: numpy.ndarray
In [10]: x[()]
Out[10]: 21.0
In [11]: np.asscalar(x)
Out[11]: 21.0
In [13]: x = np.ndarray((1,))
In [14]: x
Out[14]: array([ 17.5])
```

I like this proposal from a book-keeping perspective... the interface provided by sequences/mappings/sets is definitely different than the interface provided by generic iterables (mostly due to the fact that the former group can be iterated repeatedly). With that said, I'm not really sure how the `Reiterable` (or whatever suitable name the smart people around here come up with) class would be implemented: `class Reiterable(Iterable): pass` This seems a bit weird. All of the other ABCs make some guarantees. If I fill out a set of methods then my object will be a <fill in ABC>. There doesn't seem to be a set of methods that exist to distinguish between an object that is Iterable and an object that is Reiterable. I suppose that the counter argument is that we always assume that the methods on the object are implemented "reasonably". Perhaps the main goal for this class is just book-keeping/annotating/loose "if you say so" typechecking, and we're trusting that if someone subclasses `Reiterable`, they're going to implement it in such a way that it indeed can be iterated multiple times and that they didn't just mean `Iterable`. If that's the intent, I think that is also fine with me -- it just seemed like someone ought to mention it once... On Thu, Jul 21, 2016 at 5:29 PM, Guido van Rossum <guido@python.org> wrote:
-- Matt Gilson // SOFTWARE ENGINEER E: matt@getpattern.com // P: 603.892.7736

On Thu, Jul 21, 2016 at 01:43:01PM -0700, Guido van Rossum wrote:
Are you talking about giving a nice name to Union[Set, Sequence, Mapping]? I'm not sure that there is one, unless we make up a word. How do you feel about abbreviations? SSM? MSS?
Now I'm having flash-backs to this thread: https://mail.python.org/pipermail/python-ideas/2013-September/023239.html As I recall from that thread: isinstance(obj, Reiterable) is *almost* equivalent to: isinstance(obj, Iterable) and not isinstance(obj, Iterator) Back at the time, I wasn't convinced that the stdlib needed a special name for this concept, but I wasn't thinking of the typing module. -- Steve
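The approximation above, `isinstance(obj, Iterable) and not isinstance(obj, Iterator)`, written out with a hypothetical name. A small counterexample shows why it is only *almost* right. A class can be iterable without being an iterator and still be single-pass, because its `__iter__` hands out the same underlying iterator every time. `OneShot` is illustrative.

```python
from collections.abc import Iterable, Iterator


def looks_reiterable(obj: object) -> bool:
    return isinstance(obj, Iterable) and not isinstance(obj, Iterator)


class OneShot:
    """Iterable but not an iterator, yet it can only be traversed once."""

    def __init__(self, items):
        self._it = iter(items)

    def __iter__(self):
        return self._it  # the same underlying iterator on every call
```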

On Thursday, July 21, 2016 at 1:44:11 PM UTC-7, Guido van Rossum wrote:
+1 on this, from the pytype side of things. pytype, during type-inference, always struggles to give names to things, and having additional structural types around helps. Also, I like "Collection" better than Reiterable, since the latter sounds too intricate, possibly scaring users away.
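For reference, Python 3.6 added collections.abc.Collection (and typing.Collection), an ABC for classes that provide `__len__`, `__iter__` and `__contains__`. The sketch below assumes Python 3.6 or later, and `summarize` is a hypothetical function. Note that Collection still accepts str, and it promises only those three methods, not re-iterability as such.

```python
import collections.abc
from typing import Collection


def summarize(values: Collection[int]) -> str:
    """Accepts lists, sets, ranges, ...; rejects one-shot iterators at runtime."""
    if not isinstance(values, collections.abc.Collection):
        raise TypeError(f"expected a Collection, got {type(values).__name__}")
    # Uses each of the three capabilities: __len__, __contains__ and __iter__
    return f"{len(values)} items, contains 0: {0 in values}, total: {sum(values)}"
```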
participants (20)
- Alexander Heger
- Bar Harel
- Brett Cannon
- Chris Angelico
- Chris Barker - NOAA Federal
- Daniel Moisset
- David Mertz
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Guido van Rossum
- Matt Gilson
- Matthias Kramm
- Michael Selik
- Nick Coghlan
- Ryan Gonzalez
- Serhiy Storchaka
- Stephan Hoyer
- Steven D'Aprano
- Thomas Nyberg