Is there a canonical way to check whether an iterable is ordered?

Terry Reedy tjreedy at udel.edu
Sat Sep 20 00:02:58 CEST 2014


On 9/18/2014 10:45 PM, Chris Angelico wrote:
> On Fri, Sep 19, 2014 at 9:52 AM, Roy Smith <roy at panix.com> wrote:
>> In article <mailman.14103.1411047208.18130.python-list at python.org>,
>>   Chris Angelico <rosuav at gmail.com> wrote:
>>
>>> The one thing you can rely on (and therefore must comply with, when
>>> you design an iterable) is that iteration will hit every element

in what, a collection that exists separately from the iteration?  or a 
collection generated by the iteration?

This is actually two statements in one -- what the user can rely on, 
which I claim is very little (see below) -- and what the designer must 
'comply with', which is correspondingly very little (again see below) 
except as the designer claims otherwise in specific documentation.

>>> exactly once.

There are only two things to rely on without further, special case, 
information.

iter(iterable) = iterator (if not, 'iterable' is not an iterable)

next(iterator) = object or raises StopIteration (with raising not 
guaranteed)

The conceptual flaw in the statement above, as I think originally 
intended, is the notion that 'every element [of a collection]' exists, 
or is defined before the iteration.  Iterators can be creative, in the 
sense of generating collections as they go, in a manner not determined 
by the past.  It is also possible that the next item depends on what the 
iterator user has done with previous items.  Such feedback loops are 
common for programs that interact with users.

If the statement above is interpreted retrospectively, as "iteration 
hits every item of the sequence it generates", then it is trivially true 
and is equivalent to "iteration yields every element that it yields, in 
the order that it yields them".  To paraphase the OP's question, given 
completion of

yielded1 = [item for item in iterable]
yielded2 = [item for item in yielded1]

then yielded2 == yielded1 *is* guarenteed.  But this is all that is 
guaranteed.

Itertools.tee operates similarly to the above code except that 
processing of the duplicate streams can be interleaved.  There is an 
internal queue of items yielded on one branch not the other.  For best 
operation, accesses should be interleaved, with the maximum lag bounded. 
  Itertools.tee guarantees that the two branches yield the same items in 
the same order.

>> Does it actually say that somewhere?  For example:
>>
>> for i in bag.pick_randomly_with_replacement(n=5):
>>     print i
>>
>> shouldn't do that.
>
> When you pick randomly from a population, you create a new population,
> which may have duplicates compared to the original. (For efficiency's
> sake it probably won't all actually exist immediately, but
> conceptually it does exist.)

When the next item depends on feedback from the user, that conceptual 
trick does not work as well.

 > That's what you're iterating over - not the bag itself.

If one iterates over anything other that a sequence, in forward order, 
then one is, in effect, iterating over a new sequence generated from the 
'base' collection.  In particular, set and dict iteration iterates over 
an arbitrary serialization of the set or dict.

In the example above, the underlying collection (or specification 
thereof) and size of selection can be hidden away in a class instance so 
that the code may just look like 'for i in my_iterable:", just as for 
iterating over a set.

-- 
Terry Jan Reedy




More information about the Python-list mailing list