On Mon, 4 May 2020 at 12:41, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 03, 2020 at 11:13:58PM -0400, David Mertz wrote:
It seems to me that a Python implementation of zip_equals() shouldn't do the check in a loop like a version shows (I guess from more-itertools). More obvious is the following, and this has only a small constant speed penalty.
    def zip_equal(*its):
        yield from zip(*its)
        if any(_sentinel == next(o, _sentinel) for o in its):
            raise ZipLengthError
Alas, that doesn't work, even with your correction of `any` to `not all`.
    py> list(zip_equal("abc", "xy"))
    [('a', 'x'), ('b', 'y')]
The problem here is that zip consumes the "c" from the first iterator, exhausting it, so your check at the end finds that all the iterators are exhausted.
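To make that concrete, here is a minimal sketch using plain `zip` on two explicit iterators. The names `first`, `second`, `pairs` and `leftover` are only illustrative.

```python
first = iter("abc")
second = iter("xy")

# zip asks `first` for an item before it asks `second`. On the third
# round it takes "c" from `first`, then finds `second` exhausted and stops.
pairs = list(zip(first, second))

# "c" has been consumed and discarded, so `first` is now empty too,
# which is why a check made after the zip sees every iterator as exhausted.
leftover = next(first, None)
```

After this runs, `leftover` is `None` rather than `"c"`, so nothing is left for a trailing check to find.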
This got me thinking: what if we were to wrap (or, as it turned out, `chain` onto the end of) each of the individual iterables instead, thereby performing the relevant check before `zip` fully exhausts them? Something like the following:

```python
import itertools


def zip_equal(*iterables):
    return zip(*_checked_simultaneous_exhaustion(*iterables))


def _checked_simultaneous_exhaustion(*iterables):
    if len(iterables) <= 1:
        return iterables

    def check_others():
        # first iterable exhausted, check the others are too
        sentinel = object()
        if any(next(i, sentinel) is not sentinel for i in iterators):
            raise ValueError('unequal length iterables')
        if False:
            yield  # never reached; makes this a generator function

    def throw():
        # one of iterables[1:] exhausted first, therefore it must be shorter
        raise ValueError('unequal length iterables')
        if False:
            yield  # never reached; makes this a generator function

    iterators = tuple(map(iter, iterables[1:]))
    return (
        itertools.chain(iterables[0], check_others()),
        *(itertools.chain(it, throw()) for it in iterators),
    )
```

This has the advantage that, if desired, the `_checked_simultaneous_exhaustion` function could also be reused to implement the previously mentioned length-checking version of `map`.

Going further, if `checked_simultaneous_exhaustion` were to become a public function (with a better name), it could be used to impose same-length checking on the iterable arguments of any function, provided those iterables are consumed in a compatible way. It would also let one be specific about which iterables are checked, rather than being forced to check either all or none of them by `zip_equal` / `zip` respectively. That lets us have our cake and eat it when mixing infinite iterables with checked-length finite ones, e.g.
```python
zip(i_am_infinite, *checked_simultaneous_exhaustion(*but_we_are_finite))

# or, if they aren't contiguous
checked1, checked2 = checked_simultaneous_exhaustion(it1, it2)
zip(checked1, infinite, checked2)
```

However, as I alluded to previously, this relies on the assumption that the given iterators are advanced in turn, in the order they were passed to `checked_simultaneous_exhaustion`. So this function would be suitable for use with `zip`, `map`, and anything else that consumes its arguments the same way.

A more general `checked_equal_length` function, one that extends to cases in which the consuming function advances the iterables in some haphazard order, would need something more involved, such as keeping a running tally of the current length of each iterable. Even then, we could only guarantee raising on unequal lengths if the said function advanced all of the given iterators by at least the length of the shortest.
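For what it's worth, the running-tally idea might look roughly like the sketch below. This is only one possible reading of it, not a worked-out proposal: `checked_equal_length` is the hypothetical name used above, and `wrap`, `counts` and `finished` are names made up for illustration. The sketch never reads ahead, so it reports a mismatch only once the consumer's own calls to `next` expose it.

```python
def checked_equal_length(*iterables):
    """Wrap each iterable; raise ValueError once the tallies prove a mismatch."""
    counts = [0] * len(iterables)        # items handed out so far, per iterable
    finished = [False] * len(iterables)  # True once that iterable is exhausted

    def wrap(index, iterable):
        for item in iterable:
            counts[index] += 1
            # Another iterable already ended with fewer items than we have now.
            if any(done and n < counts[index]
                   for done, n in zip(finished, counts)):
                raise ValueError('unequal length iterables')
            yield item
        finished[index] = True
        # Another iterable has handed out more items than this one ever will,
        # or has already ended with a different total.
        if any(n > counts[index] or (done and n != counts[index])
               for done, n in zip(finished, counts)):
            raise ValueError('unequal length iterables')

    return tuple(wrap(i, it) for i, it in enumerate(iterables))


# Consume out of order: the shorter iterable in full first, then the longer.
short, longer = checked_equal_length("xy", "abc")
consumed_short = list(short)  # fine so far, nothing contradicts a length of 2
try:
    list(longer)              # the third item contradicts the finished tally
    error = None
except ValueError as exc:
    error = exc
```

Because it never reads ahead, this is weaker than the `chain`-based version under `zip` when the first iterable is the shortest: `zip` stops as soon as that one is exhausted, and the longer ones are never asked for another item.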