Syncing up iterators with gaps
Chris Kaynor
ckaynor at zindagigames.com
Wed Sep 28 16:17:20 EDT 2016
Here is a slight variation of Chris A's code that does not require
more than a single look-ahead per generator. It may be better
depending on the exact data passed in.
Chris A's version stores, for each output, every item that shares the
matching key, which, depending on the expected data, could use quite a
bit of memory. This version instead yields a list of generators, so it
never needs more than a single look-ahead per input.
The yielded generators must be consumed immediately or they will be
emptied - I put in a safety loop that drains any leftovers before
processing continues.
My version is likely better if your processing does not require
storing (most of) the items and you expect a large number of items
sharing each key in each iterator. If you expect only a couple of items
per shared key per list, Chris A's version will probably perform
better at the cost of slightly more memory, as well as being somewhat
safer and simpler.
def magic_happens_here(*iters):
    def gen(j):
        # Yield items from iterator j for as long as they match the
        # current common_key, keeping only one look-ahead in nexts[j].
        while nexts[j][0] == common_key:
            yield nexts[j]
            nexts[j] = next(iters[j], (None,))
    iters = [iter(it) for it in iters]
    nexts = [next(it, (None,)) for it in iters]
    while "moar stuff":
        try: common_key = min(row[0] for row in nexts if row[0])
        except ValueError: break  # No moar stuff
        outputs = [common_key]
        for i in range(len(nexts)):  # code smell, sorry
            outputs.append(gen(i))
        yield outputs
        # The following loop confirms that the generators provided were
        # consumed, so the caller is not obliged to exhaust them itself.
        # Without it, an unconsumed generator leaves nexts unchanged and
        # the same common_key is picked forever - an infinite loop.
        for output in outputs[1:]:
            for item in output:
                pass
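
For illustration, here is one way the above might be driven, assuming each
input yields (key, value) tuples sorted by key (as in Chris A's version).
The data and the print-based consumer below are made up for the example,
not part of Tim's actual processing:

data1 = [("a", 1), ("a", 2), ("b", 3)]
data2 = [("a", 10), ("c", 11)]
data3 = [("b", 20), ("b", 21), ("c", 22)]

for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3):
    # Each of d1/d2/d3 is a generator and must be drained here, before
    # the outer loop advances (otherwise the safety loop drains whatever
    # is left over and those items are lost).
    print(common_key, list(d1), list(d2), list(d3))

# Expected output:
# a [('a', 1), ('a', 2)] [('a', 10)] []
# b [('b', 3)] [] [('b', 20), ('b', 21)]
# c [] [('c', 11)] [('c', 22)]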
Chris
On Wed, Sep 28, 2016 at 12:48 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, Sep 29, 2016 at 5:10 AM, Tim Chase
> <python.list at tim.thechases.com> wrote:
>> And I'd like to do something like
>>
>> for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3):
>>     for row in d1:
>>         process_a(common_key, row)
>>     for thing in d2:
>>         process_b(common_key, row)
>>     for thing in d3:
>>         process_c(common_key, row)
>
> Assuming that the keys are totally ordered and the data sets are
> sorted, this should work:
>
> def magic_happens_here(*iters):
>     iters = [iter(it) for it in iters]
>     nexts = [next(it, (None,)) for it in iters]
>     while "moar stuff":
>         try: common_key = min(row[0] for row in nexts if row[0])
>         except ValueError: break # No moar stuff
>         outputs = [common_key]
>         for i in range(len(nexts)): # code smell, sorry
>             output = []
>             while nexts[i][0] == common_key:
>                 output.append(nexts[i])
>                 nexts[i] = next(iters[i], (None,))
>             outputs.append(output)
>         yield outputs
>
> Basically, it takes the lowest available key, then takes everything of
> that key and yields it as a unit.
>
> Code not tested. Use at your own risk.
>
> ChrisA