Syncing up iterators with gaps
Chris Kaynor
ckaynor at zindagigames.com
Wed Sep 28 16:17:20 EDT 2016
Here is a slight variation of Chris A's code that does not require
more than a single look-ahead per generator. It may be better
depending on the exact data passed in.
Chris A's version stores, for each output, every item that shares the
matching key, which, depending on the expected data, could use quite a
bit of memory. This version instead yields a list of generators, so it
never needs more than a single look-ahead per input.
The yielded generators must be consumed immediately or they will be
emptied - I put in a safety loop that drains any leftovers before
processing continues.
My version is likely better if your processing does not require
storing (most of) the items and you expect a large number of items
sharing each key in each iterator. If you expect only a couple of items
per shared key per list, Chris A's version will probably perform
better at the cost of slightly more memory, as well as being somewhat
safer and simpler.
def magic_happens_here(*iters):
    def gen(j):
        # Yield items from iterator j for as long as they match the
        # current common_key, keeping only one look-ahead in nexts[j].
        while nexts[j][0] == common_key:
            yield nexts[j]
            nexts[j] = next(iters[j], (None,))
    iters = [iter(it) for it in iters]
    nexts = [next(it, (None,)) for it in iters]
    while "moar stuff":
        try: common_key = min(row[0] for row in nexts if row[0])
        except ValueError: break  # No moar stuff
        outputs = [common_key]
        for i in range(len(nexts)):  # code smell, sorry
            outputs.append(gen(i))
        yield outputs
        # The following loop confirms that the generators provided were
        # consumed, so the caller is not obliged to exhaust them itself.
        # Without it, an unconsumed generator leaves nexts unchanged and
        # the same common_key is picked forever - an infinite loop.
        for output in outputs[1:]:
            for item in output:
                pass
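
For illustration, here is one way the above might be driven, assuming each
input yields (key, value) tuples sorted by key (as in Chris A's version).
The data and the print-based consumer below are made up for the example,
not part of Tim's actual processing:

data1 = [("a", 1), ("a", 2), ("b", 3)]
data2 = [("a", 10), ("c", 11)]
data3 = [("b", 20), ("b", 21), ("c", 22)]

for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3):
    # Each of d1/d2/d3 is a generator and must be drained here, before
    # the outer loop advances (otherwise the safety loop drains whatever
    # is left over and those items are lost).
    print(common_key, list(d1), list(d2), list(d3))

# Expected output:
# a [('a', 1), ('a', 2)] [('a', 10)] []
# b [('b', 3)] [] [('b', 20), ('b', 21)]
# c [] [('c', 11)] [('c', 22)]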
Chris
On Wed, Sep 28, 2016 at 12:48 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, Sep 29, 2016 at 5:10 AM, Tim Chase
> <python.list at tim.thechases.com> wrote:
>> And I'd like to do something like
>>
>> for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3):
>>     for row in d1:
>>         process_a(common_key, row)
>>     for thing in d2:
>>         process_b(common_key, row)
>>     for thing in d3:
>>         process_c(common_key, row)
>
> Assuming that the keys are totally ordered and the data sets are
> sorted, this should work:
>
> def magic_happens_here(*iters):
>     iters = [iter(it) for it in iters]
>     nexts = [next(it, (None,)) for it in iters]
>     while "moar stuff":
>         try: common_key = min(row[0] for row in nexts if row[0])
>         except ValueError: break # No moar stuff
>         outputs = [common_key]
>         for i in range(len(nexts)): # code smell, sorry
>             output = []
>             while nexts[i][0] == common_key:
>                 output.append(nexts[i])
>                 nexts[i] = next(iters[i], (None,))
>             outputs.append(output)
>         yield outputs
>
> Basically, it takes the lowest available key, then takes everything of
> that key and yields it as a unit.
>
> Code not tested. Use at your own risk.
>
> ChrisA