
Hi Arnaud

Arnaud Delobelle wrote:
>> I have now worked out the details, and it is indeed possible to get O(1) for simple cases and amortized O(log N) in general, all with fairly low constants.
> I'm sorry if I'm missing something obvious, but there are two things I can't work out:

I am glad you asked. Reading it again, I can see that this is definitely not obvious.
> * What you are measuring the time complexity of.
The time for a single 'next', 'send', 'throw' or 'close' call to a generator or a single "yield from" expression, excluding the time spent running the user-defined code in it. (Does that make sense?) In other words, the total overhead of finding the actual user code to run and handling the return values/exceptions according to the PEP.
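To make that overhead concrete, here is a hypothetical toy (not part of my timing code) showing why the naive delegation pattern pays it once per level of nesting:

def passthrough(it):
    # naive delegation: every value is re-yielded once per level
    for v in it:
        yield v

inner = iter([1, 2, 3])
outer = passthrough(passthrough(passthrough(inner)))
print outer.next()  # prints 1, but had to resume 3 generator frames

Each 'next' call on the outermost generator has to wake up every frame in between, so the overhead grows with the depth of the chain.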
> * What N stands for.
> I suspect that N is the 'delegation depth': the number of yield-from that have to be gone through. I imagine that you are measuring the time it takes to get the next element in the generator. These are guesses - can you correct me?

N is the total number of suspended generators in the tree(s) of generators involved in the operation. Remember that it is possible to have multiple generators yield from the same 'parent' generator. The 'delegation depth' would be the height of that tree.
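To illustrate why it is a tree and not just a chain, here is a hypothetical example (plain generators, not my module) of two generators delegating to the same suspended parent:

def parent():
    for i in xrange(10):
        yield i

def child(p):
    for v in p:
        yield v

p = parent()
a = child(p)    # two children...
b = child(p)    # ...sharing one suspended parent: a tree, not a chain
print a.next()  # 0
print b.next()  # 1 -- b resumes the parent where a left it

Here N would be 3 (the parent and both children), even though the 'delegation depth' -- the height of the tree -- is only 2.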
One interesting thing to note is that all the non-contrived examples I have seen build only simple chains of iterators, and only grow them by adding one generator at a time at the deepest level. This is the best possible case for my algorithm: if you stick to that pattern, the time per operation is O(1). If we decided to allow only that pattern, the algorithm could be simplified significantly.
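For example, a recursive tree traversal only ever extends the chain by one generator at the deepest point. (This is a hypothetical sketch written with the from_/uses_from helpers from my module, which appear in the timing code below; Node is just an illustrative class.)

class Node(object):
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

@uses_from
def inorder(node):
    if node is not None:
        # each recursive call adds exactly one new generator at the
        # deepest point of the chain -- the O(1) case
        yield from_(inorder(node.left))
        yield node.value
        yield from_(inorder(node.right))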
I have implemented the tree structure as a Python module and added a trampoline-based pure-Python implementation of "yield-from" to try it out.
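To give an idea of the mechanism, here is a minimal sketch of what a trampoline-based "yield from" can look like. To be clear, this is not my actual module: it only handles 'next' (no 'send'/'throw'/'close' and no return values), it assumes a delegated-to wrapper is never iterated independently, and it lacks the tree bookkeeping that gives the amortized bounds.

class from_(object):
    # marker yielded by a generator to request delegation
    def __init__(self, iterable):
        self.iterator = iter(iterable)

class _Wrapper(object):
    # trampoline: runs a stack of generators without re-yielding
    def __init__(self, gen):
        self.stack = [gen]
    def __iter__(self):
        return self
    def next(self):
        while self.stack:
            try:
                item = self.stack[-1].next()
            except StopIteration:
                self.stack.pop()  # delegate exhausted; resume its caller
                continue
            if isinstance(item, from_):
                it = item.iterator
                if isinstance(it, _Wrapper):
                    # splice the sub-wrapper's frames into our own stack,
                    # so later calls reach the deepest generator directly
                    self.stack.extend(it.stack)
                else:
                    self.stack.append(it)
                continue
            return item
        raise StopIteration

def uses_from(genfunc):
    def wrapper(*args, **kwargs):
        return _Wrapper(genfunc(*args, **kwargs))
    return wrapper

The point of the splice is that after the first traversal, later 'next' calls go straight to the deepest generator instead of being re-yielded through every level of the chain.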
It seems that this version beats a normal "for v in it: yield v" when the delegation chains get around 90 generators deep.
> Can you give an example?
Sure, here is the simple code I used for timing the 'next' call:

import itertools, time
from yieldfrom import uses_from, from_  # my module...

@uses_from
def child(it):
    yield from_(it)

def child2(it):
    for i in it:
        yield i

def longchain(N):
    it = itertools.count()
    for i in xrange(N):
        it = child(it)  # replace this with child2 to test the current
                        # "for v in it: yield v" pattern.
    it.next()  # we are timing the 'next' calls (not the setup of
               # the chain) so skip the setup by calling next once.
    return it

it = longchain(90)

times = []
for i in xrange(10):
    t1 = time.time()
    for i in xrange(100000):
        it.next()
    t2 = time.time()
    times.append(t2 - t1)
print min(times)

This version takes about the same time whether you use child or child2 to build the chain. However, the version using yield-from takes the same time no matter how long the chain is, while the time for the "for v in it: yield v" version is linear in the length of the chain, and so would lose big-time for longer chains.

I hope this answers your questions.

Regards
Jacob