Set starting point for itertools.product()

Hi,

My idea is to allow setting the starting point of itertools.product() <https://docs.python.org/3.6/library/itertools.html#itertools.product>, since iterating becomes very slow when the point of interest is in the middle. For example, when working with datetime tuples at seconds resolution (or, in the worst case, milliseconds or microseconds), you need to skip a lot of items.

I have tried several alternatives, but itertools.product() is the fastest:

- datetime and timedelta
- naive looping

This is where I applied it, but iterating at seconds resolution is still a bottleneck: https://github.com/Code-ReaQtor/DocCron/blob/1.0.0/doccron/job.py#L171

Setting the starting point would come in handy whenever we want to skip a lot of data. What do you think?

Best Regards,
Ronie

On Thu, Oct 25, 2018 at 11:47:18AM +0800, Ronie Martinez wrote:
I don't understand what you mean by "skip a lot of items" or why this applies to datetime tuples. Can you give a SHORT and SIMPLE example, showing both the existing solution and your proposed solution?

--
Steve

Hi Steve,

Here is an example:

    import itertools
    import time


    def main():
        datetime_odometer = itertools.product(
            range(2018, 10_000),  # year
            range(1, 13),  # month
            range(1, 31),  # days
            range(0, 24),  # hours
            range(0, 60),  # minutes
            range(0, 60)  # seconds
        )

        datetime_of_interest = (2050, 6, 15, 10, 5, 0)

        for i in datetime_odometer:
            if i == datetime_of_interest:  # target start time
                break


    if __name__ == '__main__':
        start = time.time()
        main()
        duration = time.time() - start
        print(duration, 'seconds')  # 91.9426908493042 seconds

It took 92 seconds to get to the target start time. This applies not only to datetimes but to anything else that uses "odometer-like" patterns. I don't have a proposed solution for now, but I think adding this feature to itertools would come in handy.

Regards,
Ronie

On Thu, Oct 25, 2018 at 1:49 PM Steven D'Aprano <steve@pearwood.info> wrote:
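To see why the loop above is slow, the target's position in the product can be computed directly by treating the wheels as digits of a mixed-radix number. A minimal sketch, assuming every wheel is a sequence (such as a range) that supports len() and .index(); the helper name `rank` is hypothetical and not part of itertools:

```python
def rank(wheels, target):
    """Return the 0-based position of `target` in itertools.product(*wheels).

    Hypothetical helper: each wheel is one digit of a mixed-radix number.
    Every wheel must be a sequence supporting len() and .index().
    """
    position = 0
    for wheel, value in zip(wheels, target):
        # Shift the digits seen so far by this wheel's radix, then add this digit.
        position = position * len(wheel) + wheel.index(value)
    return position


wheels = (
    range(2018, 10_000),  # year
    range(1, 13),  # month
    range(1, 31),  # days (as in the example above)
    range(0, 24),  # hours
    range(0, 60),  # minutes
    range(0, 60),  # seconds
)
print(rank(wheels, (2050, 6, 15, 10, 5, 0)))  # 1009533900
```

So the loop inspects a little over a billion tuples before it reaches the target, which goes a long way towards explaining the 92 seconds.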

Ronie Martinez writes:
I don't understand the issue. Doesn't

    import itertools

    def make_odometer(year, month, day, hour, minute, second):
        return itertools.product(
            range(year, 10_000),
            range(month, 13),
            range(day, 31),
            range(hour, 24),
            range(minute, 60),
            range(second, 61)  # leap seconds!
        )

    def main():
        datetime_of_interest = (2050, 6, 15, 10, 5, 0)
        datetime_odometer = make_odometer(*datetime_of_interest)

do what you want?

If you have a task where that *doesn't* do what's needed, e.g., where the "odometer wheels" are an iterated function system, I don't see any way to avoid the problem. If you have some range-like object that doesn't support starting values, you need to fix that, or there's nothing that could be done about it in itertools.

(Yet Another) Steve

--
Associate Professor              Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/     Faculty of Systems and Information
Email: turnbull@sk.tsukuba.ac.jp                   University of Tsukuba
Tel: 029-853-5175                 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN

Hi another Steve,

Your code will not work! It skips the values below the starting point on every wheel. When, let's say, the minute value reaches 59, the next value should be 0. In your code it will not be: the minute wheel restarts at 5. That is not how an odometer wheel works.

Best Regards,
Ronie

On Thu, Oct 25, 2018 at 2:59 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
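Ronie's objection can be seen on a toy two-wheel odometer (hours and minutes with only three positions each, chosen purely for illustration), comparing truncated ranges with the full product sliced from the target onward:

```python
from itertools import product

hours = range(0, 3)
minutes = range(0, 3)
start = (1, 1)

# Truncated ranges: every wheel starts at the target value.
truncated = list(product(range(start[0], 3), range(start[1], 3)))

# Odometer behaviour: the full product, from the target onward.
full = list(product(hours, minutes))
odometer = full[full.index(start):]

print(truncated)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(odometer)   # [(1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
```

(2, 0) is missing from the truncated version: after the minute wheel wraps, it restarts at 1 instead of 0.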

I see three issues emerging from Ronie's post.

The first issue. Each combinatorial iterator, such as itertools.product, has an internal state. This internal state, roughly, is "where am I?". For the product of three tuples, the internal state is the value of (i, j, k), the indexes into the tuples. The first issue is: how do we set the internal state of a combinatorial iterator? And should this be added to the standard library?

The second issue. Again, it's about combinatorial iterators. Here's an example. By using enumerate(combinations('ABCD', 2)) we set up a bijection between range(0, 4 * 3 // 2) and the length-two subsets of set('ABCD'). When and how should this bijection be made available?

The third issue is solving the specific date and time problem he provides. Because of the complexities of date and time, I think this is best done using the datetime module. For me, something like the below (not tested) is the obvious thing to try.

    def my_moments(start, step):
        i = 0
        while True:
            value = start + i * step
            yield value
            i = i + 1

Here, start is to be a datetime, and step a timedelta. And instead of yielding the value as is, convert it if you wish to a tuple of (year, month, day, hour, minute, second).

Ronie, does this solve your problem? And if not, why not?

--
Jonathan
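For the second issue, the bijection can be computed in the other direction too, mapping an index straight to its combination without enumerating the ones before it. The function below is adapted from the `nth_combination` recipe in the itertools documentation; it needs math.comb, so Python 3.8 or later:

```python
import math
from itertools import combinations


def nth_combination(iterable, r, index):
    """Equivalent to list(combinations(iterable, r))[index], without the list."""
    pool = tuple(iterable)
    n = len(pool)
    c = math.comb(n, r)
    if index < 0:
        index += c
    if index < 0 or index >= c:
        raise IndexError(index)
    result = []
    while r:
        # Afterwards c == math.comb(n, r): the number of combinations that
        # begin with the candidate element pool[-1 - n].
        c, n, r = c * r // n, n - 1, r - 1
        while index >= c:
            # Skip all combinations that begin with the current candidate.
            index -= c
            c, n = c * (n - r) // n, n - 1
        result.append(pool[-1 - n])
    return tuple(result)


print(nth_combination('ABCD', 2, 3))       # ('B', 'C')
print(list(combinations('ABCD', 2))[3])    # ('B', 'C')
```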

25.10.18 09:31, Ronie Martinez writes:
Thank you for the clarification. Now I understand your idea.

For datetimes it is better to use the datetime classes:

    from datetime import datetime, timedelta

    def iterdatetimes():
        delta = timedelta(microseconds=1)
        dt = datetime(2050, 6, 15, 10, 5, 0)
        while True:
            yield dt
            dt += delta

Note that in your example you missed the 31st days, but iterate over February 29th and 30th. See also the calendar module, which provides date range iterators (although not with microsecond precision).

Currently, for general "odometer-like" patterns you can use the undocumented __setstate__ method of itertools.product. But this is at your own risk.
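The same generator can take the start and the step as parameters (the names `iter_datetimes` and `as_fields` below are illustrative, not from the thread). Stepping by one day across the end of February shows that datetime arithmetic skips February 29th and 30th in a non-leap year such as 2050, the bug noted above:

```python
from datetime import datetime, timedelta
from itertools import islice


def iter_datetimes(start, step):
    """Yield start, start + step, start + 2*step, ... forever."""
    dt = start
    while True:
        yield dt
        dt += step


def as_fields(dt):
    # The (year, month, day, hour, minute, second) tuple used in the thread.
    return (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)


days = islice(iter_datetimes(datetime(2050, 2, 27), timedelta(days=1)), 4)
print([as_fields(d) for d in days])
# [(2050, 2, 27, 0, 0, 0), (2050, 2, 28, 0, 0, 0), (2050, 3, 1, 0, 0, 0), (2050, 3, 2, 0, 0, 0)]
```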

Hi Serhiy,

I missed the 31st in the days range. Thanks for spotting it. I do verify that each datetime tuple is valid by passing it to datetime() and raising an exception later on in the code (see https://github.com/Code-ReaQtor/DocCron/blob/1.0.0/doccron/job.py#L180).

Datetime and timedelta can solve the problem if there is a "pattern", but this is not efficient. For example, when every wheel has a step, or the items have no pattern at all:

    import itertools

    datetime_odometer = itertools.product(
        range(2018, 10_000, 5),  # year with steps
        range(1, 13, 3),  # month with steps
        range(1, 32, 4),  # days with steps
        [0, 5, 6, 10, 13, 24],  # hours without steps
        range(0, 60, 6),  # minutes with steps
        range(0, 60, 2)  # seconds with steps
    )

Datetime and timedelta would create a lot of overhead and are not the best solution. I still believe itertools.product() is the fastest and best solution.

Let me read the code for __setstate__ first. Thanks for spotting this!

Best Regards,
Ronie Martinez

On Thu, Oct 25, 2018 at 6:22 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
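The validation step Ronie describes can be sketched as a filter in front of the odometer. This is a hypothetical helper, not DocCron's actual code, and it is a variation that skips invalid tuples instead of raising. Note that the hour value 24 in the example above is also rejected, since datetime() only accepts hours 0 to 23:

```python
from datetime import datetime
from itertools import product


def valid_datetimes(odometer):
    """Yield a datetime for each tuple that forms a real date and time.

    Hypothetical helper: tuples such as (2050, 2, 30, ...) or ones with
    hour 24 are skipped, because datetime() raises ValueError for them.
    """
    for fields in odometer:
        try:
            dt = datetime(*fields)
        except ValueError:
            continue
        yield dt


odometer = product([2050], [2], [27, 28, 29, 30], [0, 24], [0], [0])
for dt in valid_datetimes(odometer):
    print(dt)
# 2050-02-27 00:00:00
# 2050-02-28 00:00:00
```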

Hi Serhiy,

__setstate__() made it possible to specify the starting point. Using this method and combining it with binning (when the items do not have any pattern) worked well! Thanks for the suggestion.

Cheers!

Best Regards,
Ronie

On Fri, Oct 26, 2018 at 10:31 AM Ronie Martinez <ronmarti18@gmail.com> wrote:
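The thread doesn't show the binning code. One possible reading, sketched here as a hypothetical helper: for sorted wheels without a pattern, bisect each wheel to find the index tuple of the first product element at or after a target time, carrying into an earlier wheel when the target value is past the end of a wheel. The result is the kind of per-wheel index tuple a starting-point mechanism needs:

```python
from bisect import bisect_left


def first_at_or_after(wheels, target):
    """Index tuple of the first element of product(*wheels) that is >= target.

    Hypothetical helper. Each wheel must be sorted ascending, so that
    product()'s output order is lexicographic. Returns None if every
    element is smaller than target.
    """
    n = len(wheels)
    indices = []
    for k, (wheel, value) in enumerate(zip(wheels, target)):
        i = bisect_left(wheel, value)
        if i == len(wheel):
            # Every value on this wheel is too small: carry into the
            # rightmost earlier wheel that can still advance.
            for j in range(k - 1, -1, -1):
                if indices[j] + 1 < len(wheels[j]):
                    return tuple(indices[:j]) + (indices[j] + 1,) + (0,) * (n - j - 1)
            return None
        indices.append(i)
        if wheel[i] > value:
            # Already past the target: every later wheel starts at its first value.
            return tuple(indices) + (0,) * (n - k - 1)
    return tuple(indices)  # exact match


hours = [0, 5, 6, 10, 13]  # hypothetical irregular hour wheel
minutes = list(range(0, 60, 6))
print(first_at_or_after((hours, minutes), (6, 55)))  # (3, 0), i.e. 10:00
```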

On Thu, Oct 25, 2018 at 02:31:05PM +0800, Ronie Martinez wrote:
When you talked about datetime, I thought you meant actual datetime objects. The above is buggy: it ignores the 31st day of January, etc., but includes February 29 and 30 every year.

In the most general case, there is no way to jump into the middle of an arbitrary iterator, except to start at the beginning and compute the values until you see the one that you want. Arbitrary iterators compute their values on request, and there is no way to jump ahead except by inspecting each value in turn, skipping the ones you don't want. So unless I have missed something, I think what you are asking for is impossible except for special cases like lists.

But in *this* case, modelling an odometer, that special case works:

    def rotate(iterable, position):
        L = list(iterable)
        return L[position:] + L[:position]

Which gives us this:

    py> months = rotate(range(1, 13), 5)
    py> months
    [6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5]

Now pass that to itertools.product.

In other words, I think that the right solution here is to construct your iterables to start at the position you want, rather than expect product() to jump into the middle of the sequence. (Which may not be possible.)

--
Steve

Hi Steve,

I tried this when writing DocCron <https://github.com/Code-ReaQtor/DocCron>, but it does not work correctly. Suppose we have minutes with range 0-59 and seconds with range 0-59, but starting at, say, 5. When the seconds value reaches 59, on the next iteration the minute value should increase by 1 (or roll back to zero), but with a rotated wheel that only happens when the seconds go from 4 to 5. So the correct solution must not modify the arrangement of any wheel.

Best Regards,
Ronie

On Thu, Oct 25, 2018 at 7:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
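Ronie's constraint (start mid-way but keep every wheel's full arrangement) can also be met without `__setstate__`, which is an undocumented pickling hook; the Python 3.12 release notes deprecate copy and pickle support for itertools objects, with removal planned for 3.14. A minimal sketch with a hypothetical helper `product_from`: the output from index tuple `start` onward is split into one segment per wheel, and each segment is itself an itertools.product in which the wheels to its left are pinned to single values. Nothing before the start is generated, and each segment is iterated by product() itself:

```python
from itertools import chain, product


def product_from(wheels, start):
    """Iterate product(*wheels) beginning at the element with index tuple `start`.

    Hypothetical helper: same output as skipping everything before `start`,
    but without generating the skipped tuples.
    """
    wheels = [tuple(w) for w in wheels]
    n = len(wheels)
    segments = []
    # Segment k pins wheels 0..k-1 to their start values, runs wheel k from
    # just after its start value (from the start value itself for the
    # rightmost wheel), and runs every wheel right of k over its full range.
    for k in range(n - 1, -1, -1):
        pinned = [(wheels[j][start[j]],) for j in range(k)]
        first = start[k] if k == n - 1 else start[k] + 1
        segments.append(product(*pinned, wheels[k][first:], *wheels[k + 1:]))
    return chain.from_iterable(segments)


minutes = range(0, 60)
seconds = range(0, 60)
it = product_from((minutes, seconds), (0, 58))
print([next(it) for _ in range(4)])  # [(0, 58), (0, 59), (1, 0), (1, 1)]
```

The start tuple holds positions within each wheel, not wheel values; after 59 the seconds wheel rolls over to 0 and the minutes advance, as an odometer should.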

participants (5)
- Jonathan Fine
- Ronie Martinez
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano