zip_strict() or similar in itertools?
Dear all,

the itertools documentation has the grouper() recipe, which returns consecutive tuples of a specified length n from an iterable. To do this, it uses zip_longest(). While this is an elegant and fast solution, my problem is that I sometimes don't want my tuples to be filled with a fillvalue (which happens if len(iterable) % n != 0), but I would prefer an error instead. This is important, for example, when iterating over the contents of a file and you want to make sure that it's not truncated.

I was wondering whether itertools, in addition to the built-in zip() and zip_longest(), shouldn't provide something like zip_strict(), which would raise an error if its arguments aren't of equal length. zip_strict() could then be used in an alternative grouper() recipe.

By the way, right now, I am using the following workaround for this problem:

    def iblock(iterable, bsize, strict=False):
        """Return consecutive lists of bsize items from an iterable.

        If strict is True, raises a ValueError if the size of the last
        block in iterable is smaller than bsize. If strict is False, it
        returns the truncated list instead."""

        it = iter(iterable)
        i = [it] * (bsize - 1)
        while True:
            try:
                result = [next(it)]
            except StopIteration:
                # iterator exhausted, end the generator
                break
            for e in i:
                try:
                    result.append(next(e))
                except StopIteration:
                    # iterator exhausted after returning at least one item,
                    # but before returning bsize items
                    if strict:
                        raise ValueError("only %d value(s) left in iterator, expected %d" % (len(result), bsize))
                    else:
                        pass
            yield result

which works well, but is about 3-4 times slower than the grouper() recipe. If you have alternative, faster solutions that I wasn't thinking of, I'd be very interested to hear about them.

Best,
Wolfgang
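For context, the padding behaviour described in this message can be reproduced with a minimal sketch of the grouper() recipe. The argument order (n first) follows the version quoted later in this thread and is an assumption about the documentation of the time; the sample input is purely illustrative.

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # The same iterator is repeated n times, so each output tuple
    # consumes n consecutive items from the input.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# 7 items do not divide evenly into groups of 3, so the last tuple is padded.
groups = list(grouper(3, "ABCDEFG", fillvalue="x"))
print(groups)
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

Nothing in the result signals that the input was short, which is the situation the proposed zip_strict() is meant to catch.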
Wolfgang Maier <wolfgang.maier@...> writes:
> [...] which works well, but is about 3-4 times slower than the grouper() recipe. If you have alternative, faster solutions that I wasn't thinking of, I'd be very interested to hear about them.
>
> Best, Wolfgang

OK, I wasn't remembering the timing results correctly: it's about 8 times slower than grouper().
Hi,

Have you tried using a marker as the fill value and then looking for it to raise the exception? The membership operator is quite decent, IIRC.

Alfredo

On Thu, Apr 4, 2013 at 12:42 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
> OK, I wasn't remembering the timing results correctly: it's about 8 times slower than grouper().
Alfredo Solano Martínez <asolano@...> writes:
> Hi,
>
> Have you tried using a marker as the fill value and then looking for it to raise the exception? The membership operator is quite decent, IIRC.
>
> Alfredo
Sure, that would be the alternative, but it's not a very general solution since you would have to figure out a fill marker that can never be part of the specific iterable.
What's worse is that you're retrieving several elements per iteration, and those different elements may have different properties requiring different markers. For example, in a file every first line might be an arbitrary string, every second a number, every third could optionally be blank, and so on. So I guess catching the problem early and raising an error right then is a simpler and clearer solution.
Wolfgang
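The marker problem described in this reply can be made concrete with a small sketch. The record layout (a name, a number, an optionally blank line) and the sample lines are hypothetical, chosen only to mirror the file example; the point is that a value-based fill marker such as the empty string can coincide with legitimate data.

```python
from itertools import zip_longest

# Hypothetical file contents: records of three lines each, where the
# third line of a record may legitimately be blank.
# The second record is truncated (its third line is missing).
lines = ["first record", "42", "", "second record", "7"]

# Group into 3-tuples, padding with the empty string.
records = list(zip_longest(*[iter(lines)] * 3, fillvalue=""))

complete, padded = records
# Both tuples end in "", so the padding cannot be told apart
# from a real blank line.
print(complete)  # ('first record', '42', '')
print(padded)    # ('second record', '7', '')
```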
Indeed, the question is still open. I was talking about the speed penalty of your interim solution. About the selection of a marker, what about a custom class?

    import itertools

    # None of your data will be this
    class Marker:
        pass

    # Same as the docs recipes
    def grouper(n, iterable, fillvalue=None):
        args = [iter(iterable)] * n
        return itertools.zip_longest(*args, fillvalue=fillvalue)

    # And then do something like
    for t in grouper(3, 'ABCDEFG', Marker):
        if Marker in t:
            print('Marker')  # or raise ValueError, ...

Alfredo
Thanks for sharing this! It's the same basic idea as in Peter's strict_grouper solution, which integrates the whole thing in one function.

Wolfgang
Wolfgang Maier wrote:
> [...] which works well, but is about 3-4 times slower than the grouper() recipe. If you have alternative, faster solutions that I wasn't thinking of, I'd be very interested to hear about them.
>
> Best, Wolfgang
A simple approach is

    from itertools import zip_longest

    def strict_grouper(items, size, strict):
        # a fresh object() can never occur in the data
        fillvalue = object()
        args = [iter(items)] * size
        chunks = zip_longest(*args, fillvalue=fillvalue)
        # stay one chunk behind, so only the last chunk needs checking
        prev = next(chunks)
        for chunk in chunks:
            yield prev
            prev = chunk
        if prev[-1] is fillvalue:
            if strict:
                raise ValueError
            else:
                prev = prev[:prev.index(fillvalue)]
        yield prev

If that's fast enough it might be a candidate for the recipes section.

A partial solution I wrote a while ago is

http://code.activestate.com/recipes/497006-zip_exc-a-lazy-zip-that-ensures-t...
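One caveat for current interpreters, offered as an editorial sketch rather than as part of the original message: since Python 3.7 (PEP 479), a StopIteration that escapes a generator body is turned into a RuntimeError, so a bare next(chunks) on empty input would fail instead of ending the generator quietly. The variant below adds a guard for that case; the name strict_grouper_guarded and the error message are hypothetical, and the rest mirrors the look-behind logic of the function in this message.

```python
from itertools import zip_longest

def strict_grouper_guarded(items, size, strict):
    """Look-behind grouper with an explicit guard for empty input."""
    fillvalue = object()
    args = [iter(items)] * size
    chunks = zip_longest(*args, fillvalue=fillvalue)
    try:
        prev = next(chunks)
    except StopIteration:
        return  # empty input: yield nothing
    for chunk in chunks:
        yield prev
        prev = chunk
    # Only the final chunk can contain padding.
    if prev[-1] is fillvalue:
        if strict:
            raise ValueError("last chunk is incomplete")
        prev = prev[:prev.index(fillvalue)]
    yield prev

full = list(strict_grouper_guarded("ABCDEF", 3, strict=True))
short = list(strict_grouper_guarded("ABCDEFG", 3, strict=False))
empty = list(strict_grouper_guarded("", 3, strict=True))
print(full)   # [('A', 'B', 'C'), ('D', 'E', 'F')]
print(short)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
print(empty)  # []
```

With strict=True and seven items, the two complete chunks are yielded first and the ValueError is raised only when the incomplete chunk is reached.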
Peter Otten <__peter__@...> writes:
> Peter Otten wrote:
> >     prev = prev[:prev.index(fillvalue)]
>
> To be bullet-proof that needs to check object identity instead of equality:
>
>     while prev[-1] is fillvalue:
>         prev = prev[:-1]

That's a clever way!! Thanks, I'll try that.

Wolfgang
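The difference between the two truncation strategies can be illustrated with a small sketch. The AlwaysEqual class is a hypothetical stand-in for any data item with a permissive __eq__; it does not come from the thread.

```python
class AlwaysEqual:
    """Hypothetical data item whose __eq__ reports equality with anything."""
    def __eq__(self, other):
        return True

fillvalue = object()
item = AlwaysEqual()

# Last chunk of size 3: one real item followed by two padding slots.
prev = (item, fillvalue, fillvalue)

# Equality-based truncation: index() compares item == fillvalue first,
# gets True, and returns position 0, so the real item is dropped.
by_equality = prev[:prev.index(fillvalue)]

# Identity-based truncation: only the sentinel object itself is stripped.
by_identity = prev
while by_identity[-1] is fillvalue:
    by_identity = by_identity[:-1]

print(len(by_equality))  # 0
print(len(by_identity))  # 1
```

The identity loop is safe here because the last chunk always holds at least one real item, so it can never strip the tuple down to nothing.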
Wolfgang Maier <wolfgang.maier@...> writes:

Turns out that Peter's solution (using a class instance as the marker, and managing to get away with a test for it only once after exhaustion of the iterator) is impressively fast indeed:

    from itertools import zip_longest

    def strict_grouper(items, size, strict):
        fillvalue = object()
        args = [iter(items)] * size
        chunks = zip_longest(*args, fillvalue=fillvalue)
        prev = next(chunks)
        for chunk in chunks:
            yield prev
            prev = chunk
        if prev[-1] is fillvalue:
            if strict:
                raise ValueError
            else:
                # strip padding by identity, not equality
                while prev[-1] is fillvalue:
                    prev = prev[:-1]
        yield prev

beats my old, clumsy approach by a speed factor of ~5, i.e., it's less than a factor of 2 slower than the grouper() recipe, but raises the error I wanted!

Certainly good enough for me, and, yes, I think it would make a nice itertools recipe.

Thanks for your help,
Wolfgang
participants (3)
- Alfredo Solano Martínez
- Peter Otten
- Wolfgang Maier