[issue18032] set methods should specify whether they consume iterators "lazily"

New submission from Abafei: It says here (http://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset) that some of the set methods take iterables as a parameter. Usually, the expected behavior is for a iterator consumer to consume only as much data as it needs. For example, for the code `any(itertools.repeat(True))`, the code will complete speedily, in contrast to when all() is used instead of any(), in which case the code will go forever. A least some of the set methods have semantics such that they can consume only some of the input; for example, "issubset" only needs to make sure that all of the items in the set are in the iterable, and once this is the case, it can return "True". However in such a case, the methods will *still* go forever. The docs should specify that this is the case, to disambiguate the semantics. (Tested on Python 3.2.3 and 2.7.3). ---------- assignee: docs@python components: Documentation messages: 189806 nosy: Abafei, docs@python priority: normal severity: normal status: open title: set methods should specify whether they consume iterators "lazily" type: behavior versions: Python 2.7, Python 3.2 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________

Changes by Phil Connell <pconnell@gmail.com>: ---------- nosy: +pconnell _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________

Changes by Ezio Melotti <ezio.melotti@gmail.com>: ---------- keywords: +easy nosy: +ezio.melotti stage: -> needs patch versions: +Python 3.3, Python 3.4 -Python 3.2 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________

Changes by Raymond Hettinger <raymond.hettinger@gmail.com>: ---------- assignee: docs@python -> rhettinger nosy: +rhettinger _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________

Raymond Hettinger added the comment: We don't normally document implementation details or the presences or absence of internal optimizations. This gives us freedom to change the implementation and it allows freedom for other implementations (such as Jython, IronPython, and PyPy) to make their own choices. Often those implementations start out with the simplest correct approach and then become increasingly optimized over time. I'm leaving this one open as a possible CPythhon performance enhancement, adding early-out behavior to issubset() when the argument is a non-set, non-dict iterable: def issubset(self, iterable): n = len(self) seen = set() for x in iterable: if x not in seen and x in self: seen.add(x) n -= 1 if not n: return True # early-out return False ---------- components: +Interpreter Core -Documentation type: behavior -> performance versions: -Python 2.7, Python 3.3 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________

Dustin Haffner added the comment: I've made an attempt at patching set_issubset() to match the Python from Raymond's message. I've rearranged the set_issubset function so that it checks for a set/frozenset, dict, or iterable in that order. In the iterable case it will create a temporary set and add elements to it as it consumes them from the iterable, returning early when possible. This is my first patch, please let me know how I may improve it! ---------- keywords: +patch nosy: +dhaffner Added file: http://bugs.python.org/file32706/issubset_improvement.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18032> _______________________________________
participants (5)
-
Abafei
-
Dustin Haffner
-
Ezio Melotti
-
Phil Connell
-
Raymond Hettinger