On Thu, May 07, 2020 at 11:12:28PM +0900, Stephen J. Turnbull wrote: [...]
So of course zip can be said to be used like a class. Its instances are constructed by calling it, the most common (only in practice?) method call is .__iter__, it's invariably called implicitly in a for statement, and usually the instance is ephemeral (ie, discarded when the for statement is exited). But it needn't be. Like any iterator, if it's not exhausted in one iteration context, you can continue it later. Or explicitly call next on it.
Agreed. In CPython 3.x, zip is a class. In 2.x it was a function returning a list. I don't know why it's a class now -- possibly something to do with the C implementation? -- but that's not part of the public API, and I would not expect that to be a language guarantee. The API is only that it returns an iterator, it doesn't have to be any specific class. If zip were implemented in pure Python, it would probably be a generator, something like this:

    def zip(a, b):
        while True:
            yield (next(a), next(b))

only better :-)

[...]
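To make the "only better" part concrete, here is one possible sketch of a pure-Python generator version. The name zip_sketch is made up for illustration, and this is not how CPython implements zip. It assumes two refinements over the two-argument toy above: it accepts any number of arbitrary iterables (calling iter() on each), and it catches StopIteration explicitly, since under PEP 479 a StopIteration that escapes a generator body is turned into RuntimeError.

```python
def zip_sketch(*iterables):
    """Hypothetical pure-Python stand-in for the builtin zip (shortest wins)."""
    # Accept any iterables, not just iterators, by calling iter() up front.
    iterators = [iter(it) for it in iterables]
    if not iterators:
        # With no arguments there is nothing to yield; without this check
        # the loop below would yield empty tuples forever.
        return
    while True:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                # End the generator cleanly instead of letting
                # StopIteration escape (see PEP 479).
                return
        yield tuple(row)


pairs = list(zip_sketch("abc", [1, 2]))
```

Because it is a generator, the object returned is still "just an iterator", which is all the API described above requires.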
So while yes, alternate constructors are a common pattern, I don't think they are a common pattern for classes like zip.
That's a matter of programming style, I think. There's no real difference between
zip(a, b, length='checksame')
and
zip.checksame(a, b)
They just initialize an internal attribute differently, which may as well be an Enum or even a few named class constants.
I agree that it's a matter of programming style, but I disagree with your reason. It isn't necessary for the methods to return instances of the same class, they could also return different classes. An old example of this from Python 2 was int/long unification:

    py> type(int(1e1))
    <type 'int'>
    py> type(int(1e100))
    <type 'long'>

A current example of this is the open() built-in, which returns a different class depending on the arguments given. For example:

    py> open('/tmp/a', 'wb')
    <_io.BufferedWriter name='/tmp/a'>
    py> open('/tmp/a', 'w')
    <_io.TextIOWrapper name='/tmp/a' mode='w' encoding='UTF-8'>

So we might have:

    zip() --> return a zip instance
    zip.checksame() --> return a zip_strict instance

for example. The specific type of iterator is an implementation detail.
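As a rough illustration of an alternate constructor that returns a different class, consider the sketch below. The names Zip and ZipStrict are hypothetical stand-ins (the builtin is not being modified here), and the error message is an assumption rather than anything agreed in this thread.

```python
class Zip:
    """Hypothetical stand-in for the builtin; stops at the shortest input."""

    def __init__(self, *iterables):
        self._iterators = [iter(it) for it in iterables]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._iterators:
            raise StopIteration
        row = []
        for it in self._iterators:
            # StopIteration from any input propagates: shortest wins.
            row.append(next(it))
        return tuple(row)

    @classmethod
    def checksame(cls, *iterables):
        # Alternate constructor that hands back a *different* class.
        return ZipStrict(*iterables)


class ZipStrict(Zip):
    """Hypothetical strict variant: unequal lengths raise ValueError."""

    def __next__(self):
        if not self._iterators:
            raise StopIteration
        row = []
        for it in self._iterators:
            try:
                row.append(next(it))
            except StopIteration:
                pass  # keep going so we can tell "all done" from "some done"
        if not row:
            raise StopIteration  # every input ended together
        if len(row) != len(self._iterators):
            raise ValueError("iterables have different lengths")
        return tuple(row)


plain = Zip([1, 2], [3])
strict = Zip.checksame([1, 2], [3, 4])
```

Callers only ever rely on getting an iterator back, so whether checksame() returns a Zip or a ZipStrict stays an implementation detail.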
I think (after the initial shock ;-) I like the latter *better*, because the semantics of checksame and longest are significantly (to me, anyway) different. checksame is a constraint on correct behavior, and explicitly elicits an Exception on invalid input. longest is a variant specification of behavior on a certain class of valid input. I'm happier with those being different *functions* rather than values of an argument. YMMV, of course.
If we reach consensus that this functionality is worth having and worth being in the builtins, my preferences would go (best to worst):

(1) Namespaces are one honking great idea -- let's do more of those! zip is a namespace, let's use that fact:

    zip(*args)  # for backwards compatibility
    zip.strict(*args)

This gives us the best flexibility going into the future. We can add new versions of zip without overloading the builtins itself: there is only one top level name, and the docstring can point the reader at dir(zip) to see more. At some point, we might choose to move zip_longest into zip as well, leaving the itertools version for backwards compatibility. Maybe some day Soni will even get his desired version of zip that exposes any partial results left over after an iterator is exhausted.

This idiom is particularly convenient since zip is a class and can easily be given additional methods, so they would show up in help(zip) without any extra work.

(2) Separate top-level functions:

    zip, zip_strict

(3) A mode parameter:

    zip(*args, mode='short')  # default
    zip(*args, mode='strict')

and then a long, long, long way down my list of preferred APIs:

(4) A bool flag:

    zip(*args, strict=False)  # default
    zip(*args, strict=True)

which is the least flexible, since it locks us in to only two such zip versions without going into API contortions.

-- 
Steven
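For comparison, option (3) could look something like the sketch below. Everything here is hypothetical: zip_mode and _zip_strict are made-up names, and the 'longest' mode is an assumption added only to show how a string mode can grow past two values where a bool flag cannot.

```python
from itertools import zip_longest


def _zip_strict(*iterables):
    """Hypothetical helper: like zip, but unequal lengths raise ValueError."""
    sentinel = object()  # unique marker that cannot appear in user data
    for row in zip_longest(*iterables, fillvalue=sentinel):
        if any(item is sentinel for item in row):
            raise ValueError("iterables have different lengths")
        yield row


def zip_mode(*iterables, mode="short"):
    """Hypothetical wrapper that dispatches on a mode string."""
    if mode == "short":
        return zip(*iterables)
    if mode == "strict":
        return _zip_strict(*iterables)
    if mode == "longest":  # assumed third mode, not part of the list above
        return zip_longest(*iterables)
    raise ValueError(f"unknown mode: {mode!r}")


short_rows = list(zip_mode("ab", [1, 2, 3]))
longest_rows = list(zip_mode("ab", [1, 2, 3], mode="longest"))
```

Adding a new behaviour means adding one more branch and one more accepted string, whereas strict=True/False would need a second parameter to express anything beyond two variants.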