
... [Tim]
It's solving a different problem, though. In your "Return all unique combinations of elements in seq", by "unique" you really mean "by position", not "by value". ...
[Ron Adam]
Correct.
And sometimes an item's position is part of what makes it unique. Multiple-roller combination locks and slot machines come to mind. I'm sure there are other things where the label or face value is only one aspect of its identity.
I'm wondering whether you read any of the thread this one spawned from? Yes, it was long and tedious ;-) But Python's itertools already has `permutations()` and `combinations()` functions that work by position. Those problems are already solved in the standard library. The entire point of that thread was to investigate "by value" functions instead.
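For contrast, here's what the positional machinery already in itertools does with that same kind of input; every tuple below is unique by position, but several are duplicates by value:

>>> import itertools
>>> list(itertools.permutations('aab', 2))
[('a', 'a'), ('a', 'b'), ('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'a')]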
>>> for p in unique_sets('aab'):
...     print(p)
['a']
['a']
['b']
['b', 'a']
['a', 'a']
['b', 'a']
['b', 'a', 'a']
See? ['a'] is produced twice, and so is ['b', 'a']. The whole point of the algorithms in the thread this spawned from was to avoid generating *value*-duplicates to begin with. Filtering them out later is too inefficient.
Well, we can filter them to begin with instead of later.
>>> for p in unique_sets(list(set('aaaaaaaa'))):
...     print(p)
['a']
Yes, and that approach was already discussed at great length. It's semantically incorrect; e.g.,
>>> for a in anagrams('aab', 2):
...     print(a)
('a', 'a')
('a', 'b')
('b', 'a')
('a', 'a') is _expected_ output.
And skip duplicate combinations later, as they are produced.
>>> for p in unique_sets('aaaaaaaa'):
...     if cached(tuple(p)):
...         continue
...     print(p)
['a']
['a', 'a']
...
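(`cached` isn't spelled out here; a minimal sketch of what's meant, assuming a plain set-backed memo:)

def make_cached():
    # hypothetical helper, not from the original message: remember every
    # tuple seen so far and report repeats
    seen = set()
    def cached(item):
        if item in seen:
            return True
        seen.add(item)
        return False
    return cached

cached = make_cached()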
And that approach too was discussed at great length. It's too slow and too wasteful of space in general. For a time example,
>>> for a in anagrams('A' * 1000 + 'B' * 1000, 2):
...     print(a)
('A', 'A')
('A', 'B')
('B', 'A')
('B', 'B')
How long do you think unique_sets would take to do that? No, I can't count that high either ;-) For a space example, just pick one with _many_ distinct outputs. The cache must grow to hold all of them. All discussed before.
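To put a rough number on it (just arithmetic on the example above, not a quote from the earlier thread): unique_sets walks every nonempty positional subset of those 2000 elements before the cache can reject anything, and that count has over 600 decimal digits:

>>> len(str(2 ** 2000 - 1))   # nonempty positional subsets of 2000 elements
603

while anagrams(..., 2) yields just the 4 value-distinct pairs.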
... Not generating them to begin with requires some sort of known ordering or pattern in the data.
It requires comparing the iterable's elements for equality. That's all. Where N = len(iterable), the algorithm I posted requires worst-case O(N) space and worst-case N-choose-2 item comparisons, regardless of how many "anagrams" are generated.
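For concreteness, here's a minimal sketch in that spirit (the name anagrams and the structure here are illustrative, not the code actually posted earlier). It groups equal elements with nothing but '==', so the worst case is the N-choose-2 comparisons mentioned above, and it emits length-r tuples straight from the groups, so value-duplicates never get generated at all:

def anagrams(iterable, r):
    # Group equal elements using only '==' comparisons: O(N) space,
    # at most N-choose-2 comparisons (hit when all elements are distinct).
    groups = []                      # list of [representative, remaining count]
    for x in iterable:
        for g in groups:
            if g[0] == x:
                g[1] += 1
                break
        else:
            groups.append([x, 1])

    result = [None] * r

    def extend(i):
        if i == r:
            yield tuple(result)
            return
        for g in groups:
            if g[1]:                 # this value still has unused copies
                g[1] -= 1
                result[i] = g[0]
                yield from extend(i + 1)
                g[1] += 1            # restore for the next branch

    yield from extend(0)

>>> list(anagrams('aab', 2))
[('a', 'a'), ('a', 'b'), ('b', 'a')]
>>> list(anagrams('A' * 1000 + 'B' * 1000, 2))
[('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]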
The difference between checking them inside the algorithm as they are produced and checking them externally as they are *yielded* is only the yield overhead, if we're talking about Python code in both places.
The algorithm I posted was Python code, and can be astronomically more efficient than "checking them externally" (see the 'A' * 1000 + 'B' * 1000 example above).
... I've been thinking that Python's bytecode could be a little higher level and still be efficient. So I wrote a little interpreter (in Python) to test some ideas.
[much snipped]
That's a lot more interesting to me ;-) No time for it now, though. Maybe later.