[Python-ideas] Another "little" combination, permutation, iterator, yield from, example.
Tim Peters
tim.peters at gmail.com
Wed Oct 16 22:56:38 CEST 2013
...
[Tim]
>> It's solving a different problem, though. In your "Return all unique
>> combinations of elements in seq", by "unique" you really mean "by
>> position", not " by value". ...
[Ron Adam]
> Correct.
>
> And sometimes an item's position is part of what makes it unique. Multiple
> roller combination locks and slot machines come to mind. I'm sure there
> are other things where the label or face value is only one aspect of its
> identity.
I'm wondering whether you read any of the thread this one spawned
from? Yes, it was long and tedious ;-) But Python's itertools
already has `permutations()` and `combinations()` functions that work
by position. Those problems are already solved in the standard
library. The entire point of that thread was to investigate "by
value" functions instead.
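For reference, the positional behavior is easy to see with the standard library itself; the duplicates below are by design, since itertools treats each position as a distinct element:

```python
from itertools import permutations

# itertools treats the two 'a's in 'aab' as distinct elements
# (unique by position), so value-duplicates appear in the output.
for p in permutations('aab', 2):
    print(p)
```

('a', 'a'), ('a', 'b'), and ('b', 'a') each appear twice, because each can be built from two different index pairs.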
>> >>> for p in unique_sets('aab'):
>> ...     print(p)
>> ['a']
>> ['a']
>> ['b']
>> ['b', 'a']
>> ['a', 'a']
>> ['b', 'a']
>> ['b', 'a', 'a']
>>
>> See? ['a'] is produced twice, and so is ['b', 'a']. The whole point
>> of the algorithms in the thread this spawned from was to avoid
>> generating *value*-duplicates to begin with. Filtering them out later
>> is too inefficient.
> Well, we can filter them to begin with instead of later.
>
> for p in unique_sets(list(set('aaaaaaaa'))):
>     print(p)
>
> ['a']
Yes, and that approach was already discussed at great length. It's
semantically incorrect; e.g.,
>>> for a in anagrams('aab', 2):
...     print(a)
('a', 'a')
('a', 'b')
('b', 'a')
('a', 'a') is _expected_ output.
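A small illustration of why deduplicating the input first is semantically wrong (using itertools.permutations here as a stand-in for any positional generator):

```python
from itertools import permutations

# Collapsing 'aab' to its distinct values throws away multiplicity,
# so ('a', 'a') -- a legitimate length-2 anagram of 'aab' -- can
# never be produced.
deduped = sorted(set('aab'))           # ['a', 'b']
print(list(permutations(deduped, 2)))  # no ('a', 'a') possible
```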
> And skip combination duplicates later as they are produced.
>
> for p in unique_sets('aaaaaaaa'):
>     if cached(tuple(p)):
>         continue
>     print(p)
>
> ['a']
> ['a', 'a']
> ...
And that approach too was discussed at great length. It's too slow
and too wasteful of space in general. For a time example,
>>> for a in anagrams('A' * 1000 + 'B' * 1000, 2):
...     print(a)
('A', 'A')
('A', 'B')
('B', 'A')
('B', 'B')
How long do you think unique_sets would take to do that? No, I can't
count that high either ;-)
For a space example, just pick one with _many_ distinct outputs. The
cache must grow to hold all of them. All discussed before.
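The filter-after approach being criticized can be made concrete; this is a hypothetical `filtered_unique` sketched over itertools.permutations, not code from the thread. It must walk every positional permutation and remember every distinct output:

```python
from itertools import permutations

def filtered_unique(iterable, r):
    # Filter value-duplicates after the fact: the 'seen' cache must
    # hold every distinct output ever yielded, and every positional
    # permutation is generated and checked regardless.
    seen = set()
    for p in permutations(iterable, r):
        if p not in seen:
            seen.add(p)
            yield p

# Even at a modest scale the waste is visible: 40 elements give
# 40 * 39 = 1560 positional pairs, but only 4 distinct values.
pool = 'A' * 20 + 'B' * 20
print(list(filtered_unique(pool, 2)))
```

Scale the pool up to 'A' * 1000 + 'B' * 1000 and the generator must churn through 2000 * 1999 = 3,998,000 pairs to yield the same 4 results.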
> ...
> Not generating them to begin with requires some sort of known ordering or
> pattern in the data.
It requires comparing the iterable's elements for equality. That's
all. Where N = len(iterable), the algorithm I posted requires
worst-case O(N) space and worst-case N-choose-2 item comparisons,
regardless of how many "anagrams" are generated.
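A by-value generator with those properties can be sketched as follows. This is not the code Tim posted: it recurses over a multiset of element counts, and it assumes hashable elements (via a Counter), whereas the posted algorithm required only equality comparisons. The point it shares is that value-duplicates are never constructed in the first place:

```python
from collections import Counter

def anagrams(iterable, r=None):
    # Sketch of a "by value" permutation generator: recurse over a
    # count multiset so each distinct value is extended at most once
    # per position, and no duplicate output is ever built.
    pool = list(iterable)
    if r is None:
        r = len(pool)
    counts = Counter(pool)

    def gen(prefix, remaining):
        if remaining == 0:
            yield tuple(prefix)
            return
        for item in list(counts):
            if counts[item]:
                counts[item] -= 1
                prefix.append(item)
                yield from gen(prefix, remaining - 1)
                prefix.pop()
                counts[item] += 1

    yield from gen([], r)
```

With this, `anagrams('A' * 1000 + 'B' * 1000, 2)` yields exactly the 4 tuples shown above in negligible time, because the 2000 elements collapse to 2 counts.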
> The difference between checking them inside the algorithm as they are
> produced, and checking them externally as they are *yielded* is only the
> yield overhead if we are talking about python code in both places.
The algorithm I posted was Python code, and can be astronomically more
efficient than "checking them externally" (see the 'A' * 1000 + 'B' *
1000 example above).
> ...
> I've been thinking that pythons byte code could be a little higher level and
> still be efficient. So I wrote a little interpreter (in python) to test
> some ideas.
>
> [much snipped]
That's a lot more interesting to me ;-) No time for it now, though.
Maybe later.