[Python-ideas] Another "little" combination, permutation, iterator, yield from, example.

Tim Peters tim.peters at gmail.com
Wed Oct 16 22:56:38 CEST 2013


...

[Tim]
>> It's solving a different problem, though.  In your "Return all unique
>> combinations of elements in seq", by "unique" you really mean "by
>> position", not " by value".  ...

[Ron Adam]
> Correct.
>
> And sometimes an item's position is part of what makes it unique.  Multiple
> roller combination locks and slot machines come to mind.  I'm sure there
> are other things where the label or face value is only one aspect of its
> identity.

I'm wondering whether you read any of the thread this one spawned
from?  Yes, it was long and tedious ;-)  But Python's itertools
already has `permutations()` and `combinations()` functions that work
by position.  Those problems are already solved in the standard
library.  The entire point of that thread was to investigate "by
value" functions instead.
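To illustrate the by-position behavior already in the standard library: itertools treats equal values at different positions as distinct, so an input with repeated values yields duplicate outputs by value.

```python
from itertools import permutations

# permutations() works by position:  the two 'a's in 'aab' are
# distinct to it, so value-duplicates appear in the output.
for p in permutations('aab', 2):
    print(p)
```

That prints ('a', 'a'), ('a', 'b'), ('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'a') -- three distinct values, six outputs.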

>> >>>   for p in unique_sets('aab'):
>> ...        print(p)
>> ['a']
>> ['a']
>> ['b']
>> ['b', 'a']
>> ['a', 'a']
>> ['b', 'a']
>> ['b', 'a', 'a']
>>
>> See?  ['a'] is produced twice, and so is ['b', 'a'].  The whole point
>> of the algorithms in the thread this spawned from was to avoid
>> generating *value*-duplicates to begin with.  Filtering them out later
>> is too inefficient.

> Well, we can filter them to begin with instead of later.
>
> for p in unique_sets(list(set('aaaaaaaa'))):
>     print(p)
>
> ['a']

Yes, and that approach was already discussed at great length.  It's
semantically incorrect; e.g.,

>>> for a in anagrams('aab', 2):
...    print(a)
('a', 'a')
('a', 'b')
('b', 'a')

('a', 'a') is _expected_ output.

> And skip combination duplicates later as they are produced.
>
> for p in unique_sets('aaaaaaaa'):
>     if cached(tuple(p)):
>        continue
>     print(p)
>
> ['a']
> ['a', 'a']
> ...

And that approach too was discussed at great length.  It's too slow
and too wasteful of space in general.  For a time example,

>>> for a in anagrams('A' * 1000 + 'B' * 1000, 2):
...     print(a)
('A', 'A')
('A', 'B')
('B', 'A')
('B', 'B')

How long do you think unique_sets would take to do that?  No, I can't
count that high either ;-)
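To put a number on it, here's a scaled-down version of the same experiment (50 of each letter rather than 1000, purely to keep it quick); the counts are mine, not from the original post:

```python
from itertools import permutations

s = 'A' * 50 + 'B' * 50   # scaled-down stand-in for 'A'*1000 + 'B'*1000

# A generate-then-filter scheme must examine every positional
# 2-permutation before deduplicating ...
positional = sum(1 for _ in permutations(s, 2))   # 100 * 99 = 9900

# ... even though only 4 outputs are distinct by value.
distinct = len(set(permutations(s, 2)))           # {AA, AB, BA, BB}

print(positional, distinct)
```

At the full 'A' * 1000 + 'B' * 1000 size the positional count is 2000 * 1999 = 3,998,000 candidates for the same 4 distinct results, and it grows quadratically from there.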

For a space example, just pick one with _many_ distinct outputs.  The
cache must grow to hold all of them.  All discussed before.

> ...
> Not generating them to begin with requires some sort of known ordering or
> pattern in the data.

It requires comparing the iterable's elements for equality.  That's
all.  Where N = len(iterable), the algorithm I posted requires
worst-case O(N) space and worst-case N-choose-2 item comparisons,
regardless of how many "anagrams" are generated.
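The algorithm Tim posted isn't reproduced in this message, but the by-value idea can be sketched with a multiset of remaining counts.  Note this sketch assumes hashable elements, which is a stronger requirement than the equality-only comparisons described above:

```python
from collections import Counter

def anagrams(iterable, r):
    """Yield r-length tuples with no value-duplicates.

    A sketch of the by-value approach only -- NOT the algorithm
    posted in the thread, which needs just equality comparisons.
    This version assumes the elements are hashable.
    """
    counts = Counter(iterable)

    def gen(k):
        if k == 0:
            yield ()
            return
        # Each *distinct* value is tried at most once per slot,
        # so no value-duplicates are ever generated.
        for value in counts:
            if counts[value]:
                counts[value] -= 1
                for rest in gen(k - 1):
                    yield (value,) + rest
                counts[value] += 1

    yield from gen(r)

for a in anagrams('aab', 2):
    print(a)
```

On 'aab' this yields ('a', 'a'), ('a', 'b'), ('b', 'a') -- the expected output shown earlier -- and on 'A' * 1000 + 'B' * 1000 it produces its 4 outputs without examining millions of positional candidates.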


> The difference between checking them inside the algorithm as they are
> produced, and checking them externally as they are *yielded* is only the
> yield overhead if we are talking about python code in both places.

The algorithm I posted was Python code, and can be astronomically more
efficient than "checking them externally" (see the 'A' * 1000 + 'B' *
1000 example above).

> ...
> I've been thinking that Python's byte code could be a little higher level
> and still be efficient.  So I wrote a little interpreter (in Python) to
> test some ideas.
>
> [much snipped]

That's a lot more interesting to me ;-)  No time for it now, though.
Maybe later.
