Another "little" combination/permutation/iterator/yield-from example.

On 10/13/2013 02:02 PM, Tim Peters wrote:
> [MRAB posts a beautiful solution]
Ah, what the heck... :-)

This is what I came up with. It works like MRAB's, by removing the first element and using recursion on the rest. (It's how you do it in Lisp or Scheme.) It's just two generators working together. You can get items of a specified length by using a list comprehension or filter.

(Disclaimer... I didn't do any more testing (performance or otherwise) than you see below.)

Cheers,
Ron Adam


    def combine_with_all(x, seq):
        """Combine item x with every item in seq."""
        for s in seq:
            if not isinstance(s, list):
                s = [s]
            yield s + [x]

    def unique_sets(seq):
        """Return all unique combinations of elements in seq."""
        if len(seq) == 0:
            return
        first, rest = seq[0], seq[1:]
        yield [first]
        yield from unique_sets(rest)
        yield from combine_with_all(first, unique_sets(rest))


    ### EXAMPLES ###

    for args in [('x', []),
                 ('x', [1, 2]),
                 ('x', [[1], [2]]),
                 ('x', [[1, 2], 3]),
                 ('x', [[1, 2], [3]]),
                 ('x', 'abc'),
                 ('x', ['abc']),
                 ('x', ['ab', 'c'])]:
        print(list(combine_with_all(*args)))

    # []
    # [[1, 'x'], [2, 'x']]
    # [[1, 'x'], [2, 'x']]
    # [[1, 2, 'x'], [3, 'x']]
    # [[1, 2, 'x'], [3, 'x']]
    # [['a', 'x'], ['b', 'x'], ['c', 'x']]
    # [['abc', 'x']]
    # [['ab', 'x'], ['c', 'x']]

    print(list(unique_sets('abc')))
    # [['a'], ['b'], ['c'], ['c', 'b'], ['b', 'a'], ['c', 'a'],
    #  ['c', 'b', 'a']]

    print(list(unique_sets(['abc'])))
    # [['abc']]

    print(list(unique_sets(['a', 'b', 'c'])))
    # [['a'], ['b'], ['c'], ['c', 'b'], ['b', 'a'], ['c', 'a'],
    #  ['c', 'b', 'a']]

    print(list(unique_sets([(1, 2), (3, 4)])))
    # [[(1, 2)], [(3, 4)], [(3, 4), (1, 2)]]

    print(list(unique_sets([[1, 2], [3, 4]])))
    # [[[1, 2]], [[3, 4]], [[3, 4], [1, 2]]]

    print([x for x in unique_sets(['a', 'b', 'c']) if len(x) == 2])
    # [['c', 'b'], ['b', 'a'], ['c', 'a']]

    print([x for x in unique_sets(["a1", "b", "a2"]) if len(x) == 2])
    # [['a2', 'b'], ['b', 'a1'], ['a2', 'a1']]

[Tim]
[Ron Adam]
It's solving a different problem, though. In your "Return all unique combinations of elements in seq", by "unique" you really mean "by position", not "by value". For example:
See? ['a'] is produced twice, and so is ['b', 'a']. The whole point of the algorithms in the thread this spawned from was to avoid generating *value*-duplicates to begin with. Filtering them out later is too inefficient.
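The output Tim quotes from isn't preserved in this digest, but it is easy to reconstruct: run Ron's two generators (copied from the post above) on an input with a repeated value, and the by-position duplicates appear. A sketch:

```python
def combine_with_all(x, seq):
    """Combine item x with every item in seq."""
    for s in seq:
        if not isinstance(s, list):
            s = [s]
        yield s + [x]

def unique_sets(seq):
    """Ron's 'unique' combinations -- unique by position, not by value."""
    if len(seq) == 0:
        return
    first, rest = seq[0], seq[1:]
    yield [first]
    yield from unique_sets(rest)
    yield from combine_with_all(first, unique_sets(rest))

# An input with a repeated 'a' produces value-duplicates:
out = list(unique_sets('aab'))
print(out)
# ['a'] shows up twice, and so does ['b', 'a']:
assert out.count(['a']) == 2
assert out.count(['b', 'a']) == 2
```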
Only one comment on the code:
Python's newer unpacking syntax makes that one prettier: first, *rest = seq Much higher-level than crusty old Lisp or Scheme ;-)
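A quick illustration of the extended unpacking Tim mentions (available since Python 3.0):

```python
seq = ['a', 'b', 'c']

# The slicing spelling from the post:
first, rest = seq[0], seq[1:]

# The newer extended-unpacking spelling -- one step, and the starred
# target is always bound to a list:
head, *tail = seq

assert (first, rest) == ('a', ['b', 'c'])
assert (head, tail) == ('a', ['b', 'c'])

# It works on any sequence, including strings:
h, *t = 'abc'
assert (h, t) == ('a', ['b', 'c'])
```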
....

On 10/15/2013 07:25 PM, Tim Peters wrote:
Correct. And sometimes an item's position is part of what makes it unique. Multiple-roller combination locks and slot machines come to mind. I'm sure there are other things where the label or face value is only one aspect of its identity.
Well, we can filter them to begin with instead of later.

    for p in unique_sets(list(set('aaaaaaaa'))):
        print(p)

    ['a']

And skip combination duplicates later as they are produced.

    for p in unique_sets('aaaaaaaa'):
        if cached(tuple(p)):
            continue
        print(p)

    ['a']
    ['a', 'a']
    ['a', 'a', 'a']
    ['a', 'a', 'a', 'a']
    ['a', 'a', 'a', 'a', 'a']
    ['a', 'a', 'a', 'a', 'a', 'a']
    ['a', 'a', 'a', 'a', 'a', 'a', 'a']
    ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']

Not generating them to begin with requires some sort of known ordering or pattern in the data.

The difference between checking them inside the algorithm as they are produced, and checking them externally as they are *yielded*, is only the yield overhead if we are talking about Python code in both places. In effect the inner loop control is yielded out with the value. That's a very nice thing about "yield from", and it still works that way in the recursive example. That makes things like this a lot more flexible.
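Ron's `cached()` helper isn't defined anywhere in the thread; a minimal seen-set version of this external filter might look like the following (the input stream here is a hypothetical stand-in for a generator that emits by-position duplicates):

```python
def unique_by_value(gen):
    """Drop value-duplicates from a stream of lists, keeping first occurrences.

    This is the 'check externally as they are yielded' approach Ron
    describes: the cache grows with the number of distinct outputs seen,
    which is exactly the space cost Tim objects to below.
    """
    seen = set()
    for p in gen:
        key = tuple(p)
        if key not in seen:
            seen.add(key)
            yield p

# Stand-in stream containing by-position duplicates:
stream = [['a'], ['a'], ['b'], ['b', 'a'], ['a', 'a'], ['b', 'a']]
filtered = list(unique_by_value(iter(stream)))
print(filtered)
# [['a'], ['b'], ['b', 'a'], ['a', 'a']]
```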
LOL... yep. That's why that part bugged me. I knew there was something I was missing. The other part that bothers me is the isinstance list check, and not accepting sets directly.
Much higher-level than crusty old Lisp or Scheme ;-)
HAHA... yep, although they are both new to me, and Clojure seems to be doing well.

A little digression...

I've been thinking that Python's byte code could be a little higher level and still be efficient. So I wrote a little interpreter (in Python) to test some ideas.

    * Use nested [sequences] in the byte code rather than jumps.
    * Use tuples to contain the function followed by its arguments
      (very easy to parse and evaluate this way).
    * Keep byte codes and keywords to a minimal set.
    * Use as few punctuation symbols as possible.

I didn't start learning Lisp and Scheme until after I got that much working. But it turned out (and I recently found out) to be very close to a Scheme or Lisp interpreter.

This example works, with some syntax alterations from the Lisp version, and after adding in the needed standard Lisp/Scheme functions... car, cdr, etc. (Looks much nicer with syntax highlighting.)

    def "power-set set" [
        if (is-empty set) [return (list set)]
        let psets-without-car (power-set (cdr set))
        let psets-with-car
            (mapcar (cons (lambda 'subset'
                              [return (cons (car set) subset)])
                          psets-without-car))
        return (join-lists psets-with-car psets-without-car)
    ]

    echo (power-set [[1 2] [3 4] [5 6]])

    (power-set [[1 2] [3 4] [5 6]]) ==
        [[[1 2] [3 4] [5 6]] [[1 2] [3 4]] [[1 2] [5 6]] [[1 2]]
         [[3 4] [5 6]] [[3 4]] [[5 6]] []]

When I first looked at the Lisp version of this, I had no idea it had three different return points. Explicit is better than implicit. ;-)

So how is that anything like byte code? The structure is just nested tokens and symbols. (This will parse and evaluate fine in this form... with the comments too.)

    def                # keyword
    "power-set set"    # string-literal
    [                  # block-start
        if             # keyword
        (              # expression-start
        is-empty       # name
        set            # name
        )              # expression-stop
        [              # block-start
            return     # keyword
            (          # expression-start
            list       # name
            set        # name
            )          # expression-stop
        ]              # block-stop
    ....               etc.

There's a bit more...
Python wrappers for dictionaries, sets, and lower-level functions. I was thinking of rewriting it in C and seeing how it performs, but Gambit-C may be a better choice. It has nearly all the low-level features Python needs and compiles to C or Java. So don't hold your breath. It will be a while before I can get that far. :-)

Cheers, Ron
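For comparison, Ron's toy-language power-set translates almost line for line into Python, with car/cdr becoming indexing and slicing. This is an illustrative sketch, not part of his interpreter:

```python
def power_set(s):
    """All subsets of s: those containing the first element, then those without.

    Mirrors the toy-language version: (is-empty set) -> the base case,
    (cdr set) -> s[1:], (cons (car set) subset) -> [s[0]] + sub.
    """
    if not s:
        return [s]                                     # (list set) on empty
    without_car = power_set(s[1:])                     # psets-without-car
    with_car = [[s[0]] + sub for sub in without_car]   # psets-with-car
    return with_car + without_car                      # join-lists

result = power_set([[1, 2], [3, 4], [5, 6]])
print(result)
# Same ordering as the toy-language output:
# [[[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4]], [[1, 2], [5, 6]], [[1, 2]],
#  [[3, 4], [5, 6]], [[3, 4]], [[5, 6]], []]
```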

... [Tim]
[Ron Adam]
I'm wondering whether you read any of the thread this one spawned from? Yes, it was long and tedious ;-) But Python's itertools already has `permutations()` and `combinations()` functions that work by position. Those problems are already solved in the standard library. The entire point of that thread was to investigate "by value" functions instead.
Yes, and that approach was already discussed at great length. It's semantically incorrect; e.g.,
('a', 'a') is _expected_ output.
And that approach too was discussed at great length. It's too slow and too wasteful of space in general. For a time example,
How long do you think unique_sets would take to do that? No, I can't count that high either ;-) For a space example, just pick one with _many_ distinct outputs. The cache must grow to hold all of them. All discussed before.
It requires comparing the iterable's elements for equality. That's all. Where N = len(iterable), the algorithm I posted requires worst-case O(N) space and worst-case N-choose-2 item comparisons, regardless of how many "anagrams" are generated.
The algorithm I posted was Python code, and can be astronomically more efficient than "checking them externally" (see the 'A' * 1000 + 'B' * 1000 example above).
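Tim's actual code isn't quoted in this digest. A standard by-value permutation generator with the properties he describes (equality comparisons only, O(N) extra space, never producing a value-duplicate) can be sketched like this; it is a hypothetical reconstruction, not necessarily his implementation:

```python
def uniq_perms(iterable):
    """Generate distinct permutations without ever producing value-duplicates.

    Only equality comparisons are used (no hashing or ordering), and extra
    space is O(N) no matter how many permutations come out.
    """
    items = list(iterable)
    groups = []                       # [value, remaining-count] pairs
    for x in items:                   # grouping costs at most N-choose-2 compares
        for g in groups:
            if g[0] == x:
                g[1] += 1
                break
        else:
            groups.append([x, 1])

    n = len(items)
    result = [None] * n

    def rec(depth):
        if depth == n:
            yield tuple(result)
            return
        for g in groups:              # each distinct value extends at most once
            if g[1]:
                g[1] -= 1
                result[depth] = g[0]
                yield from rec(depth + 1)
                g[1] += 1

    yield from rec(0)

print(list(uniq_perms('aab')))
# [('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')]
# 'AAABBB' gives 6!/(3!*3!) == 20 permutations instead of 6! == 720:
assert len(list(uniq_perms('AAABBB'))) == 20
```

Filtering the 2000-character 'A'/'B' example externally would mean generating and caching an astronomical number of positional duplicates; this scheme never creates them in the first place.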
That's a lot more interesting to me ;-) No time for it now, though. Maybe later.

On Oct 15, 2013, at 23:21, Ron Adam <ron3200@gmail.com> wrote:
A little digression...
I've been thinking that pythons byte code could be a little higher level and still be efficient. So I wrote a little interpreter (in python) to test some ideas.
Please post this on a separate thread. It's a potentially very cool idea, and it's in danger of being lost buried under yet another combinatorially-slow implementation of multiset/sequence permutations.

On 10/16/2013 08:13 PM, Andrew Barnert wrote:
Potentially is the key word here. It's in very, very early stages. I will put the files in the tracker once I get it cleaned up a bit more (in a few days), and will post a link to it in a new thread along with a summary.

The part I've been working on is just to design a simple mid/lower-level language and test the basic design. It's about 10 times slower than Python right now, because Python is used to simulate the evaluation loop and call stack. It's pretty much just a toy at this stage, but an interesting one.

Ultimately I'd like to have it run on Gambit-C, which is very fast, but hides all the low-level memory and resource management things underneath it and will compile the programs to C code. So I think it will give many new options. The downside is it will require learning some Scheme programming. The test language I'm toying with can make that part much easier, I hope.

Cheers, Ron

On 10/17/2013 1:58 PM, Ron Adam wrote:
If you are not familiar with http://code.google.com/p/wpython2/ "A wordcode-based Python implementation" you might find it interesting. -- Terry Jan Reedy

On 10/17/2013 04:59 PM, Terry Reedy wrote:
Thanks, that looks interesting, I hadn't seen it before. I glanced over the following... http://wpython2.googlecode.com/files/Cleanup%20and%20new%20optimizations%20i... I'm looking for a path that will also lead to creating stand-alone executables easily. This is a taste of that... https://groups.google.com/forum/#!topic/clojure/CsGhwc3oyUQ I'm not sure how difficult it will be, but it looks interesting. Note that they talk about outputting Scheme code... Clojure is very near Scheme, so that's not surprising. Python outputs byte code, which is a bit harder to convert. We might be able to improve that part of it.

Cheers, Ron

participants (4):
- Andrew Barnert
- Ron Adam
- Terry Reedy
- Tim Peters