Recursive generator for combinations of a multiset?
python at mrabarnett.plus.com
Sat Nov 23 05:23:42 CET 2013
On 23/11/2013 00:58, John O'Hagan wrote:
> On Thu, 21 Nov 2013 12:59:26 -0800
> Dan Stromberg <drsalists at gmail.com> wrote:
>> On Wed, Nov 20, 2013 at 10:46 PM, John O'Hagan
>> <research at johnohagan.com>wrote:
>> > Short story: the subject says it all, so if you have an answer
>> > already, fire away. Below is the long story of what I'm using it
>> > for, and why I think it needs to be recursive. It may even be of
>> > more general interest in terms of filtering the results of
>> > generators.
>> I think you probably need permutations rather than combinations.
>> Also, I think you'll need to form a word (partitioned off by spaces),
>> and then check it against a set containing /usr/share/dict/words
>> before recursing for the remainder of the sentence - this should
>> speed things up a LOT.
> Thanks for the reply. If I understand you correctly, you are suggesting
> permuting the input _characters_ to form words and then seeing if
> they exist, as opposed to my approach of combining known words and
> seeing if they are anagrams. (Permutations of words would not help find
> anagrams as they merely change the word order). Here is an attempt at
> def anagrams(partition, input_string):
> """Find anagrams which fit given partition of input string length"""
> if not partition:
> yield (), input_string
> for words, checkstring in anagrams(partition[:-1], input_string):
> for word in itertools.permutations(checkstring, partition[-1]):
> word = ''.join(word)
> if word in WORDS: #WORDS is collection of dictionary words
> newstring = checkstring
> for l in word:
> newstring = newstring.replace(l, '' , 1)
> yield words + (word,), newstring
> There are two problems with this. If there are repeated characters in
> the input, redundant results are produced; a multiset-permutation
> algorithm would fix this. But the main problem is it is incredibly
> slow: on my run-of-the-mill laptop, it chokes on anything longer than
> about 10 characters, spending most of its time rejecting non-words.
> Or have I misunderstood your suggestion?
If you want to know how to get unique permutations, have a look here:
More information about the Python-list