On 07/25/2018 07:29 PM, Martin A. Brown wrote:
>> I have a list of strings that contains slightly more than a
>> million items. Each item is a string of 8 capital letters like so:
>> ['MIBMMCCO', 'YOWHHOY', ...]
>> I need to check and see if the letters 'OFHCMLIP' are one of the items in the
>> list but there is no way to tell in what order the letters will appear. So I
>> can't just search for the string 'OFHCMLIP'. I just need to locate any strings
>> that are made up of those letters no matter their order.
>> I suppose I could loop over the list and loop over each item using a bunch of
>> if statements exiting the inner loop as soon as I find a letter is not in the
>> string, but there must be a better way.
>> I'd appreciate hearing about a better way to attack this.
>> thanks,  Jim
> If I only had to do this once, over only a million items (given
> today's CPU power), I'd probably do something like the below
> using sets.  I couldn't tell from your text whether you wanted to
> see all of the letters in 'OFHCMLIP' in each entry or only
> some subset of them.  So, here's a script that reports both a
> partial match and an exact match.
> Note, I made a 9-character string, too, because you had a 7-character
> string as your second sample -- mostly to point out that the
> 9-character string counts as an exact match even though it has an
> extra character.

Sorry, that was a typo, they are all 8 characters in length.
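
With every item the same length, an order-independent match comes down
to comparing the letters in sorted order. Below is a minimal sketch of
that idea, not anything posted in this thread; the names find_anagrams
and items are made up for illustration, and the sample data is invented.

```python
def find_anagrams(target, items):
    """Return the items made up of exactly the letters in target, in any order."""
    key = sorted(target)  # canonical form: the letters in alphabetical order
    return [item for item in items if sorted(item) == key]

# Illustrative data only; the real list has slightly more than a million entries.
items = ['MIBMMCCO', 'PILMCHFO', 'YOWHHOYY', 'OFHCMLIP']
matches = find_anagrams('OFHCMLIP', items)
print(matches)
```

Sorting each 8-letter string is cheap, so a single pass over a million
items should be fine for a one-off search.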

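>    needle = set('OFHCMLIP')
>    for haystack in farm:  # farm is the list of candidate strings
>        # letters of needle that also occur in haystack
>        partial = needle.intersection(haystack)
>        # "exact" means every needle letter is present; extras are allowed
>        exact = partial == needle
>        print(haystack, exact, ''.join(sorted(partial)))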
> On the other hand, there are probably lots of papers on how to do
> this much more efficiently.
> -Martin
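
One caveat with the set test: sets ignore repeated letters and string
length, so a string with an extra letter still passes. If that matters,
comparing letter counts is stricter. A rough sketch, assuming
collections.Counter from the standard library; is_anagram is a made-up
helper name and the test strings are invented.

```python
from collections import Counter

def is_anagram(candidate, target):
    """True when candidate uses exactly the same letters, the same number of times."""
    return Counter(candidate) == Counter(target)

needle = 'OFHCMLIP'
set_test = set(needle) <= set('OFHCMLIPX')     # subset test on the 9-letter string
count_test = is_anagram('OFHCMLIPX', needle)   # letter-count test on the same string
reordered = is_anagram('PILMCHFO', needle)     # same letters, different order
print(set_test, count_test, reordered)
```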

Thanks for your help. Steven came up with a solution that works well for me.

Regards,  Jim

