[Tutor] How can I find a group of characters in a list of strings?
Jim
jf_byrnes at comcast.net
Wed Jul 25 21:40:15 EDT 2018
On 07/25/2018 07:29 PM, Martin A. Brown wrote:
>
>> I have a list of strings that contains slightly more than a
>> million items. Each item is a string of 8 capital letters like so:
>>
>> ['MIBMMCCO', 'YOWHHOY', ...]
>>
>> I need to check and see if the letters 'OFHCMLIP' are one of the items in the
>> list but there is no way to tell in what order the letters will appear. So I
>> can't just search for the string 'OFHCMLIP'. I just need to locate any strings
>> that are made up of those letters no matter their order.
>>
>> I suppose I could loop over the list and loop over each item using a bunch of
>> if statements exiting the inner loop as soon as I find a letter is not in the
>> string, but there must be a better way.
>>
>> I'd appreciate hearing about a better way to attack this.
>>
>> thanks, Jim
>
> If I only had to do this once, over only a million items (given
> today's CPU power), I'd probably do something like the below
> using sets. I couldn't tell from your text whether you wanted to
> see all of the letters in 'OFHCMLIP' in each entry or if you wanted
> to see only that some subset were present. So, here's a script that
> will report both a partial match and an exact match.
>
> Note, I made a 9-character string, too because you had a 7-character
> string as your second sample -- mostly to point out that the
> 9-character string satisfies an exact match although it sports an
> extra character.
Sorry, that was a typo; they are all 8 characters in length.
> farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP', 'NEGBQJKR']
> needle = set('OFHCMLIP')
> for haystack in farm:
>     # letters shared by the needle and this entry
>     partial = needle.intersection(haystack)
>     # True when every needle letter occurs in the entry (extras allowed)
>     exact = partial == needle
>     print(haystack, exact, ''.join(sorted(partial)))
>
> On the other hand, there are probably lots of papers on how to do
> this much more efficiently.
>
> -Martin
Thanks for your help. Steven came up with a solution that works well for me.
Regards, Jim
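
For anyone reading this in the archive: the set test quoted above ignores
repeated letters and extra characters, which is why 'OFHCMLIPZ' is reported
as an exact match. Below is a minimal sketch of one alternative, assuming
"made up of those letters" means the same letters the same number of times
in any order. It compares sorted letters. This is not necessarily the
solution Steven posted; the function name find_matches and the sample entry
'PILMCHFO' are made up for illustration.

```python
def find_matches(items, target):
    """Return the items that use exactly the letters of target, in any order."""
    key = sorted(target)  # sort the target's letters once, outside the loop
    return [
        item for item in items
        # cheap length check first, then compare the sorted letters
        if len(item) == len(target) and sorted(item) == key
    ]


# Martin's sample list plus one hypothetical reordered entry, 'PILMCHFO'
farm = ['MIBMMCCO', 'YOWHHOY', 'OFHCMLIP', 'OFHCMLIPZ', 'FHCMLIP',
        'PILMCHFO', 'NEGBQJKR']

print(find_matches(farm, 'OFHCMLIP'))
```

With this comparison the 9-character and 7-character entries are rejected,
and only entries that are rearrangements of the target are kept.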