[Tutor] How can I find a group of characters in a list of strings?

Mats Wichmann mats at wichmann.us
Wed Jul 25 20:23:11 EDT 2018


On 07/25/2018 05:50 PM, Jim wrote:
> Linux mint 18 and python 3.6
> 
> I have a list of strings that contains slightly more than a million
> items. Each item is a string of 8 capital letters like so:
> 
> ['MIBMMCCO', 'YOWHHOY', ...]
> 
> I need to check and see if the letters 'OFHCMLIP' are one of the items
> in the list but there is no way to tell in what order the letters will
> appear. So I can't just search for the string 'OFHCMLIP'. I just need to
> locate any strings that are made up of those letters no matter their order.
> 
> I suppose I could loop over the list and loop over each item using a
> bunch of if statements exiting the inner loop as soon as I find a letter
> is not in the string, but there must be a better way.
> 
> I'd appreciate hearing about a better way to attack this.

It's possible that the size of the biglist and the length of the key has
enough performance impacts that a quicky (untested because I don't have
your data) solution is unworkable for performance reasons.  But a quicky
might be to take these two steps:

1. generate a list of the permutations of the target
2. check if any member of the target-permutation-list is in the biglist.

Python sets are a nice way to check membership.

from itertools import permutations
permlist = [''.join(p) for p in permutations('MIBMMCCO', 8)]

if not set(permlist).isdisjoint(biglist):
    print("Found a permutation of MIBMMCCO")



More information about the Tutor mailing list