create lowercase strings in lists - was: (No subject)
Mike Meyer
mwm at mired.org
Thu Dec 16 19:44:31 EST 2004
Steve Holden <steve at holdenweb.com> writes:
> Mark Devine wrote:
>
>> Actually what I want is element 'class-map match-all cmap1' from list 1 to match 'class-map cmap1 (match-all)' or 'class-map cmap1 mark match-all done' in list 2 but not to match 'class-map cmap1'.
>> Each element in both lists has multiple words in it. If all the words of any element of the first list appear in any order within any element of the second list I want a match, but if any of the words are missing then there is no match. There are far more elements in list 2 than in list 1.
>>
> Well since that's the case it would seem you'd be best processing each
> item from the large list against the small list, though in truth it
> may not make any difference.
>
> It looks like the best way to proceed might be to reduce each line to
> a canonical form -- strip the parens and other irrelevant characters
> out, and sort the words in order. After that it'd be relatively simple
> to determine whether two lines match - they'd be the same!
No, that doesn't work. What happens if an element of the second list
has *more* words than the element in the first list? In that case, the
two canonical forms would be different, but it should still be a
match.
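To make that concrete with the strings from the question, here is a small
sketch. The canonical() helper and the way it strips parentheses are
assumptions made for illustration, not something from the thread.

```python
def canonical(line):
    """Strip parentheses, split into words, and sort them."""
    cleaned = line.replace("(", " ").replace(")", " ")
    return tuple(sorted(cleaned.split()))

short = canonical("class-map match-all cmap1")
longer = canonical("class-map cmap1 mark match-all done")

# The canonical forms differ because the second line has extra words...
forms_equal = short == longer
# ...yet every word of the first line is present in the second.
is_subset = set(short) <= set(longer)
print(forms_equal, is_subset)  # prints: False True
```

So equality of canonical forms is too strict; what is wanted is a subset
test on the words.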
How about this (If I had sample data, I'd try it out directly...):
Create a dictionary of sets, keyed by word. For each word in an element
of the small list, add a tuple version of that element to the set the
dictionary holds for that word (you'll want to create the tuples in
advance, and associate each one with its element somehow).
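Then go through the long list, and for each element collect all the
sets that are indexed by the words in that element, and take the
union of them all. Those tuples are the candidates; any candidate
whose words all appear in the element is a match. (Taking the
intersection instead would wrongly match 'class-map cmap1', since
both of its words index the same tuple.)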
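A minimal sketch of the word index described above, in present-day Python.
The helper names (words_of, build_index, find_matches) and the parenthesis
stripping are assumptions for illustration, and each small-list element is
stored as a (line, frozenset_of_words) tuple. One deliberate choice: the
sets looked up for a long element are merged into a candidate pool and each
candidate is then checked with a subset test, since intersecting the
looked-up sets would also accept 'class-map cmap1' and would reject a long
element that contains two different small elements.

```python
from collections import defaultdict

def words_of(line):
    """Return the set of words in a line, ignoring parentheses (assumed rule)."""
    cleaned = line.replace("(", " ").replace(")", " ")
    return frozenset(cleaned.split())

def build_index(short_list):
    """Map each word to the set of small-list entries that contain it."""
    index = defaultdict(set)
    for line in short_list:
        entry = (line, words_of(line))  # hashable tuple for use in sets
        for word in entry[1]:
            index[word].add(entry)
    return index

def find_matches(short_list, long_list):
    """Return (short_line, long_line) pairs where every word of short_line
    appears somewhere in long_line."""
    index = build_index(short_list)
    matches = []
    for long_line in long_list:
        long_words = words_of(long_line)
        # Only entries sharing at least one word can possibly match.
        candidates = set()
        for word in long_words:
            candidates |= index.get(word, set())
        for short_line, short_words in candidates:
            if short_words <= long_words:  # all words present
                matches.append((short_line, long_line))
    return matches

list1 = ["class-map match-all cmap1"]
list2 = [
    "class-map cmap1 (match-all)",
    "class-map cmap1 mark match-all done",
    "class-map cmap1",
]
result = find_matches(list1, list2)
```

With the sample strings from the question, result holds the first two
elements of list2 paired with the list1 element, and 'class-map cmap1' is
left out. The index only pays off when the small list is large enough that
testing every pair directly would be too slow.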
<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.