# Searching for uniqueness in a list of data

Paul McGuire ptmcg at austin.rr._bogus_.com
Wed Mar 1 19:26:42 CET 2006

```
"rh0dium" <steven.klass at gmail.com> wrote in message
> Hi all,
>
> I am having a bit of difficulty in figuring out an efficient way to
> split up my data and identify the unique pieces of it.
>
> list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
>
> Now I want to split each item up on the "_" and compare it with all
> others on the list, if there is a difference I want to create a list of
> the possible choices, and ask the user which choice of the list they
> want.
<snip>

Check out difflib.

>>> data=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
>>> data[0].split("_")
['1p2m', '3.3-1.8v', 'sal', 'ms']
>>> data[1].split("_")
['1p2m', '3.3-1.8', 'sal', 'log']
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, data[0].split("_"), data[1].split("_"))
>>> [tuple(m) for m in s.get_matching_blocks()]
[(0, 0, 1), (2, 2, 1), (4, 4, 0)]

I believe one interprets the tuples in matching_blocks as:
(seq1index,seq2index,numberOfMatchingItems)

In your case, the sequences match at element 0 and at element 2, each
match of length 1.  The final (4,4,0) tuple is a sentinel: difflib always
ends the list with a dummy (len(seq1), len(seq2), 0) entry that matches
nothing.

Perhaps from here, you could locate the gaps between the blocks returned by
SequenceMatcher.get_matching_blocks(), and prompt for the user's choice.

-- Paul

```
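
The "locate the gaps" idea from the reply can be sketched roughly as follows. This is only one possible reading of the suggestion, not code from the original thread: the helper name `differing_fields` and its `sep` parameter are made up for illustration. It walks the matching blocks and collects whatever lies between the end of one match and the start of the next. The trailing zero-length block is what lets the loop pick up a difference at the very end of the names.

```python
from difflib import SequenceMatcher


def differing_fields(first, second, sep="_"):
    """Return a list of (fields_from_first, fields_from_second) gaps.

    Hypothetical helper: splits both names on `sep` and reports the
    stretches that fall between SequenceMatcher's matching blocks.
    """
    a, b = first.split(sep), second.split(sep)
    matcher = SequenceMatcher(None, a, b)

    gaps = []
    prev_a = prev_b = 0  # end of the previous matching block in a and b
    for i, j, size in matcher.get_matching_blocks():
        # Anything between the previous match and this one differs.
        if i > prev_a or j > prev_b:
            gaps.append((a[prev_a:i], b[prev_b:j]))
        prev_a, prev_b = i + size, j + size
    return gaps


data = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8_sal_log']
gaps = differing_fields(data[0], data[1])
for from_first, from_second in gaps:
    print("choose between:", from_first, "and", from_second)
```

Each gap holds the candidate values for one differing position, so a prompt for the user's choice could be built directly from the two lists in each pair. Comparing more than two names would need some extra bookkeeping that this sketch does not attempt.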