[Tutor] number of mismatches in a string

Jerry Hill malaclypse2 at gmail.com
Fri Mar 2 23:00:50 CET 2012


On Fri, Mar 2, 2012 at 2:11 PM, Hs Hs <ilhs_hs at yahoo.com> wrote:
> Hi:
> I have the following table and I am interested in calculating mismatch
> ratio. I am not completely clear how to do this and any help is deeply
> appreciated.
>
> Length  Matches
> 77      24A0T9T36
> 71      25^T9^T37
> 60      25^T9^T26
> 62      42A19
>
>
> In the Length column I have the length of the character string.
> In the second column I have the matches against my reference string.
>
>
> In the first case, where the length is 77, reading Matches from left to
> right: the first 24 characters matched my reference string, followed by an
> extra character A, a null (the 0, which does not count towards the problem),
> an extra T, 9 matches, an extra T and 36 matches.
> In total there are 3 mismatches.
>
> In case 2, I lost 2 characters (^ = loss of character compared to reference
> sentence)   -
>
> TOMISAGOODBOY
> T^MISAGOOD^OY   (here I lost 2 characters)  = I have 2 mismatches
> TOMISAGOOODBOOY (here I have 2 extra characters O and O) = I have two
> mismatches
>
>
> In case 4: I have 42 matches, extra A and 19 matches = so I have 1 mismatch
>
>
> How can I get that mismatch number from the Matches string?
> 1. I have to count how many A or T or G or C there are (believe me, only
> these 4 letters will appear in this; I will not see Z or B or K etc.)
> 2. ^T or ^A or ^G or ^C will also be a mismatch
>
>
> desired output:
>
> Length  Matches    mismatches
> 77      24A0T9T36  3
> 71      25^T9^T37  2
> 60      25^T9^T26  2
> 62      42A19      1
> 10      6^TTT1     3
>

It looks like all you need to do is count the number of A, T, C, and G
characters in your Matches column.  Maybe something like this:

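# Makes print() behave the same on Python 2 and Python 3.
from __future__ import print_function

# Each entry is [Length, Matches], taken from the table above.
differences = [
    [77, '24A0T9T36'],
    [71, '25^T9^T37'],
    [60, '25^T9^T26'],
    [62, '42A19']
]


for length, matches in differences:
    mismatches = 0
    # Every A, T, G or C is one mismatch; digits and '^' are skipped.
    for char in matches:
        if char in ('A', 'T', 'G', 'C'):
            mismatches += 1
    print(length, matches, mismatches)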


which produces the following output:
77 24A0T9T36 3
71 25^T9^T37 2
60 25^T9^T26 2
62 42A19 1
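
The same count can be written more compactly, and the ratio you asked
about can be derived from it. This is only a sketch: count_mismatches is
a hypothetical helper name, and I am assuming "mismatch ratio" means
mismatches divided by Length, which your message does not spell out.
It also includes the 6^TTT1 row from your desired output.

```python
def count_mismatches(matches):
    """Count every A, T, G or C in a Matches string (digits and '^' are skipped)."""
    return sum(1 for char in matches if char in "ATGC")


# (Length, Matches) pairs, including the extra row from the desired output.
rows = [
    (77, "24A0T9T36"),
    (71, "25^T9^T37"),
    (60, "25^T9^T26"),
    (62, "42A19"),
    (10, "6^TTT1"),
]

for length, matches in rows:
    mismatches = count_mismatches(matches)
    # Assumption: ratio = mismatches / Length (not stated in the question).
    ratio = mismatches / float(length)
    print("%d %s %d %.3f" % (length, matches, mismatches, ratio))
```

The counts come out as 3, 2, 2, 1 and 3, matching your desired mismatches
column. If the ratio should be taken over something other than Length,
only the division line needs to change.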

-- 
Jerry

