# [Tutor] Re: drawing a graph

Kent Johnson kent_johnson at skillsoft.com
Fri Sep 3 11:59:46 CEST 2004

```Fuzzi,

I recently found the graphing module from VPython very easy to use to
create a simple graph. Here is what I did:
http://mail.python.org/pipermail/tutor/2004-August/031568.html
http://vpython.org/

If you want more help with your existing program, then posting the code to
the list is a good idea. We can look at it and make suggestions or help you
take the next step.

Kent

At 03:20 AM 9/3/2004 +0100, Fathima Javeed wrote:

>Hi,
>
>I have managed to get distances between sequences at each P value, using
>randomization, so now I have an HTML output file with two sets of
>values: the distance percentages and the P values from 1 to
>100. How would I draw a graph in Python, i.e. distance against P values for
>each sequence? I am completely lost now and would really appreciate help.
>Would it be helpful to paste my code here?
>
>Cheers
>Fuzzi
>
>>From: Kent Johnson <kent_johnson at skillsoft.com>
>>To: tutor at python.org
>>Subject: Re: [Tutor] need help with comparing list of sequences in
>>Python!!
>>Date: Tue, 31 Aug 2004 07:04:09 -0400
>>
>>Fuzzi,
>>
>>Here is one way to do this:
>>- Use zip() to pair up elements from the two sequences
>> >>> s1='aaabbbbcccc'
>> >>> s2='aaaccccbcccccccccc'
>> >>> zip(s1, s2)
>>[('a', 'a'), ('a', 'a'), ('a', 'a'), ('b', 'c'), ('b', 'c'), ('b', 'c'),
>>('b', 'c'), ('c', 'b'), ('c', 'c'), ('c', 'c'), ('c', 'c')]
>>
>>- Use a list comprehension to compare the elements of the pair and put
>>the results in a new list. I'm not sure if you want to count the matches
>>or the mismatches - your original post says mismatches, but in your
>>example you count matches. This example counts matches but it is easy to
>>change.
>> >>> [a == b for a, b in zip(s1, s2)]
>>[True, True, True, False, False, False, False, False, True, True, True]
>>
>>- In Python, True has a value of 1 and False has a value of 0, so adding
>>up the elements of this list gives the number of matches:
>> >>> sum([a == b for a, b in zip(s1, s2)])
>>6
>>
>>- min() and len() give you the length of the shortest sequence:
>> >>> min(len(s1), len(s2))
>>11
>>
>>- When you divide, you have to convert one of the numbers to a float or
>>Python will use integer division!
>> >>> 6/11
>>0
>> >>> float(6)/11
>>0.54545454545454541
>>
>>Put this together with the framework that Alan gave you to create a
>>program that calculates distances. Then you can start on the
>>randomization part.
>>
>>Kent
>>
>>
>>At 04:03 AM 8/31/2004 +0100, Fathima Javeed wrote:
>>>Hi Kent
>>>
>>>well here is how it works
>>>sequence one = aaabbbbcccc
>>>length = 11
>>>
>>>seq 2 = aaaccccbcccccccccc
>>>length = 18
>>>
>>>to get the pairwise similarity score, the program compares the letters
>>>of the two sequences up to length = 11, the length of the shorter sequence.
>>>
>>>a match gets a score of 1, so using + for match and x for mismatch:
>>>
>>>aaabbbbcccc
>>>aaaccccbcccccccccc
>>>+++xxxxx+++
>>>
>>>therefore the score = 6/11 = 0.5454 or 54%
>>>
>>>so you only score the first 11 letters of each sequence and it is not
>>>required to compare the rest of sequence 2. This is what the
>>>distance matrix is doing
>>>
>>>match score == 6
>>>
>>>The spaces are deleted to make both of them the same length
>>>
>>>
>>>>From: Kent Johnson <kent_johnson at skillsoft.com>
>>>>To: "Fathima Javeed" <fathimajaveed at hotmail.com>, tutor at python.org
>>>>Subject: Re: [Tutor] need help with comparing list of sequences in
>>>>Python!!
>>>>Date: Mon, 30 Aug 2004 13:53:19 -0400
>>>>
>>>>Fuzzi,
>>>>
>>>>How do you count mismatches if the lengths of the sequences are
>>>>different? Do you start from the front of both sequences or do you look
>>>>for a best match? Do you count the extra characters in the longer
>>>>string as mismatches or do you ignore them? An example or two would help.
>>>>
>>>>For example if
>>>>s1=ABCD
>>>>s2=XABDDYY
>>>>how many characters do you count as different?
>>>>
>>>>Kent
>>>>
>>>>At 07:00 PM 8/29/2004 +0100, Fathima Javeed wrote:
>>>>>Hi,
>>>>>I would really appreciate it if someone could help me in Python, as I am
>>>>>new to the language.
>>>>>
>>>>>I have a list of protein sequences in a text file, e.g. (dummy data)
>>>>>
>>>>>MVEIGEKAPEIELVDTDLKKVKIPSDFKGKVVVLAFYPAAFTSVCTKEMCTFRDSMAKFNEVNAVVIGISVDP
>>>>>PFS
>>>>>
>>>>>MAPITVGDVVPDGTISFFDENDQLQTVSVHSIAAGKKVILFGVPGAFTPTCSMSHVPGFIGKAEELKSKG
>>>>>
>>>>>APIKVGDAIPAVEVFEGEPGNKVNLAELFKGKKGVLFGVPGAFTPGCSKTHLPGFVEQAEALKAKGVQVVACL
>>>>>SVND
>>>>>
>>>>>HGFRFKLVSDEKGEIGMKYGVVRGEGSNLAAERVTFIIDREGNIRAILRNI
>>>>>
>>>>>etc etc
>>>>>
>>>>>They are not always of the same length,
>>>>>
>>>>>The first sequence is always the reference sequence which I am trying
>>>>>to investigate. To reach the objective, I need to compare
>>>>>each sequence with the first one, starting with the comparison of
>>>>>the reference sequence with itself.
>>>>>
>>>>>The objective of the program is to manipulate each sequence, i.e.
>>>>>randomly change characters, and calculate the distance (Distance:
>>>>>number of letters between a pair of sequences that don't match DIVIDED
>>>>>by the length of the shortest sequence) between the sequence in
>>>>>question and the reference sequence. So I need program
>>>>>code that takes the first sequence as a reference sequence
>>>>>(a constant, which is at the top of the list). First it compares it with
>>>>>itself, then with the second sequence, then with the third
>>>>>sequence, etc., one at a time.
>>>>>
>>>>>For the first comparison, you take a copy of the ref sequence and
>>>>>manipulate the copied sequence, i.e. randomly change the letters in
>>>>>the sequence, and calculate the distance between them.
>>>>>(The letters that are used for this are: A R N D C E Q G H I L K M F P
>>>>>S T W Y V)
>>>>>
>>>>>The reference sequence is never altered or manipulated. For the first
>>>>>comparison, it is the copied version of the reference sequence that is altered.
>>>>>
>>>>>Randomization is done using different P values
>>>>>(P = probability of change), for example:
>>>>>if P = 0      no random change has been done
>>>>>if P = 1.0   all the letters in that particular sequence have been
>>>>>randomly changed, so at P = 1.0 the number of changed letters equals
>>>>>the length of the sequence
>>>>>
>>>>>So it is calculating the distance each time between two sequences
>>>>>(the first is always the reference sequence, the second is another
>>>>>sequence) at each P value (starting from 0, then 0.1, 0.2, ... 1.0).
>>>>>
>>>>>Note: the number of sequences to be compared could be any number, and
>>>>>they could be of any length.
>>>>>
>>>>>I don't know how to compare each sequence with the first sequence, or
>>>>>how to randomize the characters in a sequence in order to
>>>>>calculate the distance for each pair of sequences. If someone can give
>>>>>me any guidance, I would be grateful.
>>>>>
>>>>>Cheers
>>>>>Fuzzi
>>>>>
>>>>>_______________________________________________
>>>>>Tutor maillist  -  Tutor at python.org
>>>>>http://mail.python.org/mailman/listinfo/tutor

```
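The steps discussed in the thread (pairing letters with zip(), counting matches, dividing by the length of the shorter sequence, and randomizing a copy of a sequence at each P value) can be put together roughly as follows. This is only a sketch in Python 3, not the code anyone in the thread actually wrote. In Python 3 `/` is already true division, so the `float()` conversion from the Python 2 session above is not needed. The function names (`similarity`, `distance`, `mutate`, `distance_table`) are made up for illustration. Always replacing a changed letter with a *different* letter is an assumption, based on the statement that at P = 1.0 every letter has been changed.

```python
import random

# The 20 letters listed in the original question.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"


def similarity(s1, s2):
    """Fraction of matching positions, over the length of the shorter sequence."""
    n = min(len(s1), len(s2))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    # zip() stops at the shorter sequence; True counts as 1 when summed.
    matches = sum(a == b for a, b in zip(s1, s2))
    return matches / n


def distance(s1, s2):
    """Fraction of mismatching positions (the 'distance' from the question)."""
    return 1.0 - similarity(s1, s2)


def mutate(seq, p, rng):
    """Return a copy of seq where each letter is changed with probability p.

    Assumption: a changed letter is always replaced by a different letter.
    """
    out = []
    for ch in seq:
        if rng.random() < p:
            out.append(rng.choice([c for c in AMINO_ACIDS if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def distance_table(reference, seq, steps=10, seed=0):
    """List of (P, distance) pairs for P = 0, 1/steps, ..., 1.0.

    The reference is never altered; only a mutated copy of seq is compared.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    table = []
    for i in range(steps + 1):
        p = i / steps
        table.append((p, distance(reference, mutate(seq, p, rng))))
    return table


if __name__ == "__main__":
    ref = "MAPITVGDVVPDGTISFFDENDQLQTVSVHSIAAGKKVILFGVPGAFTPTCSMSHVPGFIGKAEELKSKG"
    for p, d in distance_table(ref, ref):
        print("P = %.1f  distance = %.3f" % (p, d))
```

The (P, distance) pairs returned by `distance_table` are the data for the graph asked about at the top of the thread; they could be handed to whichever plotting package is available.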