[Tutor] Finding all locations of a sequence

Thu Jun 14 22:26:16 CEST 2007

Hi Lauren

I use the string find method to search for DNA motifs. Here is an example:

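    # Scan one sequence for every occurrence of one motif.
    # fasta, motif, i, j, size, toprint, plist and mlist come from the
    # surrounding loops (not shown).
    seq = fasta[j].sequence
    sp = 0                               # where the next search starts
    while sp < len(seq):
        pos = seq.find(motif[i], sp)     # -1 when there are no more matches
        if pos == -1:
            break
        plist.append(int(size) - pos)    # a match at position 0 is recorded too
        mlist.append(toprint)
        sp = pos + 1                     # resume after this hit; overlaps are kept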

You may be able to trim this code a bit, but it is a start.
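Here is a self-contained sketch of the same technique as a hypothetical helper, `find_all`. It uses the `str.find` method, because the old `string.find` function was removed in Python 3. It returns 0-based start positions, includes a match at position 0, and restarts one character after each hit, so overlapping occurrences are counted.

```python
def find_all(sequence, motif):
    """Return the 0-based start index of every occurrence of motif, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Search again one character later so overlapping hits are also found.
        start = sequence.find(motif, start + 1)
    return positions


print(find_all("GATATATGCATATAC", "ATA"))  # [1, 3, 9, 11]
```

Restarting at `start + len(motif)` instead would skip overlapping matches, which is usually not what you want for motifs.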

HTH

Paulo

Lauren wrote:
> OK, what I have is an RNA sequence (about 5 million nucleotides
> [characters] long) and 4100 subsequences (from another sequence),
> each 6 characters long, that I want to find in it.
> The problem is the exceptional U:G bond: U bonds to A (as per normal)
> and to G (the abnormal bond), and G bonds to C (as per normal) and to U.
> Normally I'd use search software that is already available; however,
> what I need done isn't covered by anything out there so far, and I
> believe those tools do not deal with the exceptional bond. In any event,
> I want to know where the subsequences can bind in the larger RNA
> sequence (i.e., the location of binding in the larger sequence), so I'm
> not (just) looking for where they would bind normally, but also where
> the abnormal bonds would figure in.
> Unfortunately my first attempt at this was unbearably slow, so I'm
> hoping there is a faster way.
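With about 4100 six-letter queries against a sequence of about 5 million characters, searching the whole sequence once per query repeats a lot of work. One common alternative is to scan the sequence once and index every 6-character window. The sketch below does this under two assumptions: the sequence is a plain Python string, and `build_kmer_index` is a hypothetical helper name. Each query then becomes a dictionary lookup. Memory grows with the sequence length (one list entry per window).

```python
from collections import defaultdict


def build_kmer_index(seq, k=6):
    """Map every length-k window of seq to the list of its 0-based start positions."""
    index = defaultdict(list)
    for start in range(len(seq) - k + 1):
        index[seq[start:start + k]].append(start)
    return index


# Tiny made-up sequence, for illustration only.
rna = "UUUUUAGCUUUUUGUUUUUA"
index = build_kmer_index(rna)
print(index["UUUUUA"])          # [0, 14]
print(index.get("GGGGGG", []))  # [] (.get avoids inserting an empty entry)
```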
>
> So an example with this would be:
>
> Subseq  AAAAAU can bind to UUUUUA (which is normal) and UUUUUG (not so
> normal) and I want to know where UUUUUA, and UUUUUG are in the large
> RNA sequence, and the locations to show up as one...thing.
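The pairing rules above can be turned into a lookup table. Each subsequence then expands into every sequence it could bind to, and the positions of all those partners can be merged into one sorted list. The sketch follows the example's position-by-position convention (AAAAAU binds UUUUUA and UUUUUG). Real RNA strands pair antiparallel, so depending on how the sequences are written, each partner may need to be reversed (`partner[::-1]`) before searching. `PARTNERS`, `binding_partners` and `merged_sites` are hypothetical names. The index passed to `merged_sites` is assumed to map each 6-mer to its start positions, for example one built by scanning the large sequence once.

```python
from itertools import product

# Bases each nucleotide can pair with, including the G:U wobble:
# U pairs with A or G, and G pairs with C or U.
PARTNERS = {"A": "U", "U": "AG", "G": "CU", "C": "G"}


def binding_partners(subseq):
    """Return every sequence subseq can pair with, position by position."""
    choices = [PARTNERS[base] for base in subseq]
    return ["".join(combo) for combo in product(*choices)]


def merged_sites(index, subseq):
    """Merge the positions of all partners of subseq into one sorted list."""
    return sorted(pos for partner in binding_partners(subseq)
                  for pos in index.get(partner, []))


print(binding_partners("AAAAAU"))  # ['UUUUUA', 'UUUUUG']
index = {"UUUUUA": [0, 14], "UUUUUG": [8]}  # made-up positions
print(merged_sites(index, "AAAAAU"))  # [0, 8, 14]
```

A 6-mer made only of U and G expands to 2**6 = 64 partners, so the number of lookups stays small even with the wobble pairs.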
>
> I don't know if that is more helpful or not than the chicken example...
>
> Thanks again for the help
>
>
> On 14/06/07, Teresa Stanton <tms43 at clearwire.net> wrote:
>   
>> OK, I'm going to take a shot at this.  If what I'm understanding is correct,
>> a dictionary might help.  But that would depend on the format of the
>> original sequence.  If you have a list:
>>
>> Lst1 = ['cow', 'pig', 'chicken', 'poultry', 'beef', 'pork']
>>
>> Then you could:
>>
>> Lst1.index('chicken')
>>
>> And get 2, because list indices start at 0, not 1.
>>
>> Or this:
>>
>>
>> >>> for i in Lst1:
>> ...     if i == 'chicken':
>> ...         print(Lst1.index(i))
>> ...     if i == 'poultry':
>> ...         print(Lst1.index(i))
>> ...
>> 2
>> 3
>>
>> Now, Kent or Alan and perhaps others will have a much more sophisticated way
>> of solving this same problem. I'm still not exactly sure what you are
>> looking for, because there isn't enough information for me to really
>> grasp your problem. My response uses a simple list structure with
>> simple operations.
>>
>> Hope it helps :)
>>
>> T
>>
>> -----Original Message-----
>> From: tutor-bounces at python.org [mailto:tutor-bounces at python.org] On Behalf
>> Of Lauren
>> Sent: Thursday, June 14, 2007 11:35 AM
>> To: tutor at python.org
>> Subject: [Tutor] Finding all locations of a sequence
>>
>> Ok, please bear with me, I'm very new to programming and python. And
>> my question is rather...convoluted.
>>
>> I have a bunch of sequences (about 4100 or so), and want to know where
>> they are in a very, very large string of letters. But wait, there's
>> more. Some of these sequences match more than one 'word' (for
>> example, to be consistent with later, chicken could match poultry
>> or chicken).
>>
>> example of what I want to do (on a much smaller scale):
>>
>> Say I have chicken and I want to know where it occurs in a string of
>> words, but I want it to match to both chicken and poultry and have the
>> output of:
>>
>> chicken  (locations of chicken and poultry in the string)
>>
>> or something like that.
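One way to get output of the form `chicken (locations of chicken and poultry)` for many queries in a single pass is to build a reverse lookup from each acceptable word to the queries it satisfies. The sketch below assumes word positions (0-based indices into `text.split()`) rather than character positions. `SYNONYMS` and `locate` are hypothetical names, and the text is made up.

```python
from collections import defaultdict

# Hypothetical mapping: each query and the words it should match.
SYNONYMS = {"chicken": ["chicken", "poultry"], "beef": ["beef", "cow"]}


def locate(text, synonyms):
    """Map each query to the word positions of any of its matching words."""
    # Reverse lookup: matching word -> queries that accept it.
    wanted = defaultdict(list)
    for query, words in synonyms.items():
        for word in words:
            wanted[word].append(query)
    hits = {query: [] for query in synonyms}
    # Single pass over the text; positions are 0-based word indices.
    for position, word in enumerate(text.split()):
        for query in wanted.get(word, ()):
            hits[query].append(position)
    return hits


text = "the chicken crossed the road while poultry prices rose and chicken sold out"
for query, positions in locate(text, SYNONYMS).items():
    print(query, positions)
# chicken [1, 6, 10]
# beef []
```

Because the text is read only once, adding more queries grows the lookup table rather than the number of passes over the text.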
>>
>> The string I'm dealing with is really large, so whatever will get
>> through it the fastest is ideal for me.
>>
>> I hope this all makes sense...
>>
>> If it's possible to have pseudocode that would be helpful.