[Tutor] Finding all locations of a sequence

Thu Jun 14 22:13:57 CEST 2007

Ok, what I have is an RNA sequence (about 5 million nucleotides
[characters] long) and 4100 subsequences (taken from another
sequence), each 6 characters long, that I want to find in it.
The problem is the exceptional U:G bond: U bonds to A (as per normal)
and to G (the abnormal bond), and G bonds to C (as per normal) and to
U. Normally I'd use search software that is already available, but
what I need done isn't covered by anything out there so far, and I
believe the existing tools do not deal with the exceptional bond. In
any event, I want to know where the subsequences can bind in the
larger RNA sequence (i.e., the location of binding in the larger
sequence), so I'm looking not just for where they would bind normally
but also for where the abnormal bonds would figure in.
Unfortunately my first attempt at this was unbearably slow, so I'm
hoping there is a faster way.
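
A minimal sketch of how the pairing rules above could be turned into a
search, assuming the subsequence and the large sequence are read in the
same left-to-right direction and paired position by position. The names
PARTNERS, binding_pattern and binding_sites are made up for this
illustration; only the standard-library re module is used.

```python
import re

# Pairing partners for each base of the short subsequence, including the
# exceptional U:G bond (U pairs with A or G; G pairs with C or U).
# Assumption: the sequences contain only the letters A, C, G and U.
PARTNERS = {"A": "U", "C": "G", "G": "CU", "U": "AG"}


def binding_pattern(subseq):
    """Build a regex that matches every string subseq can bind to."""
    parts = []
    for base in subseq:
        options = PARTNERS[base]
        # One partner becomes a literal, two partners become a character class.
        parts.append(options if len(options) == 1 else "[" + options + "]")
    # The zero-width lookahead lets finditer report overlapping sites too.
    return re.compile("(?=" + "".join(parts) + ")")


def binding_sites(subseq, rna):
    """Return the 0-based start positions in rna where subseq can bind."""
    return [m.start() for m in binding_pattern(subseq).finditer(rna)]
```

Called as binding_sites(subseq, rna), this returns one list that holds
both the normal and the abnormal binding locations. It scans the whole
large sequence once per subsequence, so it may still be slow for
thousands of subsequences.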

So an example with this would be:

Subseq AAAAAU can bind to UUUUUA (which is normal) and UUUUUG (not so
normal). I want to know where UUUUUA and UUUUUG are in the large RNA
sequence, and I want the locations to show up as one...thing.
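
One possible way to get this quickly, sketched under the assumption
that every subsequence has the same length (6 here): index every
6-character window of the large sequence once, then look up each
binding target of each subsequence in that index. The names PARTNERS,
binding_targets, index_kmers and find_all_sites are hypothetical, and
the pairing is applied position by position, left to right, as in the
example above.

```python
from collections import defaultdict
from itertools import product

# U pairs with A or G, and G pairs with C or U (the exceptional U:G bond).
# Assumption: the sequences contain only the letters A, C, G and U.
PARTNERS = {"A": "U", "C": "G", "G": "CU", "U": "AG"}


def binding_targets(subseq):
    """List every string that subseq can bind to, normal or abnormal."""
    choices = [PARTNERS[base] for base in subseq]
    return ["".join(combo) for combo in product(*choices)]


def index_kmers(rna, k=6):
    """Map each k-character window of rna to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(rna) - k + 1):
        index[rna[i:i + k]].append(i)
    return index


def find_all_sites(subseqs, rna, k=6):
    """Map each subsequence to the sorted positions where it can bind."""
    index = index_kmers(rna, k)  # built once, reused for every subsequence
    results = {}
    for subseq in subseqs:
        positions = []
        for target in binding_targets(subseq):
            positions.extend(index.get(target, []))
        results[subseq] = sorted(positions)
    return results
```

With this layout the large sequence is scanned only once, and each of
the subsequences costs a handful of dictionary lookups. The result for
AAAAAU is a single list that combines the locations of UUUUUA and
UUUUUG.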

I don't know if that is more helpful or not than the chicken example...

Thanks again for the help

On 14/06/07, Teresa Stanton <tms43 at clearwire.net> wrote:
> OK, I'm going to take a shot at this.  If what I'm understanding is correct,
> a dictionary might help.  But that would depend on the format of the
> original sequence.  If you have a list:
>
> Lst1 = ['cow', 'pig', 'chicken', 'poultry', 'beef', 'pork']
>
> Then you could:
>
> Lst1.index('chicken')
>
> And get 2, because list indexing starts at 0, not 1.
>
> Or this:
>
> >>> for i, word in enumerate(Lst1):
>         if word in ('chicken', 'poultry'):
>                 print(i)
>
>
> 2
> 3
>
> Now, Kent or Alan and perhaps others will have a much more sophisticated way
> of doing this same problem.  I'm still not exactly sure what it is you are
> looking for, because there isn't enough information for me to really get a
> grasp on your problem. My response is a simple list structure that has
> simple operations.
>
> Hope it helps :)
>
> T
>
> -----Original Message-----
> From: tutor-bounces at python.org [mailto:tutor-bounces at python.org] On Behalf
> Of Lauren
> Sent: Thursday, June 14, 2007 11:35 AM
> To: tutor at python.org
> Subject: [Tutor] Finding all locations of a sequence
>
> Ok, please bear with me, I'm very new to programming and python. And
> my question is rather...convoluted.
>
> I have a bunch of sequences (about 4100 or so), and want to know where
> they are in a very, very large string of letters. But wait, there's
> more. Some of these sequences match to more than one 'word' (for
> example...to be consistent with later, chicken could match to poultry
> or chicken).
>
> example of what I want to do (on a much smaller scale):
>
> Say I have chicken and I want to know where it occurs in a string of
> words, but I want it to match to both chicken and poultry and have the
> output of:
>
> chicken  (locations of chicken and poultry in the string)
>
> or something like that.
>
> The string I'm dealing with is really large, so whatever will get
> through it the fastest is ideal for me.
>
> I hope this all makes sense...
>
> If it's possible to have pseudocode that would be helpful.

-- 
Lauren

Laurenb01 at gmail.com