[Tutor] Finding all locations of a sequence
nuin at genedrift.org
Thu Jun 14 22:26:16 CEST 2007
I use the string find method to search for DNA motifs. Here is an example:
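while sp < len(fasta[j].sequence):
    # 'motif' stands in for the search pattern; the rest of this call was cut off in the archive
    pos = fasta[j].sequence.find(motif, sp)
    if pos != -1:
        plist.append(int(size) - pos)  # 'size' is defined elsewhere in the original script
        sp = pos + 1  # resume one past this hit; 'sp = pos' would find the same hit forever
    else:
        sp = len(fasta[j].sequence)  # no more hits, so leave the loop
# reset before searching the next sequence/motif
pos = 0
sp = 0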
You might be able to trim this code a bit, but it is a start.
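A self-contained version of the same idea, written for Python 3 as a sketch rather than the original script; `find_all`, `sequence` and `motif` are illustrative names. It keeps calling `str.find` from one past the previous hit, so overlapping matches are reported, and it returns 0-based start positions.

```python
def find_all(sequence, motif):
    """Return every 0-based start index of motif in sequence, including overlaps."""
    positions = []
    start = 0
    while True:
        pos = sequence.find(motif, start)
        if pos == -1:      # no further occurrences
            break
        positions.append(pos)
        start = pos + 1    # step one past the hit so overlapping matches are found
    return positions


print(find_all("GAATTCGAATTC", "AATT"))  # [1, 7]
print(find_all("AAAA", "AA"))            # [0, 1, 2]
```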
> OK, what I have is an RNA sequence (about 5 million nucleotides
> [characters] long) and 4100 subsequences (from another
> sequence), each 6 characters long, that I want to
> find in it.
> The problem is the exceptional bond of U:G, which results in U bonding
> to A (as per normal) and G (the abnormal bond) and G to bond with C
> (as per normal) and U. Normally I'd use search software that is already
> available, but what I need done isn't covered by anything out
> there so far, and I believe existing tools do not deal with the
> exceptional bond. In any event, I want to know where the subsequences
> can bind in the larger RNA sequence (i.e., the location of binding in
> the larger sequence), so I'm not (just) looking for where they would bind
> normally, but also where the abnormal bonds would figure in.
> Unfortunately my first attempt at this was unbearably slow, so I'm
> hoping there is a faster way.
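One likely reason for the slowness is scanning 5 million characters once per subsequence (4100 full passes). A sketch of an alternative, assuming every target has the same length (6 here): walk the large sequence once, slice out each window, and test it against a dict keyed by the targets, which is an average constant-time lookup. The function name `scan_for_targets` is hypothetical, and positions are 0-based.

```python
def scan_for_targets(sequence, targets):
    """Return {target: [start positions]} for every target, in one pass over sequence.

    Assumes all targets have the same length (6 in the thread).
    """
    hits = {t: [] for t in targets}  # dict keeps the caller's order of targets
    if not hits:
        return {}
    k = len(next(iter(hits)))        # window length, taken from the first target
    for pos in range(len(sequence) - k + 1):
        window = sequence[pos:pos + k]
        if window in hits:           # dict lookup, average constant time
            hits[window].append(pos)
    return hits


rna = "AUGGCUUUUUAGCUUUUUGA"
print(scan_for_targets(rna, ["UUUUUA", "UUUUUG", "CCCCCC"]))
# {'UUUUUA': [5], 'UUUUUG': [13], 'CCCCCC': []}
```

Only the positions of wanted windows are stored, so memory grows with the number of hits rather than with the length of the sequence.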
> So an example with this would be:
> Subseq AAAAAU can bind to UUUUUA (which is normal) and UUUUUG (not so
> normal) and I want to know where UUUUUA, and UUUUUG are in the large
> RNA sequence, and the locations to show up as one...thing.
> I don't know if that is more or less helpful than the chicken example...
> Thanks again for the help
> On 14/06/07, Teresa Stanton <tms43 at clearwire.net> wrote:
>> OK, I'm going to take a shot at this. If what I'm understanding is correct,
>> a dictionary might help. But that would depend on the format of the
>> original sequence. If you have a list:
>> Lst1 = ['cow', 'pig', 'chicken', 'poultry', 'beef', 'pork']
>> Then you could:
>> >>> Lst1.index('chicken')
>> And get 2, because list indexes start at 0, not 1.
>> Or this:
>> >>> for i in Lst1:
>> ...     if i == 'chicken':
>> ...         print Lst1.index(i)
>> ...     if i == 'poultry':
>> ...         print Lst1.index(i)
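`list.index` returns only the first position of a value and searches the list again on every call. A shorter Python 3 sketch of the same lookup uses `enumerate`, which yields each index alongside its item and so also reports repeated items; the set `wanted` is an illustrative name.

```python
Lst1 = ['cow', 'pig', 'chicken', 'poultry', 'beef', 'pork']

# Items that should count as a match for "chicken"
wanted = {'chicken', 'poultry'}

# enumerate yields (index, item) pairs, so no second search is needed
matches = [i for i, item in enumerate(Lst1) if item in wanted]
print(matches)  # [2, 3]
```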
>> Now, Kent or Alan and perhaps others will have a much more sophisticated way
>> of solving this same problem. I'm still not exactly sure what it is you are
>> looking for, because there isn't enough information for me to really get a
>> grasp on your problem. My response is a simple list structure that has
>> simple operations.
>> Hope it helps :)
>> -----Original Message-----
>> From: tutor-bounces at python.org [mailto:tutor-bounces at python.org] On Behalf
>> Of Lauren
>> Sent: Thursday, June 14, 2007 11:35 AM
>> To: tutor at python.org
>> Subject: [Tutor] Finding all locations of a sequence
>> Ok, please bear with me, I'm very new to programming and python. And
>> my question is rather...convoluted.
>> I have a bunch of sequences (about 4100 or so), and want to know where
>> they are in a very, very large string of letters. But wait, there's
>> more. Some of these sequences match more than one 'word' (for
>> example, to be consistent with later, chicken could match poultry
>> or chicken).
>> Example of what I want to do (on a much smaller scale):
>> Say I have chicken and I want to know where it occurs in a string of
>> words, but I want it to match to both chicken and poultry and have the
>> output of:
>> chicken (locations of chicken and poultry in the string)
>> or something like that.
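A small Python 3 sketch of that output format, using the standard `re` module: the query and its alternatives are joined into one regular expression, and `finditer` yields every match with its start offset. `locate_alternatives` is a hypothetical name, and it matches substrings rather than whole words (so "chickens" would also count).

```python
import re


def locate_alternatives(text, query, alternatives):
    """Return {query: start offsets of query or any of its alternatives in text}."""
    # Longest words first, so a shorter word cannot shadow a longer one at the same spot
    words = sorted({query, *alternatives}, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return {query: [m.start() for m in pattern.finditer(text)]}


text = "the chicken crossed the road; poultry farmers cheered the chicken"
print(locate_alternatives(text, "chicken", ["poultry"]))  # {'chicken': [4, 30, 58]}
```

`finditer` reports non-overlapping matches; that is fine for words, but for short biological motifs that can overlap, a lookahead pattern or a `str.find` loop that advances by one is safer.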
>> The string I'm dealing with is really large, so whatever will get
>> through it the fastest is ideal for me.
>> I hope this all makes sense...
>> If possible, pseudocode would be helpful.