[Tutor] Finding all locations of a sequence
Terry Carroll
carroll at tjc.com
Fri Jun 15 00:06:26 CEST 2007
On Thu, 14 Jun 2007, Lauren wrote:
> Subseq AAAAAU can bind to UUUUUA (which is normal) and UUUUUG (not so
> normal) and I want to know where UUUUUA, and UUUUUG are in the large
> RNA sequence, and the locations to show up as one...thing.
How about something like this?
========================================================================
def seqsearch(seq, targets):
    """
    return a list of match objects, each of which identifies where any of
    the targets are found in the string seq
      seq: string to be searched
      targets: list or tuple of alternate targets to be searched

    note: re.findall is not used, because it won't catch overlaps
    """
    import re
    resultlist=[]
    pos=0
    regex_text = "|".join(targets)
    regex = re.compile(regex_text)
    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(result)
        # resume one character past the start of this match, so that
        # overlapping matches are still found
        pos = result.start()+1
    return resultlist

targets = ["UUUUUA", "UUUUUG"]
sequence="UUCAAUUUGATACCAUUUUUAGCUUCCGUUUUUGCGATACCAUUUUAGCGU"
#                        ++++++       ++++++
#         0         1         2         3         4         5
#         012345678901234567890123456789012345678901234567890
# note: matches at 15 & 28
matches = seqsearch(sequence, targets)
for m in matches:
    print("match %s found at location %s" % (sequence[m.start():m.end()],
                                             m.start()))
========================================================================
This prints, as expected:
match UUUUUA found at location 15
match UUUUUG found at location 28
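
The sample sequence has no overlapping matches, so the note about re.findall in the docstring is not exercised above. As a rough sketch of that point (not part of the original code), the snippet below contrasts re.findall with a lookahead-based search on a short made-up string. The name overlapping_starts and the demo string are hypothetical, and it assumes the targets are plain literal strings, which is why re.escape is applied.

```python
import re

def overlapping_starts(seq, targets):
    """Return the start position of every match of any target, overlaps included."""
    # assumption: targets are literal strings, so escape any regex metacharacters
    alternatives = "|".join(re.escape(t) for t in targets)
    # a lookahead matches without consuming characters, so the scan moves on
    # one position at a time and overlapping hits are not skipped
    pattern = re.compile("(?=(%s))" % alternatives)
    return [m.start() for m in pattern.finditer(seq)]

demo = "AUAUAUA"                           # made-up string with overlapping AUA
print(re.findall("AUA", demo))             # ['AUA', 'AUA'] (the hit at 2 is skipped)
print(overlapping_starts(demo, ["AUA"]))   # [0, 2, 4]
```

This returns plain start positions rather than match objects, which may be closer to having the locations "show up as one thing". If the matched text is also needed, the loop in seqsearch is the more direct route.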