[Tutor] Splitting text
Terry Carroll
carroll at tjc.com
Thu Jun 29 23:30:25 CEST 2006
On Thu, 29 Jun 2006, Apparao Anakapalli wrote:
> pattern = 'ATTTA'
>
> I want to find the pattern in the sequence and count.
>
> For instance in 'a' there are two 'ATTTA's.
Use re.findall:
>>> import re
>>> pat = "ATTTA"
>>> rexp=re.compile(pat)
>>> a = "TCCCTGCGGCGCATGAGTGACTGGCGTATTTAGCCCGTCACATTTA"
>>> print len(re.findall(rexp,a))
2
>>> b = "CCTGCGGCGCATGAGTGACTGGCGTATTTAGCCCGTCACAATTTAA"
>>> print len(re.findall(rexp,b))
2
Be aware, though, that findall finds only non-overlapping occurrences; if
overlapping occurrences are important to you, it will undercount:
>>> c = "ATTTATTTA"
>>> print len(re.findall(rexp,c))
1
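To see why the count comes out as 1, it may help to compare the spans the
regex scan reports with every position at which the pattern really starts.
This is only an illustrative sketch; the names spans and starts are made up
for the example:

```python
import re

rexp = re.compile("ATTTA")
c = "ATTTATTTA"

# Spans found by the non-overlapping scan: after a match, the search
# resumes at the END of that match, so the shared "A" is already consumed.
spans = [m.span() for m in rexp.finditer(c)]
print(spans)

# Every index at which the literal pattern actually begins.
starts = [i for i in range(len(c)) if c.startswith("ATTTA", i)]
print(starts)
```

The scan reports a single span covering indexes 0 through 4, while the
pattern also starts at index 4, inside the text already matched.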
The following function will count all occurrences, even if they overlap:
def findall_overlap(regex, seq):
    resultlist = []
    pos = 0
    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(seq[result.start():result.end()])
        pos = result.start() + 1
    return resultlist
For example:
>>> print len(findall_overlap(rexp,c))
2
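An alternative, offered only as a sketch: wrapping the pattern in a
zero-width lookahead, (?=...), should let re.findall itself report
overlapping starts, because a lookahead match consumes no characters. The
helper name count_overlapping is hypothetical, and it assumes the pattern is
a valid regular expression (apply re.escape first if it should be taken
literally):

```python
import re

def count_overlapping(pattern, seq):
    """Count occurrences of pattern in seq, including overlapping ones.

    Hypothetical helper: pattern is assumed to be a regular expression.
    """
    # The lookahead matches at every position where the pattern starts,
    # but consumes nothing, so the scan advances one character at a time.
    lookahead = re.compile("(?=%s)" % pattern)
    return len(lookahead.findall(seq))

c = "ATTTATTTA"
print(count_overlapping("ATTTA", c))
```

Note that findall returns only empty strings here, so this form is useful
for counting; to recover the matched text or positions, a loop like
findall_overlap is the better fit.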