[Tutor] extracting numbers from a list
kumar s
ps_python at yahoo.com
Tue Oct 17 23:42:28 CEST 2006
In continuation to :
Re: [Tutor] extracting numbers from a list
Hello list,
I have coordinates for exons (chunks of sequence). For
instance:
10 - 50 A
10 - 20 B
35 - 50 B
60 - 70 A
60 - 70 B
80 - 100 A
80 - 100 B
(The above coordinates and names are simpler than those in
my real data.)
Here my aim is to create chunks of exons specific to A
or B.
For instance:
10 - 20,35 - 50 are common to both A and B, whereas
21 - 34 is specific only to A.
The desired output for me is :
10 \t 20 A,B
21 \t 34 A
35 \t 50 A,B
60 \t 70 A,B
80 \t 100 A,B
I just learned Python from a friend, and he is also a
novice.
What I could get is the breakup into chunks. The problem
is that I am getting numbers different from what I need:
[10, 20] [10, 50]
[21, 35] [10, 50]
[36, 50] [10, 50]
[60, 70] [60, 70]
[80, 100] [80, 100]
The list next to each chunk is the pair (the longer
block) it came from.
Could anyone help me correct [21, 35], [36, 50] to
21 \t 34, 35 \t 50? I tried changing the indexes in
the function chunker, but it is not working for me.
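One possible way to look at the off-by-one, offered as a rough sketch rather than a definitive fix: chunker treats every unique number as the end of a chunk, but 35 is the start of an exon, so the chunk before it should end at 34. Cutting at every exon start and at every exon end plus one avoids that. The names `cut_points` and `chunks_from_exons` are hypothetical, and the sketch assumes inclusive integer coordinates as in the example.

```python
def cut_points(exons):
    """Return the sorted positions where a new chunk may begin.

    Assumes each exon is an inclusive (start, end) pair of integers.
    """
    cuts = set()
    for start, end in exons:
        cuts.add(start)      # a chunk begins where an exon begins
        cuts.add(end + 1)    # and another begins just after an exon ends
    return sorted(cuts)


def chunks_from_exons(exons):
    """Split the region covered by the exons into inclusive chunks."""
    cuts = cut_points(exons)
    result = []
    for left, right in zip(cuts, cuts[1:]):
        lo, hi = left, right - 1
        # Drop the gaps between exons: keep a chunk only if some exon covers it.
        if any(start <= lo and hi <= end for start, end in exons):
            result.append((lo, hi))
    return result


exons = [(10, 50), (10, 20), (35, 50), (60, 70), (60, 70), (80, 100), (80, 100)]
print(chunks_from_exons(exons))
# [(10, 20), (21, 34), (35, 50), (60, 70), (80, 100)]
```

Because every exon boundary is a cut, each chunk lies either entirely inside an exon or entirely outside it, which is why checking full containment is enough.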
Also, how can I point chunks to their names?
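A minimal sketch of one way to attach names to chunks, assuming the chunk coordinates are already the desired ones (they are hard-coded here) and that each line of `dat` is `start<TAB>end<TAB>name`. The helper `names_for` is a hypothetical name: it simply collects the name of every exon that fully contains the chunk.

```python
dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB',
       '60\t70\tA', '60\t70\tB', '80\t100\tA', '80\t100\tB']

# Parse each line into (start, end, name) with integer coordinates.
named_exons = []
for line in dat:
    start, end, name = line.split('\t')
    named_exons.append((int(start), int(end), name))


def names_for(chunk, named_exons):
    """Return the sorted names of all exons that fully contain the chunk."""
    lo, hi = chunk
    found = set()
    for start, end, name in named_exons:
        if start <= lo and hi <= end:
            found.add(name)
    return sorted(found)


# Chunk coordinates taken from the desired output (assumed already computed).
chunks = [(10, 20), (21, 34), (35, 50), (60, 70), (80, 100)]
for lo, hi in chunks:
    print('%d\t%d\t%s' % (lo, hi, ','.join(names_for((lo, hi), named_exons))))
```

For the example data this prints A,B for every chunk except 21 to 34, which is labelled A only.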
This is an abstract example of the more complex numbers
and their sequence names in my real data. I want to fix
the simple code first and then go on to the complex one.
Thank you very much for your valuable time.
Result: what I am getting now:
[10, 20] [10, 50]
[21, 35] [10, 50]
[36, 50] [10, 50]
[60, 70] [60, 70]
[80, 100] [80, 100]
My code:
from sets import Set
dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB',
'60\t70\tA', '60\t70\tB', '80\t100\tA', '80\t100\tB']
############
# creating a dictionary with coordinates as key and NM_ as value
#####
ekda = {}
for j in dat:
    cols = j.split('\t')
    ekda.setdefault(cols[0]+'\t'+cols[1],[]).append(cols[2])
######
# getting tab-delimited numbers only and not the A,B
bat = []
for j in dat:
    cols = j.split('\t')
    bat.append(cols[0]+'\t'+cols[1])
pairs = [ map(int, x.split('\t')) for x in bat ]
#####################################################################################
# this function takes pairs (from the above result) and yields the
# longer blocks (exons).
# For instance:
# 10 - 20; 14 - 25; 19 - 30; 40 - 50; 45 - 60; 70 - 80
# a = [[10,20],[14,25],[19,30],[40,50],[45,60],[70,80]]
# for j in exoner(a):
#     print j
# The result would be:
# 10 - 30; 40 - 60; 70 - 80
#####################################################################################
def exoner(pairs):
    pairs.sort()
    i = iter(pairs)
    last = i.next()
    for current in i:
        if current[0] in xrange(last[0],last[1]):
            if current[1] > last[1]:
                last = [last[0], current[1]]
            else:
                last = [last[0],last[1]]
        else:
            yield last
            last = current
    yield last
lon = exoner(pairs)
#####################################################################################
## Here I am getting all the unique numbers in dat
nums = []
for j in pairs:
    for k in j:
        nums.append(k)
unm = Set(nums)
unums = []
for x in unm:
    unums.append(x)
unums.sort()
#####################################################################################
### This function takes a list of numbers and breaks it in pieces
## For instance [10,15,20,25,30]
#>>> i = [10,15,20,25,30]
#>>> chunker(i)
#[[10, 15], [16, 20], [21, 25], [26, 30]]
####
def chunker(lis):
    res = []
    res.append([lis[0],lis[1]])
    for m in range(2,len(lis)):
        res.append([lis[m-1]+1,lis[m]])
    return res
####
# Here I take each pair (longer block) and roll over all the unique
# numbers (unums, from dat) and check if that number is in the range
# of the pair; if so, I break all those numbers in the pair's range
# into small blocks
######
gdic = {}
unums.sort()
for pair in exoner(pairs):
    x = pair[0]
    y = pair[1]+1
    sml = []
    for k in unums:
        if k in range(x,y):
            sml.append(k)
        else:
            pass
    for j in chunker(sml):
        print j,pair
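For readers running a current Python, where `i.next()`, `xrange` and the `print` statement used above are no longer available, the merging step done by exoner might be sketched as follows. The name `merge_exons` is hypothetical, and there is one deliberate assumption that differs from the code above: an exon that starts exactly on the last position of the previous block is merged into it, since the coordinates are inclusive.

```python
def merge_exons(pairs):
    """Merge overlapping inclusive [start, end] pairs into longer blocks."""
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new block.
            merged.append([start, end])
    return merged


a = [[10, 20], [14, 25], [19, 30], [40, 50], [45, 60], [70, 80]]
print(merge_exons(a))
# [[10, 30], [40, 60], [70, 80]]
```

This returns a list instead of a generator and leaves the input list unsorted and unchanged.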