[Tutor] Selecting text
Karl Pflästerer
sigurd at 12move.de
Wed Jan 19 14:24:43 CET 2005
On 19 Jan 2005, ps_python at yahoo.com wrote:
> I have two lists:
>
> 1. Lseq:
>
>>>> len(Lseq)
> 30673
>>>> Lseq[20:25]
> ['NM_025164', 'NM_025164', 'NM_012384', 'NM_006380',
> 'NM_007032','NM_014332']
>
>
> 2. refseq:
>>>> len(refseq)
> 1080945
>>>> refseq[0:25]
> ['>gi|10047089|ref|NM_014332.1| Homo sapiens small
> muscle protein, X-linked (SMPX), mRNA',
> 'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGAAAAGCATCGGAATTGAGATCGCAGCT',
> 'CAGAGGACACCGGGCGCCCCTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT',
[...]
> 'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTAAAATGTGA',
> '>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal
> protein (NP25), mRNA',
[...]
> If Lseq[i] is present in refseq[k], then I am
> interested in printing starting from refseq[k] until
> the element that starts with '>' sign.
>
> my Lseq has NM_014332 element and this is also present
> in second list refseq. I want to print starting from
> element where NM_014332 is present until next element
> that starts with '>' sign.
> I could not think of any smart way to do this,
> although I have tried like this:
I'll give you the same answer I think you got the last time you asked such
a question: use a dictionary if you want to look items up.
So how to do it?
You could build a dictionary from refseq where the elements that can
match the elements from Lseq are the keys.
Then you iterate over Lseq, check whether each item is a key in your
dictionary, and if so print the matching elements from the list.
The next function creates a dictionary. The keys are the
NM_... entries; the values are the start and end indices of the
corresponding entries.
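
def build_dic (seq):
    keys = []
    indice = []
    for ind, entry in enumerate(seq):
        if entry.startswith('>'):
            # '>gi|10047089|ref|NM_014332.1| ...' -> 'NM_014332'
            # (drop the '.1' version so the key matches the Lseq entries)
            key = entry.split('|')[3].split('.')[0]
            keys.append(key)
            indice.append(ind)
    # end of the last entry; -1 here would cut off its last line
    indice.append(len(seq))
    return dict(zip(keys, zip(indice, indice[1:])))
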
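To see what pairing the index list with itself shifted by one position gives you, here is a small sketch. The data is made up for illustration (it is not taken from the real refseq list), and the names starts, bounds and pairs are mine:

```python
# Toy data, invented for illustration: two records, each a header line
# starting with '>' followed by its sequence lines.
seq = ['>gi|1|ref|NM_000001.1| first', 'AAAA', 'CCCC',
       '>gi|2|ref|NM_000002.1| second', 'GGGG']

# Positions of the header lines.
starts = [i for i, entry in enumerate(seq) if entry.startswith('>')]

# Add the list length as the end of the last record.
bounds = starts + [len(seq)]

# Each position paired with its successor is a (start, end) slice.
pairs = list(zip(bounds, bounds[1:]))
print(pairs)

# Slicing with the second pair gives the whole second record.
start, end = pairs[1]
print(seq[start:end])
```

Each pair can be used directly as a slice, so a record runs from its own header up to, but not including, the next header.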
With that function you search for matching keys and if a match is found
use the start and end index to extract the right elements from the list.

def find_matching (rseq, lseq):
    d = build_dic(rseq)
    for key in lseq:
        if key in d:
            start, end = d[key]
            print(rseq[start:end])

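If you would rather have the matching lines at hand than index pairs, a variant is to store the lines themselves in the dictionary. This is only a sketch of an alternative, not the approach above: the function name group_records and the toy data are invented for illustration, and it assumes the accession is always the fourth '|'-separated field of the header and that the '.1' style version suffix should be dropped.

```python
def group_records(seq):
    """Map each accession (version suffix dropped) to its list of lines."""
    records = {}
    current = None
    for entry in seq:
        if entry.startswith('>'):
            # '>gi|2|ref|NM_000002.1| second' -> 'NM_000002'
            key = entry.split('|')[3].split('.')[0]
            current = records.setdefault(key, [])
        # Lines before the first header belong to no record and are skipped.
        if current is not None:
            current.append(entry)
    return records

# Toy data, invented for illustration.
refseq = ['>gi|1|ref|NM_000001.1| first', 'AAAA', 'CCCC',
          '>gi|2|ref|NM_000002.1| second', 'GGGG']
lseq = ['NM_000002', 'NM_999999']

records = group_records(refseq)
for key in lseq:
    if key in records:
        print(records[key])
```

This keeps a second reference to every line in the dictionary, which costs more memory than the index pairs but saves the slicing step.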
Karl
--
Please do *not* send copies of replies to me.
I read the list