[Tutor] how to do systematic searching in dictionary and printing it
Kent Johnson
kent37 at tds.net
Thu Oct 20 19:56:24 CEST 2005
I would do this by building, for each data set, a dictionary that maps sequence to header. Then make a set containing the keys (sequences) common to both dictionaries. Finally, use the dictionaries again to look up the headers for each common sequence.
a = '''>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA'''.split()
# Make a dict mapping sequence to header for the 'a' data set
ak = a[1::2]
av = a[::2]
a_dict = dict(zip(ak,av))
print a_dict
b = '''>b1
TTAATTGGAACA
>b2
AGGTCAAGGATA
>b3
AAGGCCAATTAA'''.split()
# Make a dict mapping sequence to header for the 'b' data set
bk = b[1::2]
bv = b[::2]
b_dict = dict(zip(bk,bv))
print b_dict
# Make a set that contains the keys common to both dicts
common_keys = set(a_dict.iterkeys())
common_keys.intersection_update(b_dict.iterkeys())
print common_keys
# For each common key, print the corresponding headers
for common in common_keys:
    print '%s\t%s' % (a_dict[common], b_dict[common])
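
The code above is written for Python 2 (print statement, iterkeys). For anyone on Python 3, here is a rough sketch of the same approach. The helper name seq_to_header and the variable names a_text and b_text are my own, not anything from the original post, and the helper assumes headers contain no whitespace and that header and sequence lines strictly alternate.

```python
# Python 3 sketch of the same idea; seq_to_header is a hypothetical helper.
def seq_to_header(fasta_text):
    """Map each sequence to its header.

    Assumes header and sequence lines alternate and that
    headers contain no whitespace (split() would break them up).
    """
    tokens = fasta_text.split()
    headers = tokens[::2]      # even positions: '>a1', '>a2', ...
    sequences = tokens[1::2]   # odd positions: the sequences
    return dict(zip(sequences, headers))

a_text = """>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA"""

b_text = """>b1
TTAATTGGAACA
>b2
AGGTCAAGGATA
>b3
AAGGCCAATTAA"""

a_dict = seq_to_header(a_text)
b_dict = seq_to_header(b_text)

# In Python 3, dict key views support set intersection directly.
common_keys = a_dict.keys() & b_dict.keys()

# One tab-separated line of headers per shared sequence.
lines = ['%s\t%s' % (a_dict[seq], b_dict[seq]) for seq in sorted(common_keys)]
for line in lines:
    print(line)
```

One thing to keep in mind with this design: because the sequence is the dictionary key, a sequence that appears more than once within the same data set keeps only the last header seen for it.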
Kent
Srinivas Iyyer wrote:
> dear group,
>
>
> I have two files in a text format and look this way:
>
>
> File a1.txt:
>
> >a1
> TTAATTGGAACA
> >a2
> AGGACAAGGATA
> >a3
> TTAAGGAACAAA
>
>
>
> File b1.txt:
>
> >b1
> TTAATTGGAACA
> >b2
> AGGTCAAGGATA
> >b3
> AAGGCCAATTAA
>
>
> I want to check if there are common elements based on
> ATGC sequences. a1 and b1 are identical sequences and
> I want to select them and print the headers (starting
> with > symbol).
>
> a1 '\t' b1
>
>
>
> Here:
>
> >XXXXX is called header and the line followed by >line
> is sequence. In bioinformatics, this is called a FASTA
> format. What I am doing here is, I am matching the
> sequences (these are always 25 mers in this instance)
> and if they match, I am asking python to write the
> header +'\t'+ header
>
>
> ak = a[1::2]
> av = a[::2]
> seq_dict = dict(zip(ak,av))
>
> **************************************
>
> >>> seq_dict
> {'TTAAGGAACAAA': '>a3', 'AGGACAAGGATA': '>a2',
> 'TTAATTGGAACA': '>a1'}
> **************************************
>
>
>
> bv = b[1::2]
>
> ***************************************
>
> >>> bv
> ['TTAATTGGAACA', 'AGGTCAAGGATA', 'AAGGCCAATTAA']
>
> >>> for i in bv:
>         if seq_dict.has_key(i):
>             print seq_dict[i]
>
> >a1
>
>
> ***************************************
>
> Here a1 is the only common element.
>
> However, I am having difficulty printing that b1 is
> identical to a1
>
>
> how do i take b and do this search. It was easy for me
> to take the sequence part by doing
>
> b[1::2]. however, I want to print b1 header has same
> sequence as a1
>
> a1 +'\t'+b1
>
> Is there anyway i can do this. This is very simple and
> due to my brain block, I am unable to get it out.
> Can any one please help me out.
>
> Thanks
>
>
>
>
>