[Tutor] Parsing a block of XML text
kumar s
ps_python at yahoo.com
Fri Dec 31 23:15:42 CET 2004
Dear group:
I am trying to parse BLAST (Basic Local Alignment
Search Tool) output in XML format. The file is a
little over 250 KB. A single hit looks like this:
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|43442325|emb|BX956931.1|</Hit_id>
<Hit_def>DKFZp781D1095_r1 781 (synonym: hlcc4) Homo
sapiens cDNA clone DKFZp781D1095 5', mRNA
sequence.</Hit_def>
<Hit_accession>BX956931</Hit_accession>
<Hit_len>693</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>1164.13</Hsp_bit-score>
<Hsp_score>587</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>587</Hsp_query-to>
<Hsp_hit-from>107</Hsp_hit-from>
<Hsp_hit-to>693</Hsp_hit-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>587</Hsp_identity>
<Hsp_positive>587</Hsp_positive>
<Hsp_align-len>587</Hsp_align-len>
<Hsp_qseq>GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGTTTTCAGTTTTGTTTCTGGTTGTTTGGTTAGGGCTGAATGTTTTCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGAAAAATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAATTTTAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGAGGGGCACCTGCTCATTTTGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAAAAAGGGGGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA</Hsp_qseq>
<Hsp_hseq>GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGTTTTCAGTTTTGTTTCTGGTTGTTTGGTTAGGGCTGAATGTTTTCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGAAAAATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAATTTTAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGAGGGGCACCTGCTCATTTTGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAAAAAGGGGGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA</Hsp_hseq>
<Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
<Hit>
I want to parse out the values of these elements:
<Hsp_query-from></Hsp_query-from>
<Hsp_query-to></Hsp_query-to>
<Hsp_hit-from></Hsp_hit-from>
<Hsp_hit-to></Hsp_hit-to>
I wrote a very small four-line snippet to try to obtain them:
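# doc is assumed to be an xml.dom.minidom Document built from the BLAST XML
for bls in doc.getElementsByTagName('Hsp_num'):
    bls.normalize()
    # firstChild.data is a string, so convert it before comparing with 1
    if int(bls.firstChild.data) > 1:
        print(bls.firstChild.data)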
This is not enough for me to get anything done.
Could anyone help me with how to get at the
elements inside that tag?
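To make the request concrete, here is a minimal sketch of the kind of extraction being asked about, using xml.dom.minidom. It walks every <Hsp> element and reads the four coordinate elements beneath it. The trimmed SAMPLE string and the helper names child_text and hsp_coordinates are made up for illustration, and the sketch assumes all four elements are present in every <Hsp>.

```python
from xml.dom import minidom

# Hypothetical sample: one <Hit> with one <Hsp>, trimmed from the output above.
SAMPLE = """<Hit>
  <Hit_num>1</Hit_num>
  <Hit_accession>BX956931</Hit_accession>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>587</Hsp_query-to>
      <Hsp_hit-from>107</Hsp_hit-from>
      <Hsp_hit-to>693</Hsp_hit-to>
    </Hsp>
  </Hit_hsps>
</Hit>"""

# The four elements named in the question.
FIELDS = ("Hsp_query-from", "Hsp_query-to", "Hsp_hit-from", "Hsp_hit-to")


def child_text(node, tag):
    """Return the text of the first <tag> descendant of node, or None."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.data.strip()


def hsp_coordinates(doc):
    """Collect the four coordinates of every <Hsp> as a dict of ints.

    Assumes each <Hsp> contains all four elements; a missing one
    would make int() raise a TypeError.
    """
    rows = []
    for hsp in doc.getElementsByTagName("Hsp"):
        rows.append({tag: int(child_text(hsp, tag)) for tag in FIELDS})
    return rows


doc = minidom.parseString(SAMPLE)
for row in hsp_coordinates(doc):
    print(row)
```

For the real file, minidom.parse() with the path to the BLAST XML output would take the place of parseString(SAMPLE). Searching from each <Hsp> node rather than from the whole document keeps the four values of one alignment together.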
Thanks.
-K