[Tutor] Parsing a block of XML text

kumar s ps_python at yahoo.com
Fri Dec 31 23:15:42 CET 2004


Dear group:

I am trying to parse BLAST (Basic Local Alignment Search Tool)
output in XML format; the file is more than 250 KB. Here is
one hit from it:

<Hit>
  <Hit_num>1</Hit_num>
  <Hit_id>gi|43442325|emb|BX956931.1|</Hit_id>
  <Hit_def>DKFZp781D1095_r1 781 (synonym: hlcc4) Homo sapiens cDNA clone DKFZp781D1095 5', mRNA sequence.</Hit_def>
  <Hit_accession>BX956931</Hit_accession>
  <Hit_len>693</Hit_len>
  <Hit_hsps>
  <Hsp>
  <Hsp_num>1</Hsp_num> 
  <Hsp_bit-score>1164.13</Hsp_bit-score> 
  <Hsp_score>587</Hsp_score> 
  <Hsp_evalue>0</Hsp_evalue> 
  <Hsp_query-from>1</Hsp_query-from> 
  <Hsp_query-to>587</Hsp_query-to> 
  <Hsp_hit-from>107</Hsp_hit-from> 
  <Hsp_hit-to>693</Hsp_hit-to> 
  <Hsp_query-frame>1</Hsp_query-frame> 
  <Hsp_hit-frame>1</Hsp_hit-frame> 
  <Hsp_identity>587</Hsp_identity> 
  <Hsp_positive>587</Hsp_positive> 
  <Hsp_align-len>587</Hsp_align-len> 
<Hsp_qseq>GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGTTTTCAGTTTTGTTTCTGGTTGTTTGGTTAGGGCTGAATGTTTTCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGAAAAATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAATTTTAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGAGGGGCACCTGCTCATTTTGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAAAAAGGGGGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA</Hsp_qseq>
<Hsp_hseq>GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGTTTTCAGTTTTGTTTCTGGTTGTTTGGTTAGGGCTGAATGTTTTCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGAAAAATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAATTTTAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGAGGGGCACCTGCTCATTTTGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAAAAAGGGGGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA</Hsp_hseq>
<Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
  </Hsp>
  </Hit_hsps>
  </Hit>
<Hit>
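To show how the nesting above (a Hit, then Hit_hsps, then one or more Hsp elements) maps onto code, here is a minimal sketch using the standard-library xml.etree.ElementTree module, which is a different API from the DOM-style getElementsByTagName calls in this post. SAMPLE is a trimmed copy of the hit above with the sequences left out, and `summarize_hit` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Skeleton of the <Hit> shown above; long sequence elements are omitted.
SAMPLE = """<Hit>
  <Hit_num>1</Hit_num>
  <Hit_accession>BX956931</Hit_accession>
  <Hit_len>693</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>1164.13</Hsp_bit-score>
      <Hsp_evalue>0</Hsp_evalue>
    </Hsp>
  </Hit_hsps>
</Hit>"""


def summarize_hit(hit):
    """Return (accession, length, [(hsp_num, bit_score, evalue), ...]) for one <Hit>."""
    hsps = [
        (int(hsp.findtext("Hsp_num")),
         float(hsp.findtext("Hsp_bit-score")),
         float(hsp.findtext("Hsp_evalue")))
        # "Hit_hsps/Hsp" is relative to the <Hit>: every <Hsp> child of <Hit_hsps>
        for hsp in hit.findall("Hit_hsps/Hsp")
    ]
    return hit.findtext("Hit_accession"), int(hit.findtext("Hit_len")), hsps


hit = ET.fromstring(SAMPLE)
print(summarize_hit(hit))  # ('BX956931', 693, [(1, 1164.13, 0.0)])
```

Because findtext and findall take paths relative to the element they are called on, each Hsp is only searched within its own Hit, so fields from different hits cannot get mixed up.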




I want to extract the values of these elements from every <Hsp>:

<Hsp_query-from></Hsp_query-from>
<Hsp_query-to></Hsp_query-to>
<Hsp_hit-from></Hsp_hit-from>
<Hsp_hit-to></Hsp_hit-to>
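One way to collect those four values with the same getElementsByTagName call is to loop over the <Hsp> elements rather than <Hsp_num>, and look each field up inside the current <Hsp>. This is a minimal sketch assuming xml.dom.minidom as the DOM implementation; SAMPLE is a trimmed copy of the HSP above, and the names `text_of` and `hsp_coordinates` are hypothetical.

```python
from xml.dom import minidom

# Trimmed copy of the <Hsp> shown above (sequences omitted).
SAMPLE = """<Hit>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>587</Hsp_query-to>
      <Hsp_hit-from>107</Hsp_hit-from>
      <Hsp_hit-to>693</Hsp_hit-to>
    </Hsp>
  </Hit_hsps>
</Hit>"""

FIELDS = ("Hsp_query-from", "Hsp_query-to", "Hsp_hit-from", "Hsp_hit-to")


def text_of(parent, tag):
    """Return the text inside the first <tag> element below parent."""
    node = parent.getElementsByTagName(tag)[0]
    node.normalize()  # merge any adjacent text nodes into one
    return node.firstChild.data.strip()


def hsp_coordinates(doc):
    """Yield one tuple of four integers per <Hsp> element."""
    # getElementsByTagName matches the exact name, so "Hsp" does not match "Hsp_num".
    for hsp in doc.getElementsByTagName("Hsp"):
        yield tuple(int(text_of(hsp, tag)) for tag in FIELDS)


doc = minidom.parseString(SAMPLE)
for coords in hsp_coordinates(doc):
    print(coords)  # (1, 587, 107, 693)
```

With the real report, `minidom.parse(path)` takes the place of `parseString`.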


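I wrote a very small snippet to try to get them:

from xml.dom import minidom

doc = minidom.parse('blast.xml')  # hypothetical name for the BLAST XML file
for bls in doc.getElementsByTagName('Hsp_num'):
    bls.normalize()  # merge adjacent text nodes into one
    # the node text is a string, so convert it before comparing with 1
    if int(bls.firstChild.data) > 1:
        print(bls.firstChild.data)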
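Comparing `bls.firstChild.data > 1` directly does not do what it looks like: the data is a string, and CPython 2 orders every number below every string, so the test is always true (Python 3 raises TypeError instead). Once an <Hsp_num> element has been found, its parentNode is the enclosing <Hsp>, and the sibling fields can be read from there. A minimal sketch under those assumptions: the second HSP in SAMPLE uses invented numbers only so the `> 1` filter has something to select, and `child_text` and `later_hsps` are hypothetical names.

```python
from xml.dom import minidom

# HSP 1 is the one shown above; HSP 2 uses made-up numbers for illustration only.
SAMPLE = """<Hit_hsps>
  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_query-from>1</Hsp_query-from>
    <Hsp_query-to>587</Hsp_query-to>
  </Hsp>
  <Hsp>
    <Hsp_num>2</Hsp_num>
    <Hsp_query-from>10</Hsp_query-from>
    <Hsp_query-to>50</Hsp_query-to>
  </Hsp>
</Hit_hsps>"""


def child_text(parent, tag):
    """Text of the first <tag> element under parent (text nodes merged first)."""
    node = parent.getElementsByTagName(tag)[0]
    node.normalize()
    return node.firstChild.data


def later_hsps(doc):
    """Query ranges of every HSP whose Hsp_num is greater than 1."""
    result = []
    for num in doc.getElementsByTagName("Hsp_num"):
        num.normalize()
        if int(num.firstChild.data) > 1:  # compare numbers, not a string with a number
            hsp = num.parentNode  # the enclosing <Hsp> element
            result.append((int(child_text(hsp, "Hsp_query-from")),
                           int(child_text(hsp, "Hsp_query-to"))))
    return result


print(later_hsps(minidom.parseString(SAMPLE)))  # [(10, 50)]
```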


This is not enough to get what I need: it only reads the
<Hsp_num> values. Could anyone point me to how to get the
other elements inside each <Hsp> tag?
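For larger BLAST reports, a streaming parser avoids building the whole tree in memory at once; a file of a few hundred KB is usually fine with either approach. Below is a minimal sketch using xml.etree.ElementTree.iterparse, a different API from the DOM calls above: each <Hsp> is processed when its end tag is seen and then cleared. The function name `stream_coordinates` is hypothetical and SAMPLE is a one-HSP excerpt of the output above.

```python
import io
import xml.etree.ElementTree as ET

FIELDS = ("Hsp_query-from", "Hsp_query-to", "Hsp_hit-from", "Hsp_hit-to")


def stream_coordinates(source):
    """Yield (query_from, query_to, hit_from, hit_to) for each <Hsp> while parsing.

    `source` is a file name or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Hsp":
            # At the "end" event the whole <Hsp> subtree has been parsed.
            yield tuple(int(elem.findtext(tag)) for tag in FIELDS)
            elem.clear()  # drop this HSP's children (e.g. long sequences) once used


SAMPLE = b"""<Hit><Hit_hsps><Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_query-from>1</Hsp_query-from><Hsp_query-to>587</Hsp_query-to>
<Hsp_hit-from>107</Hsp_hit-from><Hsp_hit-to>693</Hsp_hit-to>
</Hsp></Hit_hsps></Hit>"""

for coords in stream_coordinates(io.BytesIO(SAMPLE)):
    print(coords)  # (1, 587, 107, 693)
```

Cleared elements stay attached to their parent as empty shells; that is usually acceptable, but very large files may also need the parent elements cleared.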

Thanks.
-K



