expat parser

Stefan Behnel stefan.behnel-n05pAM at web.de
Mon May 28 15:30:36 EDT 2007


Sebastian Bassi wrote:
> I have this code:
> 
> import xml.parsers.expat
> def start_element(name, attrs):
>    print 'Start element:', name, attrs
> def end_element(name):
>    print 'End element:', name
> def char_data(data):
>    print 'Character data:', repr(data)
> p = xml.parsers.expat.ParserCreate()
> p.StartElementHandler = start_element
> p.EndElementHandler = end_element
> p.CharacterDataHandler = char_data
> fh=open("/home/sbassi/bioinfo/smallUniprot.xml","r")
> p.ParseFile(fh)
> 
> And I get this on the output:
> 
> ...
> Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length': u'393'}
> Character data: u'\n'
> Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRL'
> Character data: u'\n'
> Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH'
> ...
> End element: sequence
> ...
> 
> Is there a way to have the character data together in one string? I
> guess it should not be difficult, but I can't do it. Each time the
> parser reads a line, it returns a line, and I want to have it all in
> one variable.
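
Staying with expat, one way to get a single string is to collect the
chunks between the start and end of the element and join them in the end
handler. The following is only a sketch under some assumptions: it is
written for current Python 3 rather than the Python 2 used above, the
helper name collect_text and its tag parameter are made up for
illustration, and it assumes that whitespace inside the element (the line
breaks in the sequence) can simply be dropped.

```python
import xml.parsers.expat


def collect_text(xml_bytes, tag="sequence"):
    """Return one joined string per <tag> element (hypothetical helper)."""
    results = []
    chunks = None  # None while the parser is outside the element

    def start_element(name, attrs):
        nonlocal chunks
        if name == tag:
            chunks = []  # start collecting character data

    def char_data(data):
        # expat may call this several times for one element
        if chunks is not None:
            chunks.append(data)

    def end_element(name):
        nonlocal chunks
        if name == tag and chunks is not None:
            # Join the chunks and drop all whitespace (assumption: the
            # line breaks are not part of the wanted value)
            results.append("".join("".join(chunks).split()))
            chunks = None

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(xml_bytes, True)  # True marks this as the final chunk of input
    return results
```

For a file on disk, p.ParseFile(fh) with the file opened in binary mode
could take the place of p.Parse in the same way as in the code quoted
above.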

Any reason you are using expat and not cElementTree's iterparse?
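
A rough sketch of what that could look like, under some assumptions: it
is written for current Python 3, where the cElementTree implementation is
reached through xml.etree.ElementTree; the function name iter_sequences
and its tag parameter are hypothetical; and it assumes the line breaks
inside the element are not wanted. With iterparse the element text
arrives as one string in elem.text, so nothing has to be glued together
by hand.

```python
import io
import xml.etree.ElementTree as ET


def iter_sequences(source, tag="sequence"):
    """Yield the text of each <tag> element as one string (hypothetical helper)."""
    for event, elem in ET.iterparse(source, events=("end",)):
        # Ignore a namespace prefix such as "{http://uniprot.org/uniprot}"
        if elem.tag.rsplit("}", 1)[-1] == tag:
            # elem.text holds the complete character data of the element;
            # split/join removes the embedded line breaks (assumption)
            yield "".join((elem.text or "").split())
            elem.clear()  # free the element's content once it has been used


# Small in-memory example; a file name or open file object works as well
xml_data = (
    b'<uniprot xmlns="http://uniprot.org/uniprot"><entry>'
    b'<sequence length="10">\nMPKKK\nPTPIQ\n</sequence>'
    b"</entry></uniprot>"
)
sequences = list(iter_sequences(io.BytesIO(xml_data)))
```

Handling the "end" event matters here: only at that point is the text of
the element guaranteed to be complete.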

Stefan


