[Tutor] Using Beautiful Soup to extract tag names

Kent Johnson kent37 at tds.net
Tue Mar 14 16:38:46 CET 2006


Ed Singleton wrote:
> I have (unfortunately) received some data in XML format.  I need to
> use it in Python, preferably as a list of dictionaries.  The data is a
> flat representation of a table, in the style:
> 
> <tablename>
> <fieldname1>Some Data</fieldname1>
> <fieldname2>Some Data</fieldname2>
> ...
> </tablename>
> <tablename>
> <fieldname1>Some Data</fieldname1>
> <fieldname2>Some Data</fieldname2>
> ...
> 
> and so on (where tablename is always the same in one file).

ElementTree makes short work of this:

# ElementTree ships in the standard library as xml.etree (Python 2.5 and later)
from xml.etree import ElementTree

xml = '''
<data><tablename>
<fieldname1>Some Data1</fieldname1>
<fieldname2>Some Data2</fieldname2>
</tablename>
<tablename>
<fieldname3>Some Data3</fieldname3>
<fieldname4>Some Data4</fieldname4>
</tablename>
</data>'''

# fromstring() parses a string and returns the root element
doc = ElementTree.fromstring(xml)
# use ElementTree.parse() to parse a file

for table in doc.findall('tablename'):
    # iterating over an element yields its direct children
    for field in table:
        print(field.tag, field.text)


prints:
fieldname1 Some Data1
fieldname2 Some Data2
fieldname3 Some Data3
fieldname4 Some Data4
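
Since the question asked for a list of dictionaries, here is a minimal sketch of one way to get there, assuming Python 3 and the standard-library xml.etree.ElementTree module. The helper name rows_from_xml and the sample data are made up for illustration; they are not part of ElementTree.

```python
from xml.etree import ElementTree


def rows_from_xml(xml_text, row_tag='tablename'):
    """Return one dict per <row_tag> element, mapping child tag -> text."""
    root = ElementTree.fromstring(xml_text)
    return [
        {field.tag: field.text for field in table}
        for table in root.findall(row_tag)
    ]


# Hypothetical sample data in the shape described in the question,
# wrapped in a single <data> root element so that it is well-formed XML.
sample = '''<data>
<tablename><fieldname1>Some Data1</fieldname1><fieldname2>Some Data2</fieldname2></tablename>
<tablename><fieldname1>Some Data3</fieldname1><fieldname2>Some Data4</fieldname2></tablename>
</data>'''

rows = rows_from_xml(sample)
print(rows)
```

Each row becomes a plain dict keyed by tag name, so rows[0]['fieldname1'] gives 'Some Data1'. If a tag repeats inside one row, the last occurrence wins in this sketch.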

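If speed is an issue then look at cElementTree, a C implementation with
the same interface that is much faster. On Python 3.3 and later the
standard xml.etree.ElementTree uses the C accelerator automatically.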
http://effbot.org/zone/element.htm
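
For large files, parsing incrementally may also help with speed and memory. The following is a rough sketch, assuming Python 3 and the iterparse() function of the standard-library xml.etree.ElementTree module. The name iter_rows and the in-memory sample are hypothetical, chosen only for illustration.

```python
import io
from xml.etree import ElementTree


def iter_rows(source, row_tag='tablename'):
    """Yield one dict per <row_tag> element as its end tag is parsed."""
    for event, elem in ElementTree.iterparse(source, events=('end',)):
        if elem.tag == row_tag:
            yield {field.tag: field.text for field in elem}
            elem.clear()  # drop the row's children once they have been copied


# BytesIO stands in for a real file here; a filename can be passed instead.
fake_file = io.BytesIO(b'''<data>
<tablename><fieldname1>A</fieldname1><fieldname2>B</fieldname2></tablename>
<tablename><fieldname1>C</fieldname1><fieldname2>D</fieldname2></tablename>
</data>''')

rows = list(iter_rows(fake_file))
print(rows)
```

Because iter_rows is a generator, rows can be processed one at a time instead of being collected into a list, which is the point of the exercise when the file is big.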

Kent


