How to search HUGE XML with DOM?

Fri Mar 31 06:55:47 EST 2006

Sullivan WxPyQtKinter wrote:
> A relational database searches efficiently even when the database is
> very big (several thousand or tens of thousands of records). But my
> current project is based on XML, because its tree-like data structure is
> much more flexible, and on DOM, which can be manipulated just like a
> tree. However, how do I set up such an XML database for searching when
> it contains 10,000 records (one record usually contains 10~30 tags) or
> more?
> 
> My search needs:
> 1. Search and return all the records (elements) with a specific id.
> 2. Search and return all the records whose child nodes have a specific
> id or attribute.
> 
> The xml.dom.minidom object is too slow when parsing such a big XML file
> into a DOM object, while pulldom would spend quite a long time going
> through the whole database file. How can I improve the search speed?
> Are there existing solutions or algorithms? Thank you for your
> suggestions...

- have a look at cElementTree ?
- store your XML as persistent objects in a ZODB instance, then use a ZODB
catalog for queries ?
- index relevant data in a DB (RDBMS, Berkeley, whatever...) ?
- have a look at 4suite (http://4suite.org/index.xhtml) ?
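
To make the first suggestion a bit more concrete, here is a minimal sketch
(not a tuned implementation) that streams the file once with iterparse and
builds in-memory indexes for the two searches described in the question. It
uses xml.etree.ElementTree, the standard-library descendant of cElementTree.
The element name "record", the attribute name "id", the sample data and the
helper build_index are all assumptions made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # stdlib descendant of cElementTree


def build_index(source, record_tag="record"):
    """Stream the XML once and index records by id (hypothetical helper).

    Returns two dicts that map an id to a list of serialized records:
    one keyed by the record's own id, one keyed by the ids of its
    descendant elements.
    """
    by_id = {}
    by_child_id = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        xml_text = ET.tostring(elem, encoding="unicode")
        rec_id = elem.get("id")
        if rec_id is not None:
            by_id.setdefault(rec_id, []).append(xml_text)
        for child in elem.iter():
            if child is elem:
                continue  # elem.iter() also yields the record itself
            child_id = child.get("id")
            if child_id is not None:
                by_child_id.setdefault(child_id, []).append(xml_text)
        root.clear()  # drop finished records so the tree does not grow
    return by_id, by_child_id


# Made-up sample data; a real file would be passed by name instead.
SAMPLE = b"""<db>
  <record id="r1"><name id="n1">alpha</name></record>
  <record id="r2"><name id="n2">beta</name><note id="n1"/></record>
</db>"""

by_id, by_child_id = build_index(io.BytesIO(SAMPLE))
records_with_id = by_id.get("r1", [])            # search need 1
records_with_child = by_child_id.get("n1", [])   # search need 2
```

The file is still read once from start to finish, but after that each lookup
is a dictionary access. If the indexes have to survive between runs, the same
loop could write the (id, record) pairs into a database instead, as in the
third suggestion.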

My 2 cents...
-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb at xiludom.gro'.split('@')])"