[XML-SIG] (DOM) Reading of large (huge) document

Horst Eyermann <horst@freedict.de>
Sat, 18 Aug 2001 18:20:51 +0200 (CEST)


Hello,
I am trying to read a dictionary, formatted in XML, into a database. Each
dictionary entry should become one database entry:

self.reader = PyExpatReader()
self.doc = self.reader.fromUri(fileName)
Util.IndexDocument(self.doc)
self.result = xpath.Evaluate('//entry', contextNode=self.doc)


This works well for small dictionaries, but with a large one (100,000
words, 20 MB) I run into problems. After a while all of my main memory
(380 MB) and swap (400 MB) is used up, and the system is busy paging.

Could anyone advise whether there is a more efficient way to read in the
document?

Should I parse the <entry>...</entry> elements by hand and process only their
contents (validating, ...)? That appears to be far less demanding, but is it
the right way to go?
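
For illustration, here is a minimal sketch of one middle ground between a full
DOM and hand-written parsing: streaming through the file with the standard
library's xml.dom.pulldom and building a small DOM for one <entry> at a time.
This is not the PyExpatReader/xpath code above, only a possible alternative.
The child element names "orth" and "trans", the sample data, and the helpers
iter_entries and text_of are assumptions made up for the example; the real
dictionary format is not shown in this mail.

```python
import io
from xml.dom import pulldom


def iter_entries(source, tag="entry"):
    """Yield one small DOM subtree per <entry>, never the whole document.

    `source` is a file name or an open file object.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build a DOM for this element only. It is not attached to the
            # document root, so it can be released once it has been processed.
            events.expandNode(node)
            yield node


def text_of(node):
    """Concatenate all text below `node` (hypothetical helper)."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


# Made-up sample data standing in for the 20 MB dictionary file.
SAMPLE = io.StringIO(
    "<dictionary>"
    "<entry><orth>Haus</orth><trans>house</trans></entry>"
    "<entry><orth>Baum</orth><trans>tree</trans></entry>"
    "</dictionary>"
)

rows = []
for entry in iter_entries(SAMPLE):
    orth = entry.getElementsByTagName("orth")[0]
    trans = entry.getElementsByTagName("trans")[0]
    # Stand-in for the real database insert.
    rows.append((text_of(orth), text_of(trans)))
```

With this approach only one entry's subtree should be alive at a time, so
memory use would be expected to follow the size of a single entry rather than
the size of the whole file. Validation against a DTD is not covered by this
sketch.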


Horst

  
Horst@freedict.de
Horst Eyermann 
Germany

You need a dictionary? - visit http://www.freedict.de
for free (GPL) dictionaries (unix; windows work in progress)
For windows, visit http://www.freedict.de/wbuch

An article (in German) about dictionary efforts on the net:
http://www.heise.de/tp/deutsch/inhalt/on/5927/1.html