[XML-SIG] Some questions from a beginner

Derek Fountain derekfountain at yahoo.co.uk
Sat Feb 28 18:48:57 EST 2004


> I guess that, having the schema, I can analyze the file. It has 3 kinds of
> nodes (DB tuples). I found out that I can use the dom or sax module, and as
> the file is about 100MB, sax is the way to go. What should I do to
> understand the data? What should I ask Google for? The only way I found to
> analyze a large file is to write a doc handler specifying
> startElement/characters/… methods. I can't believe there is no way to use
> the given logical structure (and element types).
> The only thing I need to do (now) is to parse the file and move the data to
> an RDBMS. Please give me at least some keywords to start with.

The truth is you already have your answer. I'll be interested to see if anyone 
else describes a different process, but as far as I am concerned, DOM and SAX 
are the alternatives to choose from. 100MB isn't that much to handle in DOM 
on a modern machine (my desktop has 1GB of RAM, so it can handle data several 
times that size without swapping), so DOM is a valid option. However, SAX is 
also valid, and may suit your hardware or data better.
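To make the DOM option concrete, here is a minimal sketch using the standard
xml.dom.minidom module (written for a current Python 3). It assumes a
hypothetical layout in which each tuple is a <row> element whose child
elements are the fields; the element names, the sample data and the
rows_from_dom() helper are illustrations, not taken from your schema. Note
that the whole document is parsed into memory before any of it is walked.

```python
import xml.dom.minidom

# Hypothetical sample: each <row> element is one DB tuple with child fields.
SAMPLE = """<data>
  <row><id>1</id><name>apple</name></row>
  <row><id>2</id><name>pear</name></row>
</data>"""


def text_of(node):
    """Concatenate the text children of an element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


def rows_from_dom(doc):
    """Walk the in-memory tree and return one dict per <row> element."""
    rows = []
    for row in doc.getElementsByTagName("row"):
        fields = {}
        for child in row.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                fields[child.tagName] = text_of(child)
        rows.append(fields)
    return rows


if __name__ == "__main__":
    # For a real file, use xml.dom.minidom.parse("data.xml") instead;
    # the entire document is then held in memory at once.
    doc = xml.dom.minidom.parseString(SAMPLE)
    print(rows_from_dom(doc))
```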

So do what you described: write a handler with code to catch the elements and 
characters and deal with them as you wish. Remember that since you have valid 
XML (what are you using to validate against the schema?) your code can be 
quite simple. You know what sort of data is coming, and the exact order it's 
coming in. Your error handling can be minimal.
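A handler of that kind might look like the following minimal sketch using
xml.sax.ContentHandler. It assumes the same hypothetical flat layout (one
<row> per tuple, one child element per field); RowHandler and the element
names are illustrative only. Because the input is assumed to be valid, the
handler does no error checking. It buffers text because characters() may
deliver one field's text in several pieces.

```python
import xml.sax

# Hypothetical sample data: each <row> element is one tuple.
SAMPLE = b"""<data>
  <row><id>1</id><name>apple</name></row>
  <row><id>2</id><name>pear</name></row>
</data>"""


class RowHandler(xml.sax.ContentHandler):
    """Collects each <row> element as a dict of field name -> text."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.current = None  # dict for the <row> being read, if any
        self.field = None    # name of the field element being read
        self.buffer = []     # text pieces for the current field

    def startElement(self, name, attrs):
        if name == "row":
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.buffer = []

    def characters(self, content):
        # Whitespace between elements arrives here too; ignore it
        # unless we are inside a field.
        if self.field is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "row":
            self.rows.append(self.current)
            self.current = None
        elif name == self.field:
            self.current[self.field] = "".join(self.buffer)
            self.field = None


if __name__ == "__main__":
    handler = RowHandler()
    # For a real file, use xml.sax.parse("data.xml", handler) instead.
    xml.sax.parseString(SAMPLE, handler)
    print(handler.rows)
```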

To use the "given logical structure", as you put it, you are better off using 
the DOM parser and then writing code that walks the data tree. But in your 
case, transferring data to an RDBMS, you don't need to do that. Just parse 
over it using SAX and pick out each tuple as it goes past.
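Here is a minimal sketch of picking out each tuple as it goes past and writing
it straight to a database, so nothing accumulates in memory. It uses the
standard sqlite3 module as a stand-in for whatever RDBMS you actually have
(placeholder syntax differs between DB-API drivers). The <row> layout, the
fruit table, RowToDB and load() are all hypothetical names for illustration.

```python
import sqlite3
import xml.sax

# Hypothetical sample data and table layout; adapt names to your schema.
SAMPLE = b"""<data>
  <row><id>1</id><name>apple</name></row>
  <row><id>2</id><name>pear</name></row>
</data>"""


class RowToDB(xml.sax.ContentHandler):
    """Inserts each <row> into the database as soon as it has been read."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.current = None
        self.field = None
        self.buffer = []

    def startElement(self, name, attrs):
        if name == "row":
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.buffer = []

    def characters(self, content):
        if self.field is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "row":
            # Named placeholders take their values from the dict.
            self.conn.execute(
                "INSERT INTO fruit (id, name) VALUES (:id, :name)",
                self.current)
            self.current = None
        elif name == self.field:
            self.current[name] = "".join(self.buffer)
            self.field = None


def load(xml_bytes, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS fruit (id INTEGER, name TEXT)")
    xml.sax.parseString(xml_bytes, RowToDB(conn))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(SAMPLE, conn)
    print(conn.execute("SELECT id, name FROM fruit ORDER BY id").fetchall())
```

For a 100MB file you would pass a file name to xml.sax.parse() instead of
using parseString(), and perhaps commit in batches.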

-- 
> eatapple
core dump



