[XML-SIG] Some questions from a beginner
derekfountain at yahoo.co.uk
Sat Feb 28 18:48:57 EST 2004
> I guess that, having the schema, I can analyze the file. It has 3 kinds of
> nodes (DB tuples). I found out that I can use the dom or sax module, and as
> the file is about 100MB, sax is the way to go. What should I do to
> understand the data? What should I ask Google for? The only way I have found
> to analyze a large file is to write a doc handler specifying
> startElement/characters/… methods. I can't believe that there is no way to
> use the given logical structure (and the types of the elements).
> The only thing I need to do (for now) is to parse the file and move the data
> to an RDBMS. Please, at least give me some keywords to start with.
The truth is you already have your answer. I'll be interested to see if anyone
else describes a different process, but as far as I am concerned, DOM and SAX
are the alternatives to choose from. 100MB isn't that much to handle in DOM
on a modern machine (my desktop has 1GB of RAM, so it can handle data several
times that size without swapping), so DOM is a valid option. However, SAX is
also valid, and may suit your hardware or data better.
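
For comparison, here is a rough sketch of the DOM route using xml.dom.minidom
from the standard library. The element names (db, person, name, age) and the
field() helper are made up for illustration; the real names would come from
your schema. The whole document is held in memory, which is the trade-off
described above.

```python
from xml.dom import minidom

# Hypothetical sample; the real element names come from your schema.
SAMPLE = b"<db><person><name>Ann</name><age>30</age></person></db>"

doc = minidom.parseString(SAMPLE)


def field(node, tag):
    """Return the text of the first <tag> descendant element of node."""
    child = node.getElementsByTagName(tag)[0]
    return "".join(n.data for n in child.childNodes
                   if n.nodeType == n.TEXT_NODE)


# Walk the tree: one (name, age) tuple per <person> element.
people = [(field(p, "name"), int(field(p, "age")))
          for p in doc.getElementsByTagName("person")]
print(people)
```

For a file on disk, minidom.parse(filename) builds the same kind of tree.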
So do what you described: write a handler with code to catch the elements and
characters and deal with them as you wish. Remember that since you have valid
XML (what are you using to validate against the schema?) your code can be
quite simple. You know what sort of data is coming, and the exact order it's
coming in. Your error handling can be minimal.
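
To make that concrete, here is a minimal sketch of such a handler using
xml.sax from the standard library. It only records the events in the order
the parser reports them, so you can see what arrives and when. The
EventLogger class and the sample document are invented for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records SAX events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


# Hypothetical sample; the real element names come from your schema.
SAMPLE = b"<db><person><name>Ann</name><age>30</age></person></db>"

handler = EventLogger()
xml.sax.parseString(SAMPLE, handler)
for event in handler.events:
    print(event)
```

For the real file you would call xml.sax.parse(filename, handler) instead of
parseString, so the document is read as a stream rather than loaded whole.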
To use the "given logical structure" as you put it, you are better off using
the DOM parser, then writing your code to walk around the data tree. But in
your case, transferring data to an RDBMS, you don't need to do that.
Just parse over it using SAX and pick out each tuple as it goes past.
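
Picking out each tuple as it goes past might look something like the sketch
below. It assumes a single, made-up record type (person, with name and age
fields) and uses sqlite3 as a stand-in for whatever RDBMS you are loading;
your three node types would each need their own branch and INSERT statement.

```python
import sqlite3
import xml.sax


class TupleLoader(xml.sax.ContentHandler):
    """Inserts one row per <person> element as the parser passes over it."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn
        self.row = None   # fields of the tuple currently being read
        self.text = []    # text chunks of the current field

    def startElement(self, name, attrs):
        if name == "person":
            self.row = {}
        self.text = []

    def characters(self, content):
        # Text can arrive in several pieces, so accumulate it.
        self.text.append(content)

    def endElement(self, name):
        if name == "person":
            # The tuple is complete: hand it to the database and forget it.
            self.conn.execute(
                "INSERT INTO person (name, age) VALUES (?, ?)",
                (self.row["name"], int(self.row["age"])),
            )
            self.row = None
        elif self.row is not None:
            self.row[name] = "".join(self.text).strip()
        self.text = []


# Hypothetical sample; the real element names come from your schema.
SAMPLE = b"""<db>
  <person><name>Ann</name><age>30</age></person>
  <person><name>Bob</name><age>41</age></person>
</db>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
xml.sax.parseString(SAMPLE, TupleLoader(conn))
conn.commit()
rows = conn.execute("SELECT name, age FROM person ORDER BY name").fetchall()
print(rows)
```

Only one tuple is held in memory at a time, which is why the size of the
input file matters much less here than it does with DOM.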