
Piotr Oh, 24.01.2012 14:49:
I'm not a programmer, so please be patient. I just need some scripting done, and my choice is Python + lxml.
Excellent choice. :)
The problem to solve is:
1. An XML file with some data is exported from our ERP system using third-party tools. It has an XML schema.
2. It is intended to be imported into another system (SQL/Firebird).
3. Between the export and the import I'd like to validate the XML (using the XML schema).
That sounds like the easy part. What amount of data are you talking about? Most importantly: does it fit into memory or not? lxml's parser can validate during parsing, also for iterparse(), in case it won't fit.
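Something along these lines should work for the validation part (untested; the file names and the "record" tag are just placeholders for whatever your export actually contains):

    from lxml import etree

    # Load the schema once (file names are made up for this example).
    schema = etree.XMLSchema(etree.parse("export.xsd"))

    # Whole-tree parse with validation, if the data fits into memory:
    parser = etree.XMLParser(schema=schema)
    tree = etree.parse("export.xml", parser)

    # Incremental parse with validation, if it doesn't:
    for event, element in etree.iterparse("export.xml", schema=schema,
                                          tag="record"):
        # ... handle one record ...
        element.clear()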
4. Then I need to import it into the other system:
4a. check the values of the corresponding data in the XML and the SQL database, compare them, do some action, write to a log, etc.
4b. put them into the SQL database (update existing records, insert new ones)
I don't know how Firebird handles this, but you should check whether a) it has a direct way to import XML data in some form, or b) you can get away with generating "INSERT OR UPDATE" statements.

In general, a large number of database roundtrips will make your program much slower than a direct import of a database dump, sometimes by orders of magnitude. Generating an SQL dump file and letting the DB load that directly is bound to be much faster.

However, if the diff is a real requirement, you may still have to find a way to compare the data manually. What I did in a project once was to dump the database content in SQL format, then line-diff that against a dump I had provided myself, after running both through Unix sort. So, one approach would be to write a script that converts the XML data into SQL statements that match those that your DB dumps itself, and then either import them directly or dump the current DB content next to them and run a diff.
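As a rough, untested sketch of the "generate an SQL dump" approach (the element and column names here are invented, and you should double-check Firebird's exact upsert syntax for your server version):

    from lxml import etree

    def xml_to_sql(xml_path, sql_path):
        with open(sql_path, "w", encoding="utf-8") as out:
            for event, record in etree.iterparse(xml_path, tag="record"):
                values = (
                    record.findtext("id"),
                    record.findtext("name").replace("'", "''"),  # naive quoting
                    record.findtext("price"),
                )
                out.write(
                    "UPDATE OR INSERT INTO items (id, name, price) "
                    "VALUES (%s, '%s', %s) MATCHING (id);\n" % values
                )
                record.clear()  # keep memory usage flat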
Validating is simple (point 3). Then I need to traverse the XML, record by record, do something with each and translate it into an SQL query.
The most obvious approach to that is iterparse().
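For completeness, the usual iterparse() memory-cleanup idiom looks like this (the "record" tag is again just a placeholder):

    from lxml import etree

    for event, elem in etree.iterparse("export.xml", tag="record"):
        # ... do something with the record, build the SQL statement, etc. ...
        elem.clear()                  # drop the element's own content
        while elem.getprevious() is not None:
            del elem.getparent()[0]   # drop already-processed siblings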
From this point of view, IMHO, what is the right way: use lxml.objectify or etree?
Sadly, objectify still doesn't support iterparse() directly, but it should be possible to install objectify's element class lookup scheme as the default lookup scheme and then run iterparse().

http://lxml.de/objectify.html#advanced-element-class-lookup
http://lxml.de/api/lxml.etree-module.html#set_element_class_lookup
http://lxml.de/element_classes.html#setting-up-a-class-lookup-scheme
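An untested sketch of that idea (note that this changes the module-wide default lookup, and the "record"/"id"/"name" names are made up):

    from lxml import etree, objectify

    # Make objectify's element classes the default for all parsing,
    # so that iterparse() hands out objectify elements as well.
    etree.set_element_class_lookup(objectify.ObjectifyElementClassLookup())

    for event, record in etree.iterparse("export.xml", tag="record"):
        # Attribute-style child access with objectify's value conversion.
        print(record.id, record.name)
        record.clear()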
I don't care about efficiency. Instead, it should be as simple as possible (to modify, read, etc.).
Both etree and objectify can be quite readable. If you go the "generate SQL dump" road, etree may be simpler, because objectify's automatic data conversion may get in the way when you are only handling strings. If you take the database-roundtrips road instead, your code may turn out to be more concise with objectify. Apart from that, choose what you like better.

Stefan
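P.S. To illustrate the difference with a made-up record:

    from lxml import etree, objectify

    xml = b"<record><id>42</id><price>9.90</price></record>"

    # etree: everything is a string, which is handy when you only
    # assemble SQL text anyway.
    rec = etree.fromstring(xml)
    print(rec.findtext("id"), rec.findtext("price"))   # '42' '9.90'

    # objectify: values come back as Python types, which is handy when
    # you compare them against data fetched through a DB driver.
    rec = objectify.fromstring(xml)
    print(rec.id + 1, rec.price * 2)                   # 43 19.8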