Hi there,

I am using this code

https://github.com/xml-director/xmldirector.demo/blob/master/xmldirector/dem...

for benchmarking the XSD parsing speed against some variants of the MODS schema.

mods-3-1.xsd and mods-3-2.xsd take about 15 seconds for parsing while the other variants parse in less than 0.3 seconds.

How can one explain this huge difference?

Andreas

----
mods-3-1.xsd
0.00209999084473
15.2129580975
--------------------------------------------------------------------------------
mods-3-2.xsd
0.00260806083679
15.2835290432
--------------------------------------------------------------------------------
mods-3-3.xsd
0.00289702415466
0.300955057144
--------------------------------------------------------------------------------
mods-3-4.xsd
0.00385713577271
0.313620090485
--------------------------------------------------------------------------------
mods-3-5.xsd
0.00278782844543
0.278451919556
Andreas Jung wrote on 11.04.2015 at 04:01:
I am using this code
https://github.com/xml-director/xmldirector.demo/blob/master/xmldirector/dem...
for benchmarking the XSD parsing speed against some variants of the MODS schema.
Quick remark: "XML(f.read())" is very inefficient. But that's not the problem here, as we can see from your numbers.
mods-3-1.xsd and mods-3-2.xsd take about 15 seconds for parsing while the other variants parse in less than 0.3 seconds.
I assume the "while True" loop means that you're repeating the whole benchmark multiple times to see if it's a setup problem or if it persists over time? And I assume it persists?
How can one explain this huge difference?
I'd run a profiler. Given that lxml doesn't really do anything here, Python level profiling won't help, so (assuming you're on Linux) I recommend callgrind and kcachegrind. They're really easy to use once you've found a suitable command line for callgrind, e.g.

    valgrind --tool=callgrind --toggle-collect=xmlSchemaParse \
        python my_parse_schema.py mods-3-2.xsd

See http://www.valgrind.org/docs/manual/cl-manual.html

That should generate a file callgrind.out.PID and tell you how many instructions it collected. Then run KCachegrind on that file. Click around to see where the problem is. The call tree will guide you there.

If you're using the system provided libxml2, then the output might be a little obfuscated. There should be debug packages of those libraries available for your system that fix this.

Stefan
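For reference, here is a minimal sketch of what a script like the my_parse_schema.py named in the command above could contain. The actual script is not shown in this thread, so the function name parse_schema and the overall structure are assumptions. It hands the file path straight to etree.parse() instead of going through XML(f.read()), and then builds the schema, which is the step where libxml2 does its schema parsing.

```python
import sys

from lxml import etree


def parse_schema(path):
    """Parse an XSD file and compile it into an lxml XMLSchema object.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Let the parser read the file itself rather than reading the whole
    # file into a Python string first and passing it to XML().
    doc = etree.parse(path)
    # Compiling the schema is where imports/includes get resolved, so a
    # slow remote schemaLocation would show up in this call.
    return etree.XMLSchema(doc)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        parse_schema(sys.argv[1])
    else:
        print("usage: python my_parse_schema.py SCHEMA.xsd", file=sys.stderr)
```

Keeping the script this small means the callgrind profile contains little besides the schema compilation itself.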
Hi,

The xsd:import

<xsd:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>

may be slow? 3-3 uses another server to get xml.xsd.

jens
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
Jens Quade wrote on 11.04.2015 at 09:15:
The xsd:Import <xsd:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/> may be slow? 3-3 uses another server to get xml.xsd.
Ah, right. I think I recall that the W3C has deliberately slowed down their DTD/schema delivery when they noticed that people just happily load and reload stuff from their servers, even in CI or benchmark scenarios.

The right way to do it is to provide all external dependencies in your local catalogue, then libxml2 will load them from there.

Stefan
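A sketch of the local-catalogue approach, under stated assumptions: the helper names write_catalog and catalog_env_value are made up for illustration, and the local copy of xml.xsd is something you would have to download once yourself. The code writes an OASIS XML catalog that maps the remote schema URL to a local file. libxml2 consults the catalog files listed in the XML_CATALOG_FILES environment variable. Both a "system" and a "uri" entry are emitted because which one gets consulted depends on how the resource is requested.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def write_catalog(catalog_path, mapping):
    """Write an XML catalog mapping remote URLs to local files.

    `mapping` is {remote_url: local_file_path}. Hypothetical helper.
    """
    # Serialize the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for url, local_file in mapping.items():
        local_uri = Path(local_file).resolve().as_uri()
        # One entry per lookup style: by system identifier and by URI.
        ET.SubElement(root, f"{{{CATALOG_NS}}}system",
                      systemId=url, uri=local_uri)
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri",
                      name=url, uri=local_uri)
    ET.ElementTree(root).write(catalog_path, encoding="utf-8",
                               xml_declaration=True)
    return catalog_path


def catalog_env_value(catalog_path):
    """Return a value suitable for the XML_CATALOG_FILES variable."""
    # A file URI avoids trouble with spaces, since the variable holds a
    # space-separated list of catalogs.
    return Path(catalog_path).resolve().as_uri()
```

A call such as write_catalog("catalog.xml", {"http://www.w3.org/2001/xml.xsd": "schemas/xml.xsd"}) (the local path is hypothetical) produces the file. Exporting XML_CATALOG_FILES in the shell before starting Python is probably the safest way to activate it, since libxml2 may read the variable only once.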
Thanks, the W3C throttle was indeed responsible for this issue... hard to detect if you don't know about it.

Andreas
Participants (3): Andreas Jung, Jens Quade, Stefan Behnel