[XML-SIG] Developer's Day

Paul Prescod paul@prescod.net
Sun, 19 Dec 1999 15:03:45 -0600


Fredrik Lundh wrote:
> 
>...
> 
> "You told us to use Python for this million-dollar
> system, but halfway through its second day of
> operation, we realized that the production XML
> files were large enough to bring the server
> backbone to its knees.  We now have several gigabytes
> sitting in the input queue, and no way to catch
> up.  The system simply isn't fast enough."

To be accurate: on my computer, xmlwf, sgmlop and a plain "copy" of the
same file all run at about the same speed.

Obviously the limiting factor is my hard disk, not the overhead of the
XML processors. Therefore raw parsing speed should be the least of our
concerns. The real issue is the performance of the Python binding.
PyExpat seems to expect the whole document at once (or at least that's
what the SAX driver does):

      # fileobj.read() pulls the entire document into memory first
      if not self.parser.Parse(fileobj.read(),1):
          self.__report_error()
  
Obviously this is going to be slow for large documents: the entire file
has to be read into a single string before expat sees the first byte.
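For contrast, here is a minimal sketch of incremental parsing, written
against the xml.parsers.expat module in current Python (the descendant of
PyExpat). Instead of handing the whole file to Parse() with isfinal set,
it reads fixed-size chunks and calls Parse(chunk, False) until the input
runs out, then Parse(b"", True) to finish. The whole file never has to sit
in memory at once. The count_elements function and the 64 KB CHUNK_SIZE
are hypothetical choices for illustration, not part of any SAX driver.
Note that in current Python, Parse() raises ExpatError on malformed input
rather than returning a false value.

```python
import io
import xml.parsers.expat

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size; tune for disk and memory


def count_elements(fileobj, chunk_size=CHUNK_SIZE):
    """Parse a binary XML stream incrementally and count start tags."""
    parser = xml.parsers.expat.ParserCreate()
    count = 0

    def start_element(name, attrs):
        nonlocal count
        count += 1

    parser.StartElementHandler = start_element

    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # isfinal=False: more data may follow, so expat keeps its state,
        # including tags split across chunk boundaries
        parser.Parse(chunk, False)

    # Signal end of document so expat can check that all tags were closed
    parser.Parse(b"", True)
    return count


if __name__ == "__main__":
    data = b"<root>" + b"<item/>" * 1000 + b"</root>"
    print(count_elements(io.BytesIO(data), chunk_size=100))
```

Current Python also offers xmlparser.ParseFile(fileobj), which does the
same chunked reading internally, so a driver could call that instead of
Parse(fileobj.read(), 1).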

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Three things see no end: A loop with exit code done wrong
A semaphore untested, and the change that comes along
http://www.geezjan.org/humor/computers/threes.html