xpathEval fails for large files
Jim Washington
jwashin at vt.edu
Tue Jul 22 07:48:01 EDT 2008
Kanchana wrote:
> Hi,
>
> I tried to extract some data with xpathEval. The path matches more
> than 100,000 elements.
>
> import libxml2
>
> doc = libxml2.parseFile("test.xml")
> ctxt = doc.xpathNewContext()
> result = ctxt.xpathEval('//src_ref/@editions')
> ctxt.xpathFreeContext()
> doc.freeDoc()
>
> This gets stuck on the following line and results in high CPU usage:
> result = ctxt.xpathEval('//src_ref/@editions')
>
> Any suggestions to resolve this?
>
> Is there any better alternative to handle large documents?
One option might be an XML database. I'm familiar with Sedna
(http://modis.ispras.ru/sedna/).
In practice, you store the document in the database and let the
database do the extracting for you. Sedna speaks XQuery, which is a
very nice way to get exactly what you want out of your document or
collection of documents.
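To make that concrete, here is a rough sketch of what the round trip
might look like from Python. The connection class and method names
below are approximations, not verified against zif.sedna, so check the
package docs before leaning on them; the LOAD statement and the XQuery
itself are standard Sedna/XQuery.

    # Sketch only: class/method names approximate the zif.sedna API
    # and are not verified; adjust to the real package interface.
    from zif.sedna import protocol

    # Assumes a Sedna server on localhost and a database "testdb"
    # created beforehand (e.g. with se_cdb).
    conn = protocol.SednaProtocol('localhost', 'testdb',
                                  'SYSTEM', 'MANAGER')

    conn.begin()
    # Store the document once; later queries run against the stored copy.
    conn.execute(u'LOAD "test.xml" "test"')
    # XQuery equivalent of the original XPath //src_ref/@editions.
    for item in conn.execute(
            u'for $r in doc("test")//src_ref return data($r/@editions)'):
        print item
    conn.commit()

Once the document is loaded, repeated queries skip the parse step
entirely, which is where the big win over re-parsing a huge file comes
from.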
Good:
It's free (Apache 2.0 license).
It's cross-platform (Windows x86, Linux x86, FreeBSD, Mac OS X).
It has Python bindings (zif.sedna at the cheese shop, among others).
It's pretty fast, particularly if you set up indexes (see the sketch
after this list).
Document and document-collection size are limited only by disk space.
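For example, an index over the attribute in the original question
might be created with something like the following. I'm quoting
Sedna's CREATE INDEX syntax from memory, so treat the Sedna manual as
authoritative; conn is the same hypothetical connection object as in
the sketch above.

    # Index src_ref elements by their editions attribute so the
    # query above can use the index instead of scanning.
    conn.begin()
    conn.execute(u'CREATE INDEX "editions_idx" '
                 u'ON doc("test")//src_ref BY @editions AS xs:string')
    conn.commit()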
Not so good:
Sedna runs as a server. Expect it to use on the order of 100 MB of RAM
per database. A database can contain many, many documents, though, so
you probably only want one database anyway.
Disclosure: I'm the author of the zif.sedna package, and I'm
interpreting the fact that I have not received much feedback as "It
works pretty well" :)
- Jim Washington