fast method accessing large, simple structured data

John Machin sjmachin at
Sat Feb 2 22:50:47 CET 2008

agc wrote:
> Hi,
> I'm looking for a fast way of accessing some simple (structured) data.
> The data is like this:
> Approx 6 - 10 GB of simple XML files, where the only elements
> I really care about are the <title> and <article> ones.
> So what I'm hoping to do is put this data in a format so
> that I can access it as fast as possible for a given request
> (http request, Python web server) that specifies just the title,
> and I return the article content.
> Is there some good format that is optimized for search for
> just 1 attribute (title) and then returning the corresponding article?
> I've thought about putting this data in a SQLite database because
> from what I know SQLite has very fast reads (no network latency, etc)
> but not as fast writes, which is fine because I probably won't be doing
> much writing (I won't ever care about the speed of any writes).
> So is a database the way to go, or is there some other,
> more specialized format that would be better?

"Database" without any further qualification indicates exact matching, 
which doesn't seem to be very practical in the context of titles of 
articles. There is an enormous body of literature on inexact/fuzzy 
matching, and lots of deployed applications -- it's not a Python-related 
question, really.
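
To make the exact/inexact distinction concrete, here is a minimal sketch
(not from the original post). It assumes the <title>/<article> pairs have
already been extracted from the XML; the table name "articles" and the
helper names build_index, lookup_exact and lookup_fuzzy are hypothetical.
The exact lookup uses SQLite's index on the primary key. The fuzzy
fallback uses difflib from the standard library and scans every title,
so it only illustrates the idea and would likely be too slow for
6 - 10 GB of data.

```python
import difflib
import sqlite3


def build_index(conn, articles):
    """Store (title, body) pairs; the PRIMARY KEY gives an index on title."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT PRIMARY KEY, body TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO articles VALUES (?, ?)", articles)
    conn.commit()


def lookup_exact(conn, title):
    """Return the article body for an exactly matching title, else None."""
    row = conn.execute(
        "SELECT body FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


def lookup_fuzzy(conn, title, cutoff=0.6):
    """Return the body of the closest title, else None.

    Assumption: loads every title into memory and compares each one,
    which is fine for a demo but not for millions of rows.
    """
    titles = [t for (t,) in conn.execute("SELECT title FROM articles")]
    matches = difflib.get_close_matches(title, titles, n=1, cutoff=cutoff)
    return lookup_exact(conn, matches[0]) if matches else None
```

With an exact lookup, a request for a misspelled title such as "Pyhton"
finds nothing, while the fuzzy fallback can still resolve it to "Python".
Whether that is the wanted behaviour depends on how the titles arrive in
the requests.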
