fast method accessing large, simple structured data
John Machin
sjmachin at lexicon.net
Sat Feb 2 16:50:47 EST 2008
agc wrote:
> Hi,
>
> I'm looking for a fast way of accessing some simple (structured) data.
>
> The data is like this:
> Approx 6 - 10 GB of simple XML files, where the only elements
> I really care about are the <title> and <article> ones.
>
> So what I'm hoping to do is put this data in a format so
> that I can access it as fast as possible for a given request
> (http request, Python web server) that specifies just the title,
> and I return the article content.
>
> Is there some good format that is optimized for search for
> just 1 attribute (title) and then returning the corresponding article?
>
> I've thought about putting this data in a SQLite database because
> from what I know SQLite has very fast reads (no network latency, etc)
> but not as fast writes, which is fine because I probably won't be doing
> much writing (I won't ever care about the speed of any writes).
>
> So is a database the way to go, or is there some other,
> more specialized format that would be better?
>
"Database" without any further qualification indicates exact matching,
which doesn't seem to be very practical in the context of titles of
articles. There is an enormous body of literature on inexact/fuzzy
matching, and lots of deployed applications -- it's not a Python-related
question, really.
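For what it's worth, here is a minimal sketch of the approach the OP describes: stream-parse the XML with iterparse (so a 6-10 GB file never has to fit in memory), load (title, article) pairs into SQLite with title as the primary key for fast indexed reads, and fall back to difflib for inexact title matches. The <doc>/<title>/<article> layout and the in-memory database are assumptions for illustration; adapt to the real file structure and use a database file on disk.

```python
import sqlite3
import difflib
import xml.etree.ElementTree as ET
from io import BytesIO

# Stand-in for the real 6-10 GB file; the element names are assumed.
SAMPLE = b"""<docs>
  <doc><title>Spam</title><article>All about spam.</article></doc>
  <doc><title>Eggs</title><article>All about eggs.</article></doc>
</docs>"""

db = sqlite3.connect(":memory:")  # use a file path for real data
db.execute("CREATE TABLE articles (title TEXT PRIMARY KEY, body TEXT)")

# iterparse yields elements as they are completed, so memory use stays
# bounded even for very large files, as long as we clear() as we go.
for event, elem in ET.iterparse(BytesIO(SAMPLE), events=("end",)):
    if elem.tag == "doc":
        title = elem.findtext("title")
        body = elem.findtext("article")
        db.execute("INSERT INTO articles VALUES (?, ?)", (title, body))
        elem.clear()  # free the subtree we just processed
db.commit()

# Exact lookup: hits the primary-key index, no table scan.
row = db.execute("SELECT body FROM articles WHERE title = ?",
                 ("Spam",)).fetchone()
print(row[0])  # -> All about spam.

# Inexact lookup: fuzzy-match the requested title against known titles.
titles = [t for (t,) in db.execute("SELECT title FROM articles")]
print(difflib.get_close_matches("Spams", titles, n=1))  # -> ['Spam']
```

For real traffic you would build the database once offline and have the web server open it read-only; difflib is fine for a few hundred thousand titles, but a dedicated full-text or fuzzy index would be worth looking at beyond that.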