Running queries on large data structure

H J van Rooyen mail at microcorp.co.za
Fri Aug 4 01:47:42 EDT 2006


"Christoph Haas" <email at christoph-haas.de> wrote:


| On Wednesday 02 August 2006 22:24, Christoph Haas wrote:
| > I have written an application in Perl some time ago (I was young and
| > needed the money) that parses multiple large text files containing
| > nested data structures and allows the user to run quick queries on the
| > data. [...]
|
| I suppose my former posting was too long and concrete. So allow me to try
| it in a different way. :)
|
| The situation is that I have input data that take ~1 minute to parse while
| the users need to run queries on that within seconds. I can think of two
| ways:
|
| (1) Database
|     (very quick, but the input data is deeply nested and it would be
|      ugly to convert it into some relational shape for the database)
| (2) cPickle
|     (Read the data every now and then, parse it, write the nested Python
|      data structure into a pickled file. The let the other application
|      that does the queries unpickle the variable and use it time and
|      again.)
|
| So the question is: would you rather force the data into a relational
| database and write object-relational wrappers around it? Or would you
| pickle it and load it later and work on the data? The latter application
| is currently a CGI. I'm open to whatever. :)
|
| Thanks for any enlightenment.
|
|  Christoph
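
Option (2) above might look roughly like this - a hedged sketch, not your
actual parser; the file name, the structure of the nested data, and the
query are all made up for illustration (in Python 2 the fast pickler is
the cPickle module; the interface is the same):

```python
import pickle  # cPickle in Python 2

def parse_and_pickle(path):
    """Run every now and then: parse the input, dump the nested
    structure to disk once."""
    # Stand-in for the ~1 minute parse; the real data comes from
    # the large text files.
    data = {
        "hosts": {
            "alpha": {"services": ["http", "ssh"], "owner": "ops"},
            "beta": {"services": ["smtp"], "owner": "mail"},
        }
    }
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_and_query(path):
    """Run in the CGI: unpickle the variable and query the nested
    dicts directly - no relational mapping needed."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["hosts"]["alpha"]["services"]
```

Unpickling is much faster than re-parsing, so the query side stays
within seconds as long as the pickled file fits in memory.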

Not sure if this is of any use - but I have noticed that dict lookups in Python
are blindingly fast - but it seems to me that using them would be as much trouble
for you as converting the data into third normal form for the database...
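
What I mean is something like this - a made-up sketch, with invented
record names, where you keep the nested parse result as-is and just build
a flat dict index over it, keyed by whatever the users query on:

```python
# Pretend output of the parser: a list of nested records.
records = [
    {"name": "alpha", "site": "za", "detail": {"ports": [80, 22]}},
    {"name": "beta", "site": "de", "detail": {"ports": [25]}},
]

# One pass to build the index; every lookup afterwards is a single
# hash probe rather than a scan over the records.
by_name = dict((r["name"], r) for r in records)

# A "query" is then just a dict lookup into the nested structure.
ports = by_name["alpha"]["detail"]["ports"]
```

The trouble, as I say, is deciding on the keys up front - which is much
the same design work as normalising for a database.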

Unless of course your parser is already doing something like that...

There are also some fancy packages for tree structures around, but I don't know
anything useful about them.

- Hendrik

More information about the Python-list mailing list