dict vs kjBuckets vs ???
Gordon McMillan
gmcm at hypernet.com
Sat Jun 12 00:23:53 EDT 1999
Mark R. writes:
> >> I plan to write a program that would store lots (in the range of 10M or even
> >> more) of relatively small objects (a few hundred bytes at most), so what
> >> do you think I should use?
[Tim]
> >Let's do a little math <wink>: 10M * 100 = ?, a lower bound on what you're
> >contemplating. Do you have gigabytes of RAM?
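(The back-of-the-envelope version of that arithmetic, taking 100 bytes
as the low end of Mark's own estimate:)

    items = 10 * 1000 * 1000          # 10M objects
    bytes_each = 100                  # "a few hundred bytes at most", low end
    gig = 1024.0 * 1024.0 * 1024.0
    print(items * bytes_each / gig)   # ~0.93 GB, before any per-object overhead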
[Brian, er, Mark]
> I'm opening a boutique.
[Tim]
> >...Memory-based data structures aren't
> >going to work for the size of thing you have in mind. If you can make it
> >fly at all, you'll likely require a powerful database, so of those choices
> >Metakit is the only approach that's not dead on arrival.
[Mark]
> A few additional pieces of information: the items stored would be
> natural language text fragments (several sentences at most, several
> words typically) plus binary descriptions, and the primary operation
> would be lots of searching.
> Is there anything else that would be better for this kind of
> program? Object database?
Searching 10M items is going to strain just about anything. I get
wonderful response times out of MetaKit, but I've never tried
anything approaching that size. The state of the art for searching
large amounts of data is the big boys of SQL databases. But even
there you'll have an index of 6 or 10 levels, and unless you have
huge amounts of RAM, only a few of them will be in memory. So the
disk will grind and grind for each search. I'd try MetaKit first,
since it's a whole lot simpler and lighter weight.
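A quick sanity check on that index depth (the fan-outs below are
made-up, illustrative numbers, not measurements of any real database):

    import math
    n = 10 * 1000 * 1000                   # 10M keys
    for fanout in (5, 10, 50):             # assumed branching factor per index node
        levels = math.log(n) / math.log(fanout)
        print((fanout, round(levels, 1)))  # roughly 10.0, 7.0 and 4.1 levels

Only the top couple of those levels stay in memory, so most lookups
cost several seeks.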
The best solution would be to exploit any interrelationships in your
items. If you could reduce it to 100K items, you could burn a whole
lot of CPU and still come out ahead.
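As a minimal sketch of what I mean (the first-word bucketing rule is
purely invented for illustration -- the right partitioning key depends
entirely on what interrelationships your data actually has):

    from collections import defaultdict

    fragments = ["the quick brown fox", "the lazy dog", "quick thinking"]

    def bucket_key(text):
        # invented rule: group fragments by their first word
        words = text.split()
        return words[0].lower() if words else ""

    buckets = defaultdict(list)
    for frag in fragments:
        buckets[bucket_key(frag)].append(frag)

    def search(query):
        # only one bucket gets scanned, not the whole collection
        return [f for f in buckets[bucket_key(query)] if query in f]

    print(search("the quick"))             # -> ['the quick brown fox']

If a cheap key like that cuts 10M items into buckets of ~100K, the
per-bucket work can be naive and you still come out ahead.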
and-good-luck-with-the-boutique-Brian-ly y'rs
- Gordon