Best way to handle large lists?
durumdara at gmail.com
Tue Oct 3 13:56:36 CEST 2006
Chaz Ginger írta:
> I have a system that has a few lists that are very large (thousands or
> tens of thousands of entries) and some that are rather small. Many times
> I have to produce the difference between a large list and a small one,
> without destroying the integrity of either list. I was wondering if
> anyone has any recommendations on how to do this and keep performance
> high? Is there a better way than
> [ i for i in bigList if i not in smallList ]
If you have big list, you can use dbm like databases.
They are very quick. BSDDB, flashdb, etc. See SleepyCat, or see python help.
In is very slow in large datasets, but bsddb is use hash values, so it
is very quick.
The SleepyCat database have many extras, you can set the cache size and
many other parameters.
Or if you don't like dbm style databases, you can use SQLite. Also
quick, you can use SQL commands.
A little slower than bsddb, but it is like SQL server. You can improve
the speed with special parameters.
More information about the Python-list