matching strings in a large set of strings
paul.nospam at rudin.co.uk
Fri Apr 30 10:50:42 CEST 2010
Duncan Booth <duncan.booth at invalid.invalid> writes:
> Paul Rudin <paul.nospam at rudin.co.uk> wrote:
>> Shouldn't a set with 83 million 14 character strings be fine in memory
>> on a stock PC these days? I suppose if it's low on ram you might start
>> swapping which will kill performance. Perhaps the method you're using
>> to build the data structures creates lots of garbage? How much ram do
>> you have and how much memory does the python process use as it builds
>> your data structures?
> Some simple experiments should show you that a stock PC running a 32 bit
> Python will struggle:
>>>> s = "12345678901234"
> So more than 3GB just for the strings (and that's for Python 2.x; on
> Python 3.x you'll need nearly 5GB).
> Running on a 64 bit version of Python should be fine, but for a 32 bit
> system a naive approach just isn't going to work.
It depends - a few gigs of RAM can be cheap compared with programmer
time. If you know you can solve a problem by spending a few euros on
some extra RAM, that can be a good solution! It depends, of course, on
where the code is being deployed - if it's something that's intended to
be deployed widely then you can't expect thousands of people to go out
and buy more RAM - but if it's a one-off deployment for a particular
environment then it can be the best way to go.
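If buying RAM isn't an option, a memory-frugal alternative (my own sketch, not something proposed in the thread) is to exploit the fixed 14-character width: pack the keys into one sorted bytes blob and do membership tests by binary search, so 83 million keys cost roughly 14 bytes each instead of a full str object plus a set slot:

```python
WIDTH = 14  # every key in the problem is exactly 14 characters

def build_index(strings):
    """Pack fixed-width ASCII keys into a single sorted bytes blob."""
    return b"".join(sorted(s.encode("ascii") for s in strings))

def contains(blob, s):
    """Binary-search the blob for one fixed-width key."""
    key = s.encode("ascii")
    n = len(blob) // WIDTH
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if blob[mid * WIDTH:(mid + 1) * WIDTH] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo < n and blob[lo * WIDTH:(lo + 1) * WIDTH] == key

index = build_index(["12345678901234", "abcdefghijklmn"])
print(contains(index, "12345678901234"))  # True
print(contains(index, "xxxxxxxxxxxxxx"))  # False
```

Lookups are O(log n) rather than a set's O(1), but ~1.1GB for the whole blob fits comfortably in a 32-bit process, which was the sticking point above.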