optimizing large dictionaries
jervisau at gmail.com
Thu Jan 15 23:18:51 CET 2009
On Fri, Jan 16, 2009 at 8:39 AM, Per Freem <perfreem at yahoo.com> wrote:
> i have an optimization question about python. i am iterating through
> a file and counting the number of repeated elements. the file has on
> the order of tens of millions of elements...
> for line in file:
>     elt = MyClass(line)  # extract elt from line...
>     try:
>         my_dict[elt] += 1
>     except KeyError:
>         my_dict[elt] = 1
> class MyClass:
>     def __str__(self):
>         return "%s-%s-%s" % (self.field1, self.field2, self.field3)
>     def __repr__(self):
>         return str(self)
>     def __hash__(self):
>         return hash(str(self))
> is there anything that can be done to speed up this simple code? right
> now it is taking well over 15 minutes to process, on a 3 GHz machine
> with lots of RAM (though this is all taking CPU power, not RAM, at this
> point).
> any general advice on how to optimize large dicts would be great too.
> thanks for your help.
You can get a large speedup by removing the need to instantiate a new
MyClass instance on each iteration of your loop.
Instead, define MyClass with an 'interpret' method and call that on each
line rather than constructing MyClass(line) every time. interpret would
return the string '%s-%s-%s' % (self.field1, ...) directly, without
allocating a new object per line.
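I don't know what your parsing actually looks like, so treat this as a
rough sketch of the idea; the split below is just a placeholder for your
real field extraction:

class MyClass(object):
    def interpret(self, line):
        # placeholder parsing -- substitute however field1, field2 and
        # field3 are really pulled out of the line
        field1, field2, field3 = line.split()[:3]
        # build the key string directly; no per-line object allocation,
        # which is where the saving comes from
        return "%s-%s-%s" % (field1, field2, field3)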
myclass = MyClass()
interpret = myclass.interpret

for line in file:
    elt = interpret(line)  # extract elt from line...
    try:
        my_dict[elt] += 1
    except KeyError:
        my_dict[elt] = 1
The speedup is on the order of 10x on my machine.
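As an aside (this wasn't in your original post), if you are on Python 2.5
or later, collections.defaultdict lets you drop the try/except entirely;
whether it buys you anything more here is worth timing:

from collections import defaultdict

my_dict = defaultdict(int)   # missing keys start at 0
for line in file:
    # reuses the interpret() from the sketch above
    my_dict[interpret(line)] += 1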