Populating a dictionary, fast [SOLVED]
Paul McGuire
ptmcg at austin.rr.com
Tue Nov 13 12:09:32 EST 2007
On Nov 12, 11:32 am, "Michael Bacarella" <m... at gpshopper.com> wrote:
> See end for solution.
>
> > >> (3) Are you sure you need all eight-million-plus items in the cache
> > >> all at once?
>
> > > Yes.
>
> > I remain skeptical, but what do I know, I don't even know what you're
> > doing with the data once you have it :-)
>
> It's OK, I'd be skeptical too. ;)
>
>
>
>
>
> > $ cat /proc/cpuinfo
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 15
> > model : 107
> > model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
> > stepping : 1
> > cpu MHz : 1000.000
> > cache size : 512 KB
> > ...
> > processor : 1
> > vendor_id : AuthenticAMD
> > cpu family : 15
> > model : 107
> > model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
> > stepping : 1
> > cpu MHz : 1000.000
> > cache size : 512 KB
>
> # cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 5
> model name : AMD Opteron(tm) Processor 246
> stepping : 10
> cpu MHz : 2009.305
> cache size : 1024 KB
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
> bogomips : 3981.31
> TLB size : 1088 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management: ts fid vid ttp
>
> processor : 1
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 5
> model name : AMD Opteron(tm) Processor 246
> stepping : 10
> cpu MHz : 2009.305
> cache size : 1024 KB
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
> bogomips : 4014.08
> TLB size : 1088 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management: ts fid vid ttp
>
> As for the solution, after trying a half-dozen different integer hashing
> functions
> and hash table sizes (the brute force approach), on a total whim I switched
> to a
> model with two dictionary tiers and got whole orders of magnitude
> better performance.
>
> The tiering is, for a given key of type long:
>
> id2name[key >> 40][key & 0x10000000000] = name
>
> Much, much better. A few minutes versus hours this way.
>
> I suspect it could be brought down to seconds with a third level of
> tiers but this is no longer posing the biggest bottleneck... ;)
>
> Thanks!- Hide quoted text -
>
> - Show quoted text -
Shouldn't this be:
id2name[key >> 40][key & 0xffffffffff] = name
?
-- Paul
More information about the Python-list
mailing list