[Numpy-discussion] histogramdd memory needs

Lars Friedrich lfriedri at imtek.de
Fri Feb 1 04:57:37 EST 2008


Hello,

I use numpy.histogramdd to compute three-dimensional histograms with a 
total number of bins on the order of 1e7. It is clear to me that such a 
histogram will take a lot of memory: for dtype=N.float64 it needs 
roughly 80 megabytes (1e7 bins at 8 bytes each). However, considerably 
more memory seems to be needed while the histogram is being calculated. 
For example, when I have data.shape = (8e6, 3) and call 
numpy.histogramdd(data, 280), I expect a histogram of size 
(280**3)*8 bytes = 176 megabytes, but during the calculation the memory 
usage of pythonw.exe in the Windows Task Manager rises to 687 megabytes 
above the level it had before. Once the calculation is done, the memory 
usage drops back to the expected value. I assume this is due to the way 
numpy.histogramdd works internally. However, this prevents me from 
calculating even bigger histograms.
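
For reference, here is a minimal sketch of what I am doing (random 
numbers standing in for my actual measurement data):

import numpy as N

# stand-in for my real data: 8e6 samples in 3 dimensions
data = N.random.rand(8000000, 3)

# 280 bins per axis -> 280**3 = 21952000 bins,
# i.e. about 176 MB at 8 bytes per float64 bin
hist, edges = N.histogramdd(data, 280)
print(hist.dtype, hist.nbytes / 1e6, 'MB')

So I have the following questions: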

1) How can I tell histogramdd to use a dtype other than float64? My 
bins will be only sparsely populated, so an int16 should be sufficient, 
and without normalization an integer dtype makes more sense to me anyway.
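Ideally I could write something like the following (a hypothetical 
signature; as far as I can tell, histogramdd currently accepts no dtype 
argument):

hist, edges = N.histogramdd(data, 280, dtype=N.int16)  # hypothetical dtype keyword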

2) Is there a way to use a different algorithm (at the cost of 
performance) that needs less memory during the calculation, so that I 
can generate bigger histograms?
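
To make this more concrete, the kind of workaround I have in mind is to 
fix the bin edges once, run histogramdd over the data in chunks, and 
accumulate the counts into an integer array (a rough sketch only; the 
chunk size and the int32 accumulator are arbitrary choices, and each 
per-chunk call still allocates one full float64 histogram internally):

import numpy as N

def chunked_histogramdd(data, bins, chunk=1000000, dtype=N.int32):
    # fix common bin edges once so that every chunk is binned identically
    edges = [N.linspace(data[:, i].min(), data[:, i].max(), bins + 1)
             for i in range(data.shape[1])]
    hist = N.zeros((bins,) * data.shape[1], dtype=dtype)
    for start in range(0, data.shape[0], chunk):
        h, _ = N.histogramdd(data[start:start + chunk], bins=edges)
        hist += h.astype(dtype)  # counts are integral, so the cast is exact
    return hist, edges

This only shrinks the temporaries that scale with the number of 
samples, not the float64 histogram that histogramdd builds internally, 
so I suspect a real solution would have to live inside histogramdd 
itself.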

My numpy version is '1.0.4.dev3937'.

Thanks,
Lars


-- 
Dipl.-Ing. Lars Friedrich

Photonic Measurement Technology
Department of Microsystems Engineering -- IMTEK
University of Freiburg
Georges-Köhler-Allee 102
D-79110 Freiburg
Germany

phone: +49-761-203-7531
fax:   +49-761-203-7537
room:  01 088
email: lars.friedrich at imtek.de


