[Numpy-discussion] histogramdd memory needs
Lars Friedrich
lfriedri at imtek.de
Fri Feb 1 04:57:37 EST 2008
Hello,
I use numpy.histogramdd to compute three-dimensional histograms with a
total number of bins on the order of 1e7. It is clear to me that such a
histogram will take a lot of memory. For dtype=N.float64, it will take
roughly 80 megabytes. However, I have the feeling that much more memory
is needed during the histogram calculation. For example, when I have
data.shape = (8e6, 3) and call numpy.histogramdd(data, 280), I expect a
histogram size of (280**3)*8 bytes = 176 megabytes, but during the
calculation the memory usage of pythonw.exe in the Windows Task Manager
rises to 687 megabytes above the level before the calculation. When the
calculation is done, the memory usage drops back to the expected value.
I assume this is due to the way numpy.histogramdd works internally.
However, when I need to calculate even bigger histograms, I cannot do
it this way. So I have the following questions:
1) How can I tell histogramdd to use a dtype other than float64? My bins
will be very sparsely populated, so an int16 should be sufficient.
Without normalization, an integer dtype makes more sense to me.
2) Is there a way to use another algorithm (at the cost of performance)
that uses less memory during calculation so that I can generate bigger
histograms?
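
To make both questions concrete, here is a minimal sketch of the kind of
approach I have in mind. It is not how numpy.histogramdd works
internally. The function name chunked_histogramdd and its parameters
(ranges, chunk_size, dtype) are hypothetical, and it assumes a recent
NumPy that provides ravel_multi_index and unique(..., return_counts=True).
It also assumes equal-width bins and that the value range of every axis
is known in advance, because all chunks have to share the same bin edges.

```python
import numpy as np

def chunked_histogramdd(data, bins, ranges, chunk_size=100000, dtype=np.int32):
    """Accumulate a D-dimensional count histogram chunk by chunk.

    Hypothetical helper, not part of NumPy. `ranges` is a sequence of
    (low, high) pairs, one per dimension; `bins` is the number of
    equal-width bins used along every axis.
    """
    data = np.asarray(data)
    ndim = data.shape[1]
    # Shared bin edges, so that every chunk is binned identically.
    edges = [np.linspace(lo, hi, bins + 1) for lo, hi in ranges]
    # The only full-size array: the histogram itself, in the requested dtype.
    hist = np.zeros((bins,) * ndim, dtype=dtype)
    flat = hist.reshape(-1)  # flat view onto the same memory

    for start in range(0, data.shape[0], chunk_size):
        chunk = data[start:start + chunk_size]
        idx = np.empty(chunk.shape, dtype=np.intp)
        for d in range(ndim):
            # Index of the bin whose left edge is <= the value.
            i = np.searchsorted(edges[d], chunk[:, d], side="right") - 1
            # Values exactly on the last edge belong to the last bin.
            i[chunk[:, d] == edges[d][-1]] = bins - 1
            idx[:, d] = i
        # Drop samples that fall outside the given ranges.
        inside = np.all((idx >= 0) & (idx < bins), axis=1)
        flat_idx = np.ravel_multi_index(idx[inside].T, hist.shape)
        # Count repeated bins within the chunk, then add them in one step.
        # Temporary memory here scales with chunk_size, not with bins**ndim.
        uniq, counts = np.unique(flat_idx, return_counts=True)
        flat[uniq] += counts.astype(dtype)
    return hist, edges
```

With bins=280 and dtype=np.int16, the result array would take
(280**3)*2 bytes, about 44 megabytes, instead of 176 megabytes for
float64. The caller has to make sure that no bin count exceeds the range
of the chosen dtype, since the sketch does not check for overflow.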
My numpy version is '1.0.4.dev3937'
Thanks,
Lars
--
Dipl.-Ing. Lars Friedrich
Photonic Measurement Technology
Department of Microsystems Engineering -- IMTEK
University of Freiburg
Georges-Köhler-Allee 102
D-79110 Freiburg
Germany
phone: +49-761-203-7531
fax: +49-761-203-7537
room: 01 088
email: lars.friedrich at imtek.de