Caching float(0.0)
I just discovered that a program of mine was wasting 7MB out of 200MB by storing multiple copies of 0.0. I found this a bit surprising since I'm used to small ints and strings being cached. I added the apparently nonsensical lines

    +    if age == 0.0:
    +        age = 0.0  # return a common object for the common case

and got 7MB of memory back! Eg :-

    Python 2.5c1 (r25c1:51305, Aug 19 2006, 18:23:29)
    [GCC 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = 0.0
    >>> print id(a), id(0.0)
    134738828 134738844
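[Editor's note: the behaviour described above, and the trick in the patch, can be sketched in modern Python terms. The identity results are CPython implementation details, and the helper name below is ours, not from the thread.]

```python
# CPython caches small ints and many strings, but not floats: each
# float() call allocates a fresh object, so equal values can pile up.
a = int("7")
b = int("7")
c = float("0.0")
d = float("0.0")
print(a is b)  # True in CPython: small ints -5..256 are shared
print(c is d)  # False in CPython: two distinct float objects

def intern_zero(x):
    """The poster's trick as a helper: route the common case through one
    shared 0.0 object (the literal in this function's code object).
    Caveat: x == 0.0 also matches -0.0, whose sign bit would be lost."""
    return 0.0 if x == 0.0 else x

e = intern_zero(float("0.0"))
f = intern_zero(float("0.0"))
print(e is f)  # True: both are the function's shared 0.0 constant
```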
Is there any reason why float() shouldn't cache the value of 0.0 since
it is by far and away the most common value?
A full cache of floats probably doesn't make much sense though, since
there are so many more of them than integers and defining "small"
isn't obvious.
--
Nick Craig-Wood
Nick Craig-Wood wrote:
Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value?
Says who? (I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. From what I can tell, 0.0 isn't used at all.)
On 9/29/06, Fredrik Lundh
(I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. from what I can tell, 0.0 isn't used at all.)
*bemused look* Fredrik, can you share the reason why this number occurs 32 times in this program? I don't mean to imply anything by that; it just sounds like it might be a fun story. :)

Anyway, this kind of static analysis is probably more entertaining than relevant. For your enjoyment, the most-used float literals in python25\Lib, omitting test directories, are:

    1e-006:  5 hits
    4.0:     6 hits
    0.05:    7 hits
    6.0:     8 hits
    0.5:    13 hits
    2.0:    25 hits
    0.0:    36 hits
    1.0:    62 hits

There are two hits each for -1.0 and -0.5. In my own Python code, I don't even have enough float literals to bother with.

-j
Jason Orendorff wrote:
On 9/29/06, Fredrik Lundh
wrote: (I just checked the program I'm working on, and my analysis tells me that the most common floating point value in that program is 121.216, which occurs 32 times. from what I can tell, 0.0 isn't used at all.)
*bemused look* Fredrik, can you share the reason why this number occurs 32 times in this program? I don't mean to imply anything by that; it just sounds like it might be a fun story. :)
Anyway, this kind of static analysis is probably more entertaining than relevant. For your enjoyment, the most-used float literals in python25\Lib, omitting test directories, are:
    1e-006:  5 hits
    4.0:     6 hits
    0.05:    7 hits
    6.0:     8 hits
    0.5:    13 hits
    2.0:    25 hits
    0.0:    36 hits
    1.0:    62 hits
There are two hits each for -1.0 and -0.5.
In my own Python code, I don't even have enough float literals to bother with.
By these statistics I think the answer to the original question is clearly "no" in the general case.

regards
Steve

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
Steve> By these statistics I think the answer to the original question
Steve> is clearly "no" in the general case.

As someone else (Guido?) pointed out, the literal case isn't all that interesting. I modified floatobject.c to track a few interesting floating point values:

    static unsigned int nfloats[5] = {
        0, /* -1.0 */
        0, /*  0.0 */
        0, /* +1.0 */
        0, /* everything else */
        0, /* whole numbers from -10.0 ... 10.0 */
    };

    PyObject *
    PyFloat_FromDouble(double fval)
    {
        register PyFloatObject *op;
        if (free_list == NULL) {
            if ((free_list = fill_free_list()) == NULL)
                return NULL;
        }

        if (fval == 0.0)       nfloats[1]++;
        else if (fval == 1.0)  nfloats[2]++;
        else if (fval == -1.0) nfloats[0]++;
        else                   nfloats[3]++;

        if (fval >= -10.0 && fval <= 10.0 && (int)fval == fval) {
            nfloats[4]++;
        }

        /* Inline PyObject_New */
        op = free_list;
        free_list = (PyFloatObject *)op->ob_type;
        PyObject_INIT(op, &PyFloat_Type);
        op->ob_fval = fval;
        return (PyObject *) op;
    }

    static void
    _count_float_allocations(void)
    {
        fprintf(stderr, "-1.0: %d\n", nfloats[0]);
        fprintf(stderr, " 0.0: %d\n", nfloats[1]);
        fprintf(stderr, "+1.0: %d\n", nfloats[2]);
        fprintf(stderr, "rest: %d\n", nfloats[3]);
        fprintf(stderr, "whole numbers -10.0 to 10.0: %d\n", nfloats[4]);
    }

then called atexit(_count_float_allocations) in _PyFloat_Init and ran "make test". The output was:

    ...
    ./python.exe -E -tt ../Lib/test/regrtest.py -l
    ...
    -1.0: 29048
     0.0: 524241
    +1.0: 91561
    rest: 1749807
    whole numbers -10.0 to 10.0: 1151442

So for a largely non-floating point "application", a fair number of floats are allocated, a bit over 25% of them are -1.0, 0.0 or +1.0, and nearly 50% of them are whole numbers between -10.0 and 10.0, inclusive.

Seems like it at least deserves a serious look. It would be nice to have the numeric crowd contribute to this subject as well.

Skip
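[Editor's note: Skip's C-level histogram can be mirrored in pure Python to experiment with other workloads. This is a sketch with our own function and bucket names, classifying an iterable of floats into the same five buckets.]

```python
def classify(values):
    """Tally floats into the buckets Skip's patched PyFloat_FromDouble counts.
    The last bucket overlaps the others, just as in the C version."""
    buckets = {"-1.0": 0, "0.0": 0, "+1.0": 0, "rest": 0, "whole -10..10": 0}
    for v in values:
        if v == 0.0:
            buckets["0.0"] += 1
        elif v == 1.0:
            buckets["+1.0"] += 1
        elif v == -1.0:
            buckets["-1.0"] += 1
        else:
            buckets["rest"] += 1
        # Whole numbers in [-10, 10]; range check also screens out inf/nan
        # before int() is called.
        if -10.0 <= v <= 10.0 and int(v) == v:
            buckets["whole -10..10"] += 1
    return buckets

tally = classify([0.0, 0.0, 1.0, -1.0, 2.5, 7.0])
print(tally)
```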
skip@pobox.com wrote:
Steve> By these statistics I think the answer to the original question Steve> is clearly "no" in the general case.
As someone else (Guido?) pointed out, the literal case isn't all that interesting. I modified floatobject.c to track a few interesting floating point values:
[...code...]
So for a largely non-floating point "application", a fair number of floats are allocated, a bit over 25% of them are -1.0, 0.0 or +1.0, and nearly 50% of them are whole numbers between -10.0 and 10.0, inclusive.
Seems like it at least deserves a serious look. It would be nice to have the numeric crowd contribute to this subject as well.
As a representative of the numeric crowd, I'll say that I've never noticed this to be a problem. I suspect that it's a non-issue, since we generally store our numbers in arrays, not big piles of Python floats, so there's no opportunity for identical floats to pile up. -tim
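[Editor's note: Tim's point can be illustrated with the stdlib array module, no NumPy needed. The byte counts below are CPython-specific and approximate; the sketch only compares orders of magnitude.]

```python
import array
import sys

n = 10_000
# A list of n *distinct* float objects (each float() call allocates anew):
boxed = [float("0.0") for _ in range(n)]
# An array of n raw C doubles, 8 bytes each, with no per-element object:
packed = array.array("d", [0.0] * n)

boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)
packed_bytes = sys.getsizeof(packed)
# In CPython this is roughly 32 bytes/element (24-byte float object plus
# an 8-byte list slot) versus about 8 bytes/element for the array.
print(boxed_bytes, packed_bytes)
```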
Nick Craig-Wood wrote:
Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value?
1.0 might be another candidate for caching. Although the fact that nobody has complained about this before suggests that it might not be a frequent enough problem to be worth the effort.

--
Greg
On 9/29/06, Greg Ewing
Nick Craig-Wood wrote:
Is there any reason why float() shouldn't cache the value of 0.0 since it is by far and away the most common value?
1.0 might be another candidate for caching.
Although the fact that nobody has complained about this before suggests that it might not be a frequent enough problem to be worth the effort.
My guess is that people do have this problem; they just don't know where that memory has gone. I know I don't count objects unless I have a process that's leaking memory or it grows so big that I notice (by swapping or chance). That said, I've never noticed this particular issue, but I deal mostly with strings. I have had issues with the allocator a few times that I had to work around, but not this sort of issue. -bob
Bob Ippolito schrieb:
My guess is that people do have this problem, they just don't know where that memory has gone. I know I don't count objects unless I have a process that's leaking memory or it grows so big that I notice (by swapping or chance).
Right. Although I do wonder what kind of software people write to run into this problem. As Guido points out, the numbers must be the result of some computation, or created by an extension module by different means. If people have many *simultaneous* copies of 0.0, I would expect there is something else really wrong with the data structures or algorithms they use. Regards, Martin
Martin v. Löwis wrote:
Bob Ippolito schrieb:
My guess is that people do have this problem, they just don't know where that memory has gone. I know I don't count objects unless I have a process that's leaking memory or it grows so big that I notice (by swapping or chance).
Right. Although I do wonder what kind of software people write to run into this problem. As Guido points out, the numbers must be the result from some computation, or created by an extension module by different means. If people have many *simultaneous* copies of 0.0, I would expect there is something else really wrong with the data structures or algorithms they use.
I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such:
    >>> float('1') is float('1')
    False
    >>> float('0') is float('0')
    False
Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
http://www.boredomandlaziness.org
"Nick Coghlan"
I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such:
For such situations, one could create a translation dict for both common float values and for non-numeric missing value indicators. For instance:

    flotran = {'*': None, '1.0': 1.0, '2.0': 2.0, '4.0': 4.0}

The details, of course, depend on the specific case.

tjr
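[Editor's note: Terry's translation dict can be generalized so the cache grows as new values are seen. The helper names are ours; the '*' missing-value marker follows his sketch.]

```python
def make_float_reader():
    """Return a parser that interns floats by their textual form, so the
    same token always yields the same float object."""
    cache = {"*": None}  # '*' marks a missing value, per Terry's flotran dict
    def read(token):
        try:
            return cache[token]          # reuse the object seen for this text
        except KeyError:
            value = cache[token] = float(token)
            return value
    return read

read = make_float_reader()
row = [read(tok) for tok in "1.0 * 0.0 1.0 0.0".split()]
print(row)                                  # [1.0, None, 0.0, 1.0, 0.0]
print(row[0] is row[3], row[2] is row[4])   # True True: repeats are shared
```

One design note: keying on the text rather than the parsed value sidesteps the -0.0 pitfall, since "0.0" and "-0.0" are distinct keys.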
On 9/30/06, Terry Reedy
"Nick Coghlan"
wrote in message news:451E31ED.7030905@gmail.com... I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such:
For such situations, one could create a translation dict for both common float values and for non-numeric missing value indicators. For instance:

    flotran = {'*': None, '1.0': 1.0, '2.0': 2.0, '4.0': 4.0}

The details, of course, depend on the specific case.
But of course you have to know that common float values are never cached and that it may cause you problems. Some users may expect them to be because common strings and integers are cached. -bob
On Sat, Sep 30, 2006 at 03:21:50PM -0700, Bob Ippolito wrote:
On 9/30/06, Terry Reedy
wrote: "Nick Coghlan"
wrote in message news:451E31ED.7030905@gmail.com... I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such:
Over a TCP socket in ASCII format for my application
For such situations, one could create a translation dict for both common float values and for non-numeric missing value indicators. For instance, flotran = {'*': None, '1.0':1.0, '2.0':2.0, '4.0':4.0} The details, of course, depend on the specific case.
But of course you have to know that common float values are never cached and that it may cause you problems. Some users may expect them to be because common strings and integers are cached.
I have to say I was surprised to find out how many copies of 0.0 there
were in my code and I guess I was subconsciously expecting the
immutable 0.0s to be cached even though I know consciously I've never
seen anything but int and str mentioned in the docs.
--
Nick Craig-Wood
Nick Coghlan schrieb:
Right. Although I do wonder what kind of software people write to run into this problem. As Guido points out, the numbers must be the result from some computation, or created by an extension module by different means. If people have many *simultaneous* copies of 0.0, I would expect there is something else really wrong with the data structures or algorithms they use.
I suspect the problem would typically stem from floating point values that are read in from a human-readable file rather than being the result of a 'calculation' as such:
That's how you can end up with 100 different copies of 0.0. But apparently, people are creating millions of them, and keep them in memory simultaneously. Unless the text file *only* consists of floating point numbers, I would expect they have bigger problems than that. Regards, Martin
participants (11)
- "Martin v. Löwis"
- Bob Ippolito
- Fredrik Lundh
- Greg Ewing
- Jason Orendorff
- Nick Coghlan
- Nick Craig-Wood
- skip@pobox.com
- Steve Holden
- Terry Reedy
- Tim Hochberg