Memory size overflows
Hello everybody,

All around the C code there are potential problems with objects of very large sizes (http://www.python.org/sf/618623). The problem is that to allocate a variable-sized object of type 't' with 'n' elements we compute 'n*t->tp_itemsize', which can overflow even if 'n' is a perfectly legal value. If the truncated result is small, the subsequent malloc() succeeds, and we run into a segfault by trying to access more memory than reserved.

The same problem exists at other places -- more or less everywhere we add or multiply something to a number that could be user-supplied. For example, Guido just fixed '%2147483647d'%-123. A rather artificial example, I agree, but a hole anyway.

To fix this I suggest introducing a few new macros in pymem.h that compute things about sizes with overflow checking. I can see a couple of approaches based on special values that mean "overflow":

1) There is just one special "overflow" value, e.g. ((size_t)-1), that is returned and propagated by the macros when an overflow is detected. This might be error-prone, because if we forget once to use the macros to add a few bytes to the size, this special value will wrap down to a small legal value -- and segfault.

2) Same as above, but with a whole range of overflow values. For example, just assume (or decide) that no malloc of more than half the maximum number that fits into a size_t can succeed. We don't need any macro to add a (reasonable) constant to a size. We need a macro for multiplication that -- upon overflow -- returns the first number of the "overflow" range. The Add macro is still needed to sum *two* potentially large numbers.

3) We compute all sizes with signed integers (int or long), as is currently (erroneously?) done at many places. Any negative integer is regarded as "overflow", but the multiplication macro returns the largest negative integer in case of overflow, so that as above no addition macro is needed for the simple cases. This will require a "multiplication hunt party" :-)

Also, approaches 2 and 3 require fixes to ensure that 'malloc(any-overflow-size)' always fail, for any of the several implementations of malloc found in the code. Even with approach 1, I would not trust the platform malloc to correctly fail on malloc(-1) -- I guess it might "round up" the value to 0 before it proceeds...

Armin
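As an illustration only (none of these names exist in pymem.h), a sketch of approach 2 might look like the following: everything from half the size_t range upward is treated as "overflow", and the allocator wrappers are assumed to refuse any request in that range.

    #include <stddef.h>

    /* First value of the reserved "overflow" range: half the size_t range. */
    #define PY_SIZE_OVERFLOW (((size_t)-1) / 2 + 1)

    /* Multiply two sizes; any product reaching the overflow range collapses
       to its first value.  (Arguments are evaluated more than once; fine
       for simple operands.) */
    #define PY_SIZE_MUL(a, b)                                       \
        (((b) != 0 && (size_t)(a) > (PY_SIZE_OVERFLOW - 1) / (b))   \
            ? PY_SIZE_OVERFLOW                                      \
            : (size_t)(a) * (b))

    /* Still needed when summing *two* potentially large sizes. */
    #define PY_SIZE_ADD(a, b)                                       \
        (((size_t)(b) >= PY_SIZE_OVERFLOW ||                        \
          (size_t)(a) >= PY_SIZE_OVERFLOW - (b))                    \
            ? PY_SIZE_OVERFLOW                                      \
            : (size_t)(a) + (b))

Adding a reasonable constant to a PY_SIZE_MUL result then needs no further check: a legal result is below half the range and cannot wrap, and an overflowed result stays inside the reserved range.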
[Armin Rigo]
All around the C code there are potential problems with objects of very large sizes (http://www.python.org/sf/618623). The problem is that to allocate a variable-sized object of type 't' with 'n' elements we compute 'n*t->tp_itemsize', which can overflow even if 'n' is a perfectly legal value.
I confess I always ignore these until one pops up in real life. Checking slows the code, and that causes me pain <0.5 wink>. Note that the multiplication isn't the only problem: we usually go on to add in the size of the object header, and that can overflow too. This leads to heartbreaking code such as string_repeat's:

    nbytes = size * sizeof(char);
    if (nbytes / sizeof(char) != (size_t)size ||
        nbytes + sizeof(PyStringObject) <= nbytes) {
        PyErr_SetString(PyExc_OverflowError,
                        "repeated string is too long");
        return NULL;
    }

Making it worse, while malloc takes an unsigned arg, Python usually uses a signed type to hold these things. When I fixed the code above (it did pop up in real life there), I made sure nbytes was declared size_t, so that the compiler could do the division-by-constant with a simple shift (well, OK, sizeof(char) is necessarily 1U, so the division is pointless there anyway; in other contexts it's not, and signed/power_of_2 is more expensive than unsigned/power_of_2 because most C compilers don't allow the former to compute the natural floor; note that anything/non_constant can be incredibly expensive!).
If the truncated result is small, the subsequent malloc() succeeds, and we run into a segfault by trying to access more memory than reserved. The same problem exists at other places -- more or less everywhere we add or multiply something to a number that could be user-supplied.
Yup -- and that, in a nutshell, is why I usually ignore these <wink>. Note that it's not enough even to ensure malloc gets a sensible argument: on at least two flavors of Unix, malloc() returning a non-NULL pointer doesn't guarantee you can access the memory without segfaulting. In the end, it's a game that can't be won. It may be nicer to do a little better, but there are real costs too.
... To fix this I suggest introducing a few new macros in pymem.h that compute things about sizes with overflow checking. I can see a couple of approaches based on special values that mean "overflow":
If we're going to do this, I suggest an ubiquitous trick from Zope's C code:

    DO_SOMETHING_OR(RESULT_LVALUE, INPUT1, ..., ON_ERROR_BLOCK);

That is, the result goes into RESULT_LVALUE, unless there's an overflow; in the latter case, ON_ERROR_BLOCK is executed. This is usually something like "return NULL;" or "goto Overflow;". This has the nice property of pushing the error-case code out of the normal flow. A goto is especially nice, because the label provides a handy place to set a debugger breakpoint (always a practical problem when code hides in macro expansions).
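A rough sketch of that pattern, with a made-up name and a plain division-based overflow test (this is not Zope's actual macro):

    #include <stddef.h>

    #define MUL_OR(RESULT_LVALUE, A, B, ON_ERROR_BLOCK)              \
        do {                                                         \
            size_t _a = (A), _b = (B); /* each argument read once */ \
            if (_b != 0 && _a > ((size_t)-1) / _b) {                 \
                ON_ERROR_BLOCK;                                      \
            }                                                        \
            else {                                                   \
                (RESULT_LVALUE) = _a * _b;                           \
            }                                                        \
        } while (0)

A call site would then read MUL_OR(nbytes, n, itemsize, goto Overflow);, with the Overflow label setting the exception and returning NULL.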
... 3) we compute all sizes with signed integers (int or long), as is currently (erroneously?) done at many places. Any negative integer is regarded as "overflow", but the multiplication macro returns the largest negative integer in case of overflow, so that as above no addition macro is needed for the simple cases.
There's really never been anything special about "the sign bit", and some 32-bit boxes can allocate 1UL<<31 bytes. This becomes more of a problem as Python gets applied to larger problems, and 2 gigabytes is no longer an insane amount of address space.
... Also, approaches 2 and 3 require fixes to ensure that 'malloc(any-overflow-size)' always fail, for any of the several implementations of malloc found in the code.
pymalloc is already suitably paranoid, even to the extent that the debug pymalloc makes sure adding 16 bytes for its padding doesn't overflow.
Even with approach 1, I would not trust the platform malloc to correctly fail on malloc(-1) -- I guess it might "round up" the value to 0 before it proceeds...
malloc() can't get an argument of -1: it takes a size_t argument, and size_t is guaranteed to be unsigned. What *does* happen is that some platform mallocs don't do what the debug pymalloc does, meaning that they don't check that adding in their own overhead doesn't overflow. So, e.g., if they need a 4-byte header for bookkeeping, they'll add 4 to the argument and not even notice if that "wraps around" to 0, 1, 2, or 3. Disaster ensues. But it's not really Python's job to fix broken platform libraries. lukewarmly y'rs - tim
Tim Peters wrote:
[Armin Rigo]
All around the C code there are potential problems with objects of very large sizes (http://www.python.org/sf/618623). The problem is that to allocate a variable-sized object of type 't' with 'n' elements we compute 'n*t->tp_itemsize', which can overflow even if 'n' is a perfectly legal value.
I confess I always ignore these until one pops up in real life. Checking slows the code, and that causes me pain <0.5 wink>.
Note that the multiplication isn't the only problem: we usually go on to add in the size of the object header, and that can overflow too. This leads to heartbreaking code such as string_repeat's:
(sigh) [snipped all the rest of good stuff]

May I throw in a "weird" idea, again? I've been thinking for quite some time that Python should really abandon 32 bit arithmetic once and forever and switch to 64 bits, everywhere. It would not be the first scripting language which does so (wasn't it Ruby?), and it would benefit from that quite a lot, I think.

I believe that 32 bit technology is outworn, like 16 bits were before. When? Well, I'd say 16 years ago, exactly, if we assume an increase of memory size of 1 bit per year, which seems realistic.

There are already a couple of 64 bit machines running Python. These would not see any slow-down by switching to 64 bits. New upcoming processors are likely to be 64 bitters pretty soon, and I hope to see such a nice machine as Knuth's MMIX in the next five years or earlier. Most probably, 64 bits will then be state of the art, and by switching to it right now, we are solving a lot of problems for the next 32 years ahead. This is beyond the careers of most of the "old" guys around, giving us a chance to solve some problems "once and forever". The current small speed loss will vanish on its own :)

It-might-sound-funny-but-it-*is*-my-opinion - ly y'rs

chris

--
Christian Tismer :^) mailto:tismer@tismer.com
*Starship* http://starship.python.net/ -- http://www.stackless.com/
Hello Tim,

Let me quote from stringobject.c the "heartbreaking code such as string_repeat's":

    nbytes = size * sizeof(char);
    if (nbytes / sizeof(char) != (size_t)size ||
        nbytes + sizeof(PyStringObject) <= nbytes)
            <OverflowError>
    op = (PyStringObject *) PyObject_MALLOC(sizeof(PyStringObject) + nbytes);
    if (op == NULL)
            <MemoryError>

With the proper macro, the whole code above can be written as:

    nbytes = Py_MUL_ADD(size, sizeof(char), sizeof(PyStringObject));
    op = (PyStringObject *) PyObject_MALLOC(nbytes);
    if (op == NULL)
            <MemoryError>

which seems much cleaner to me. For this reason I do not believe it is a good idea here to define a macro a la Zope C:

    DO_MULADD_OR(nbytes, size, sizeof(char), sizeof(PyStringObject), goto Overflow);

for two reasons: there is no real need for a special "Overflow" case, as long as the returned 'nbytes' is guaranteed to be un-malloc-able; besides, no existing macro (e.g. _PyObject_VAR_SIZE) can support an extra "goto Overflow" case without breaking C code that depends on it.

There is however a difference between the two above code fragments, in that the former will raise an OverflowError for "really too large values" and a MemoryError for "reasonable value but still too large for me to allocate right now". Whether this is a correct behavior is debatable. I would argue that raising MemoryError in both cases is fine: after all, "123"*9999999999999999 could be perfectly fine in another Python implementation with lazy strings. It is thus only a memory limit that makes this fail in the current implementation.

Note that if we define MAX_SIZE to ((size_t)-1), the overflow check in the present stringobject.c could be more efficiently written down as:

    if (size > (MAX_SIZE - sizeof(PyStringObject)) / sizeof(char))
            <OverflowError>

where everything at the right of '>' reduces to a compile-time constant. The Py_MUL_ADD macro can be implemented with this idea in mind, which works best if the 2nd and 3rd arguments are constants.
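A sketch of what Py_MUL_ADD itself might look like (no such macro exists in CPython today; it assumes 'mul' is nonzero and 'add' is no larger than MAX_SIZE, and returns MAX_SIZE itself as the un-malloc-able value):

    #include <stddef.h>

    #define MAX_SIZE ((size_t)-1)

    /* n*mul + add, or MAX_SIZE if that cannot fit in a size_t.  With
       constant 'mul' and 'add' the right-hand side of '>' folds into a
       compile-time constant, as in the stringobject.c check above. */
    #define Py_MUL_ADD(n, mul, add)                                 \
        (((size_t)(n) > (MAX_SIZE - (add)) / (mul))                 \
            ? MAX_SIZE                                              \
            : (size_t)(n) * (mul) + (add))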
note that anything/non_constant can be incredibly expensive!
Yes. Couldn't a better overflow-checking algorithm be designed for these cases? For example, to multiply the non-constants 'a' and 'b', we can be sure that there is no overflow if both 'a' and 'b' are small enough (which is the common case anyway). I believe I can even think about a single Py_MUL_ADD macro that is as efficient as above with constant arguments, and otherwise uses this trick.

About using 64-bit computations everywhere (as suggested by Christian): if the length of objects (e.g. lists) is still constrained to 31 bits, then it's fine; otherwise, the problem is the same. Do all supported 64-bit platforms have a 32-bit 'int', and thus a 31-bit limit on object lengths?

By the way, this makes me think of a good efficient overflow-detection method for platforms with a "long long" type longer than size_t:

    unsigned long long bytes = ((unsigned long long)a)*((unsigned long long)b)+c;
    if (bytes > ((size_t)-1))
            <overflow>;

This compiles to very good assembly code on x86.

Mmmh, in summary I would recommend that we define a common "multiply-and-add" macro, returning a size_t or OVERFLOW_VALUE in case of overflow, which is implemented in several different ways on different platforms, as selected by #if's. The same macro is used for just "multiply" or just "add" by relying on the compiler's strength reduction.

Armin
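Written out as a self-contained helper, the "long long" trick might look like the sketch below; the function name is made up, and it assumes unsigned long long is at least twice as wide as size_t (a point Tim returns to shortly):

    #include <stddef.h>

    static size_t
    checked_mul_add(size_t a, size_t b, size_t c, int *overflowed)
    {
        unsigned long long bytes =
            (unsigned long long)a * (unsigned long long)b + c;

        if (bytes > (unsigned long long)(size_t)-1) {
            *overflowed = 1;
            return (size_t)-1;      /* un-malloc-able sentinel */
        }
        *overflowed = 0;
        return (size_t)bytes;
    }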
I'd like to add that, until we decide on a Really Big Rewrite, the length of all Python sequences must fit in a signed int. So relying on malloc returning NULL isn't really a viable approach: even if you have, say, 2.5 GB of memory in your box, you still can't create a string of that size. I do agree that the distinction between OverflowError and MemoryError is not worth splitting hairs over. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Armin Rigo]
Let me quote from stringobject.c the "heartbreaking code such as string_repeat's":
    nbytes = size * sizeof(char);
    if (nbytes / sizeof(char) != (size_t)size ||
        nbytes + sizeof(PyStringObject) <= nbytes)
            <OverflowError>
    op = (PyStringObject *) PyObject_MALLOC(sizeof(PyStringObject) + nbytes);
    if (op == NULL)
            <MemoryError>
With the proper macro, the whole code above can be written as:
    nbytes = Py_MUL_ADD(size, sizeof(char), sizeof(PyStringObject));
    op = (PyStringObject *) PyObject_MALLOC(nbytes);
    if (op == NULL)
            <MemoryError>
which seems much cleaner to me.
It's not the appearance of the code that breaks my heart (although it doesn't cheer it either <wink>), it's that when we "fix" one of these things that pops up in real life, we add a pile of expensive tests and branches to code that's been known to fall over only once in 10 years. So Python gets bigger and slower for everyone, and half the time there's no real cause beyond that some wiener is playing "let's try to break it!".
For this reason I do not believe it is a good idea here to define a macro a la Zope C:
DO_MULADD_OR(nbytes, size, sizeof(char), sizeof(PyStringObject), goto Overflow);
for two reasons: there is no real need for a special "Overflow" case,
I don't know. It depends on context, such as how many times a potential overflow can occur within a single routine. Probably not often on average.
as long as the returned 'nbytes' is guaranteed to be un-malloc-able;
There is no nbytes that's guaranteed to be un-malloc-able, unless you're going to impose artificial restrictions on malloc. It's not a happy thing to burden our malloc wrappers with another layer of test-and-branch either.
besides, no existing macro (e.g. _PyObject_VAR_SIZE) can support an extra "goto Overflow" case without breaking C code that depends on it.
Cool: yet another reason to leave things alone <wink>.
There is however a difference between the two above code fragments, in that the former will raise an OverflowError for "really too large values" and a MemoryError for "reasonable value but still too large for me to allocate right now". Whether this is a correct behavior is debatable.
Some things aren't worth debating, though -- it doesn't matter to me which exception we raise.
I would argue that raising MemoryError in both cases is fine:
Fine by me too.
after all, "123"*9999999999999999 could be perfectly fine in another Python implementation with lazy strings. It is thus only a memory limit that makes this fail in the current implementation.
Note that if we define MAX_SIZE to ((size_t)-1), the overflow check in the present stringobject.c could be more efficiently written down as:
(size_t)~0UL is safer, although there's really no wholly safe way to spell the intent in C89.
    if (size > (MAX_SIZE - sizeof(PyStringObject)) / sizeof(char))
            <OverflowError>
where everything at the right of '>' reduces to a compile-time constant. The Py_MUL_ADD macro can be implemented with this idea in mind, which works best if the 2nd and 3rd arguments are constants.
note that anything/non_constant can be incredibly expensive!
Yes. Couldn't a better overflow-checking algorithm be designed for these cases? For example, to multiply the non-constants 'a' and 'b', we can be sure that there is no overflow if both 'a' and 'b' are small enough (which is the common case anyway). I believe I can even think about a single Py_MUL_ADD macro that is as efficient as above with constant arguments, and otherwise uses this trick.
And more tests, and more branches. Life will be much easier if we can limit it to size_t thingies (which have the same size, and are unsigned).
About using 64-bit computations everywhere (as suggested by Christian): if the length of objects (e.g. lists) is still constrained to 31 bits, then it's fine; otherwise, the problem is the same. Do all supported 64-bit platforms have a 32-bit 'int', and thus a 31-bit limit on object lengths?
No. For example, some now-rare models of Cray boxes have no 32-bit types. If 64-bit machines really are the wave of the future (I started using them in the late 1970's, and am still waiting for the revolution <0.9 wink>), it's no cure anyway, as Python in a truly 64-bit world will have to lose its 2GB-bounded view of the world (as it's already painfully doing for files, and maybe for mmaps). Then the "detect overflow" problem returns with a vengeance.
By the way, this makes me think of a good efficient overflow-detection method for platforms with a "long long" type longer than size_t:
    unsigned long long bytes = ((unsigned long long)a)*((unsigned long long)b)+c;
    if (bytes > ((size_t)-1))
            <overflow>;
This also implicitly assumes that:

1. size_t is no wider than an int; that's the kind of thing the "(size_t)~0UL would be safer" comment above was getting at.

2. long long is at least twice as wide as size_t.
This compiles to very good assembly code on x86.
Mmmh, in summary I would recommend that we define a common "multiply-and-add" macro, returning a size_t or OVERFLOW_VALUE in case of overflow, which is implemented in several different ways on different platforms, as selected by #if's. The same macro is used for just "multiply" or just "add" by relying on the compiler's strength reduction.
What a PITA. Is it worth it? If I don't have to do it, maybe <wink>. BTW, when we allocate an object that participates in cyclic GC, there's Yet Another little integer that gets added to the total size (to reflect the "extra header" bytes unique to GC'ed objects), and so there's Yet Another place potential overflow may go uncaught now.
I think this discussion is suffering from the desire to over-generalize. Armin, can you show us how to multiply two nonnegative ints together with an overflow "block" as Tim suggested earlier? I.e. a macro callable as

    SAFE_MULTIPLY(destination, src1, src2, on_overflow);

meaning roughly

    destination = src1 * src2;
    if (<overflow detected>)
        on_overflow;

I currently use something like this as the overflow test in a few places:

    if (src1 && destination / src1 != src2)

but as Tim points out the division can kill us unless src1 is a constant... How would you spell this without doing a division?

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
... I currently use something like this as the overflow test in a few places:
if (src1 && destination / src1 != src2)
but as Tim points out the division can kill us unless src1 is a constant... How would you spell this without doing a division?
Armin answered that in the case where there's an efficient "unsigned long long" type at least twice the width of the inputs. Then you just do the product in the fat precision and see whether any of the upper-half bits are set. As a pragmatic matter, 2/5ths of that trick can work "most of the time":

    #define YOWZA (sizeof(size_t) * 4U)

    if ((src1 | src2) >> YOWZA)
        hard to tell
    else
        overflow is impossible

would get out quick in *most* non-overflow cases. The fast path there succeeds if and only if both inputs are at most "half full". For example, in string * int it's rare that len(string) >= 64K or int >= 64K.

For what to do in the other half, see the history of subtly buggy attempts to detect overflow quickly in Python's int * int code. Without a double-width type to fall back on, I think it's impossible to do this both correctly and quickly.

BTW, if we know we're not interested in anything other than 32-bit products,

    double _prod = (double)src1 * (double)src2;
    if (_prod < 4294967296.0)   /* 2**32 */
        result = (size_t)_prod;
    else
        overflow;

does the trick reliably on all platforms I know of -- although not all Python platforms support floating-point efficiently! It's a good puzzle <wink>.
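Put together, the fast path and the fat-precision fallback might look like the sketch below (the function name is made up, and it again assumes an unsigned long long at least twice as wide as size_t):

    #include <stddef.h>

    #define HALF_SHIFT (sizeof(size_t) * 4U)

    static int
    size_mul(size_t src1, size_t src2, size_t *result)
    {
        /* Fast path: both inputs at most "half full", overflow impossible. */
        if (((src1 | src2) >> HALF_SHIFT) == 0) {
            *result = src1 * src2;
            return 0;
        }
        /* Slow path: compute in fat precision and check the upper bits. */
        {
            unsigned long long prod =
                (unsigned long long)src1 * (unsigned long long)src2;
            if (prod > (unsigned long long)(size_t)-1)
                return -1;          /* overflow */
            *result = (size_t)prod;
            return 0;
        }
    }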
The following is really ugly (until you put it in a macro), but would work:
    #define HALF_BITS (sizeof(size_t) * 4U)
    #define LO_MASK ((((size_t)1) << HALF_BITS)-1)
    #define HI_MASK (LO_MASK << HALF_BITS)

    if ( /* Tim's fast path */
         ((src1 | src2) & HI_MASK) &&
         ( /* Ugly slower path */
           (((src1 & LO_MASK) * (src2 >> HALF_BITS)) & HI_MASK) |
           (((src1 >> HALF_BITS) * (src2 & LO_MASK)) & HI_MASK) |
           (src1 >> HALF_BITS) * (src2 >> HALF_BITS)) )
    {
        goto overflow;
    }
[Scott Gilbert]
The following is really ugly (until you put it in a macro), but would work:
    #define HALF_BITS (sizeof(size_t) * 4U)
    #define LO_MASK ((((size_t)1) << HALF_BITS)-1)
    #define HI_MASK (LO_MASK << HALF_BITS)

    if ( /* Tim's fast path */
         ((src1 | src2) & HI_MASK) &&
         ( /* Ugly slower path */
           (((src1 & LO_MASK) * (src2 >> HALF_BITS)) & HI_MASK) |
           (((src1 >> HALF_BITS) * (src2 & LO_MASK)) & HI_MASK) |
           (src1 >> HALF_BITS) * (src2 >> HALF_BITS)) )
    {
        goto overflow;
    }
I'll repeat that it's hard to do this correctly and quickly (c.f. earlier comments about the history of Python's subtly buggy attempts). This kind of approach is viewing src1 and src2 as 2-digit numbers in base X = 2**HALF_BITS:

    src1 = A*X + B
    src2 = C*X + D

then viewing their product as

    highpart:    A*C*X**2 + xxxxxxxx
    middlepart1: A*D*X    +     xxxxxxxx
    middlepart2: B*C*X    +     xxxxxxxx
    lowpart:     B*D                xxxxxxxx
                            ----------------
                            xxxxxxxxxxxxxxxx
                            <danger>< safe >

Your second line checks whether the high half of middlepart2 is 0; your third line whether the high half of middlepart1 is 0; your last line whether highpart is 0. But they can all be 0 and *still* have overflow: the sum

    low half of middlepart1 +
    low half of middlepart2 +
    high half of lowpart

can carry over into the <danger> zone. Supposing we were looking at 16-bit products, here's an example:

      0x01ff
    * 0x00ff
      ------
    0000
      00ff
      0000
        fe01
    --------
    0001fd01
       ^ oops

Hint: before posting something like this, prototype it in Python, and do an exhaustive check of all possible pairs of multiplicands in an artificially small base. Very few buggy algorithms survive that, even for very small artificial bases.
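In that spirit, here is a brute-force harness (in C rather than the Python prototype Tim suggests, and purely illustrative): it treats the operands as 8-bit numbers with 4-bit "halves" and compares the proposed test against the true answer for every possible pair.

    #include <stdio.h>

    #define BITS    8u
    #define HALF    (BITS / 2)
    #define MASK    ((1u << BITS) - 1)          /* 0xff */
    #define LO_MASK ((1u << HALF) - 1)          /* 0x0f */
    #define HI_MASK (LO_MASK << HALF)           /* 0xf0 */

    int main(void)
    {
        unsigned a, b, missed = 0;

        for (a = 0; a <= MASK; a++) {
            for (b = 0; b <= MASK; b++) {
                /* The proposed test, transcribed for the small base. */
                int claimed = ((a | b) & HI_MASK) &&
                              (  (((a & LO_MASK) * (b >> HALF)) & HI_MASK)
                               | (((a >> HALF) * (b & LO_MASK)) & HI_MASK)
                               | ((a >> HALF) * (b >> HALF)) );
                int actual = (a * b) > MASK;    /* true overflow past 8 bits */

                if (claimed != actual) {
                    if (missed < 5)
                        printf("missed: 0x%02x * 0x%02x = 0x%04x\n", a, b, a * b);
                    missed++;
                }
            }
        }
        printf("%u pairs misjudged\n", missed);
        return 0;
    }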
[Tim Peters]
I'll repeat that it's hard to do this correctly and quickly (c.f. earlier comments about the history of Python's subtly buggy attempts).
[...]
      0x01ff
    * 0x00ff
      ------
    0000
      00ff
      0000
        fe01
    --------
    0001fd01
       ^ oops
That was sloppy on my part. Thanks for showing me my error. (That's as gracious as I can be while saying "D'Oh!" on the inside. :-)

Clearly you know the long answer to this problem. It could involve taking the ugly expression I had and adding 3 more terms to see if it adds up to a problem. Or if all those integer multiplies make it less attractive than just going to double, then taking the log2() of each operand and counting the bits to see if there is a problem. Or using the PyLong_* machinery. Or something similar. And you've got your fast path past the test for the common case, so it looks like you're just hesitant to make the big ugly macro...
[Tim Peters]
I'll repeat that it's hard to do this correctly and quickly (c.f. earlier comments about the history of Python's subtly buggy attempts).
The really tragic thing about all this is that most machines have perfectly good facilities for detecting overflow at the machine level, but no commonly used HLL that I know of provides access to them, for some unfathomable reason.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
[Greg Ewing]
The really tragic thing about all this is that most machines have perfectly good facilities for detecting overflow at the machine level, but no commonly used HLL that I know of provides access to them, for some unfathomable reason.
Well, that's where it bifurcates Yet Again -- Pentiums have a perfectly usable "give me the high bits of a 32x32->64 bit product" instruction, and it's just a couple lines of inline assembler under MSVC or gcc (both make such stuff easy to get at, embedded in C code). But as I said at the start, I ignore these things until they pop up in real life. So few *have* popped in real life that there hasn't been any point to creating mounds of platform-dependent #ifdef'ed infrastructure.
if we know we're not interested in anything other than 32-bit products,
    double _prod = (double)src1 * (double)src2;
    if (_prod < 4294967296.0)   /* 2**32 */
        result = (size_t)_prod;
    else
        overflow;
does the trick reliably on all platforms I know of -- although not all Python platforms support floating-point efficiently!
I like this one, as well as the one using 64-bit precision. I wonder if we could collect a bunch of different macros and make them tunable for different platforms? (Including a "don't check for overflow" version for people who wanna play fast and loose. :-) The trick will be to ensure that the macro arguments are used exactly once in all versions (otherwise switching macros might unveil a new class of bugs).

Apart from *how* to check, there's a variety of cases depending on the types of the input and output variables. There's the ob_size calculation for sequence repeat, which takes a C int and a C long and should return something that fits in a C int; and there's the malloc size calculation for various ones, which should take an int and a (small constant) size_t and return a size_t. So maybe we need two macros: one for int*long->int, and one for int*size_t->size_t? With several different implementations, chosen by configure. But first we'd have to see if there are enough places to benefit from such elaborate machinery.

There are also additions, usually of small constants, which generally operate on the size_t. These are easier to test: if you add a small positive constant to a size_t, and the result is smaller than the original size_t, an overflow occurred. Similar for plain ints. So I'm not sure we need much help with these.

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
... There are also additions, usually of small constants, which generally operate on the size_t. These are easier to test: if you add a small positive constant to a size_t, and the result is smaller than the original size_t, an overflow occurred.
More generally, if x and y are both size_t, and sizeof(size_t) >= sizeof(unsigned int), x+y overflowed if and only if

    x+y < x

Checking x+y < y is equivalent -- you can compare the sum to either input; doesn't matter.
Similar for plain ints.
Signed ints are harder. If the signs differ, overflow is impossible. If the signs are the same, overflow occurred if and only if the sign of the result differs from the common input sign:

    sum = x+y
    overflow iff ((sum ^ x) & (sum ^ y)) < 0
So I'm not sure we need much help with these.
Well, they're all obscure as hell.
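Spelled out, the two addition idioms look like this (the helper names are made up; the signed variant assumes the usual two's-complement wrap-around, which the C standard itself does not promise):

    #include <stddef.h>

    static int
    size_add_overflows(size_t x, size_t y, size_t *sum)
    {
        *sum = x + y;
        return *sum < x;            /* equivalently: *sum < y */
    }

    static int
    int_add_overflows(int x, int y, int *sum)
    {
        *sum = x + y;               /* assumed to wrap on overflow */
        /* Same input signs but a different result sign <=> overflow. */
        return ((*sum ^ x) & (*sum ^ y)) < 0;
    }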
Hello Guido, On Tue, 15 Oct 2002, Guido van Rossum wrote:
I like this one, as well as the one using 64-bit precision. I wonder if we could collect a bunch of different macros and make them tunable for different platforms?
Doable. I experimented with some macros that do (unsigned int)*(unsigned int)->size_t, where the macro to use is selected according to #if's based on the relative size of int, long and long long.
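Such a selection might look roughly like the sketch below; SAFE_MUL is a made-up name, and the SIZEOF_* constants are assumed to be supplied by configure, the way pyconfig.h provides similar names (SIZEOF_VOID_P standing in for the width of size_t). Each argument expands exactly once in both variants.

    #if defined(SIZEOF_LONG_LONG) && SIZEOF_LONG_LONG >= 2 * SIZEOF_VOID_P
    /* A wide enough type exists: do the product in double precision. */
    #define SAFE_MUL(result, a, b, on_overflow)                     \
        do {                                                        \
            unsigned long long _prod =                              \
                (unsigned long long)(a) * (unsigned long long)(b);  \
            if (_prod > (size_t)-1) { on_overflow; }                \
            else (result) = (size_t)_prod;                          \
        } while (0)
    #else
    /* Portable fallback: the division-based check. */
    #define SAFE_MUL(result, a, b, on_overflow)                     \
        do {                                                        \
            size_t _a = (a), _b = (b);                              \
            if (_b != 0 && _a > ((size_t)-1) / _b) { on_overflow; } \
            else (result) = _a * _b;                                \
        } while (0)
    #endif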
(...) The trick will be to ensure that the macro arguments are used exactly once in all versions (otherwise switching macros might unveil a new class of bugs).
Harder...
Apart from *how* to check, there's a variety of cases depending on the types of the input and output variables.
True as well... Why are C macros so limited?...
There are also additions, usually of small constants, which generally operate on the size_t.
If we are sure these are small, we can just add without any check if we reserve a whole range of values for "overflow", e.g. ((size_t)~0x7fff) to ((size_t)~0). Not too clean however.

Armin
[Guido]
The trick will be to ensure that the macro arguments are used exactly once in all versions (otherwise switching macros might unveil a new class of bugs).
[Armin Rigo]
Harder...
It's trivial if you define the macro to take an lvalue result argument. Then the macro can expand to a block, and so declare block-local temp variables, including, if desired, temps that are merely initialized to other macro arguments. Such temps can be reused as often as you like without semantic consequence. Contorting this into expression rvalue form is a challenge, but a worthier challenge is yielding to common sense here <wink>.
On Tue, Oct 15, 2002 at 01:29:36AM -0400, Tim Peters wrote:
BTW, if we know we're not interested in anything other than 32-bit products,
    double _prod = (double)src1 * (double)src2;
    if (_prod < 4294967296.0)   /* 2**32 */
        result = (size_t)_prod;
    else
        overflow;
s/_prod;/src1 * src2;/

Re-calculating the product with integer multiplication is probably faster than converting a double to integer on any sane CPU. I wonder if the old SPARCs that don't have an integer multiply instruction actually have a floating point to integer conversion that is faster than integer mul. Wasting "expensive" multiplications this way still feels blasphemous :)

Oren
[Tim]
BTW, if we know we're not interested in anything other than 32-bit products,
    double _prod = (double)src1 * (double)src2;
    if (_prod < 4294967296.0)   /* 2**32 */
        result = (size_t)_prod;
    else
        overflow;
[Oren Tirosh]
s/_prod;/src1 * src2;/
Re-calculating the product with integer multiplication is probably faster than converting a double to integer on any sane CPU.
This is so. Better, because it allows more instruction-level parallelism:

    double dprod = (double)src1 * (double)src2;
    size_t iprod = src1 * src2;
    if (dprod < 4294967296.0)   /* 2**32 */
        result = iprod;
    else
        /* overflow */
I wonder if the old SPARCs that don't have an integer multiply instruction actually have a floating point to integer conversion that is faster than integer mul.
I'll let someone with an old SPARC worry about that.
Wasting "expensive" multiplications this way still feels blasphemous :)
Well, we could simulate multiplication by hand, with a bit-at-a-time shift-and-add loop. This reduces the detect-mul-overflow case to the detect-add-overflow case. Bonus: on some odd boxes, it would be faster
Re-calculating the product with integer multiplication is probably faster than converting a double to integer on any sane CPU.
Um... what would be "sane" about that? I don't know much about the Pentium. I do know that the PowerPC doesn't have any way of moving a value from a float to an integer register without going through memory... but I regard that as a decidedly INsane feature!

Greg Ewing, greg@cosc.canterbury.ac.nz
participants (8)

- Armin Rigo
- Armin Rigo
- Christian Tismer
- Greg Ewing
- Guido van Rossum
- Oren Tirosh
- Scott Gilbert
- Tim Peters