Drop support for ones' complement machines?
Question: should the CPython source compile cleanly and work correctly on (mostly ancient or hypothetical) machines that use ones' complement or sign-and-magnitude to represent signed integers? I'd like to explicitly document and make use of the following assumptions about the underlying C implementation for CPython. These assumptions go beyond what's guaranteed by the various C and C++ standards, but seem to be almost universally satisfied on modern machines: - signed integers are represented using two's complement - for signed integers the bit pattern 100....000 is not a trap representation; hence INT_MIN = -INT_MAX-1, LONG_MIN = -LONG_MAX-1, etc. - conversion from an unsigned type to a signed type wraps modulo 2**(width of unsigned type). Any thoughts, comments or objections? This may seem academic (and perhaps it is), but it affects the possible solutions to e.g., http://bugs.python.org/issue7406 The assumptions listed above are already tacitly used in various bits of the CPython source (Objects/intobject.c notably makes use of the first two assumptions in the implementations of int_and, int_or and int_xor), while other bits of code make a special effort to *not* assume more than the C standards allow. Whatever the answer to the initial question is, it seems worth having an explicitly documented policy. If we want Python's core source to remain 100% standards-compliant, then int_and, int_or, etc. should be rewritten with ones' complement and sign-and-magnitude machines in mind. That's certainly feasible, but it seems silly to do such a rewrite without a good reason. Python-dev agreement that ones' complement machines should be supported would, of course, be a good reason. Mark
Mark Dickinson wrote:
Question: should the CPython source compile cleanly and work correctly on (mostly ancient or hypothetical) machines that use ones' complement or sign-and-magnitude to represent signed integers?
I think that's the wrong question to ask. What you really meant to ask (IIUC) is this: Should CPython be allowed to invoke behavior that is declared undefined by the C standard, but has a clear meaning when assuming two's complement? This is different from your question, by taking into account that compilers may perform optimization based on the promises of the C standard. For example, compiling int f(int a) { if (a+1>a) return 6; else return 7; } with gcc 4.3.4, with -O2 -fomit-frame-pointer, compiles this to f: movl $6, %eax ret IOW, the compiler determines that the function will always return 6 (*). If you assume that the int type is guaranteed in two's complement, then the generated code would be wrong (and Python would not work correctly). So this isn't about unrealistic and outdated hardware, but about current and real problems. Therefore, I don't think we should make assumptions beyond what standard C guarantees. Regards, Martin (*) If you wonder why the gcc behavior is conforming to C: if a+1 does not overflow, it will be indeed greater than a, and the result is 6. If a+1 does overflow, undefined behavior occurs, and the program may do whatever it desires, including returning 6.
On Tue, Dec 1, 2009 at 1:46 PM, "Martin v. Löwis"
Mark Dickinson wrote:
Question: should the CPython source compile cleanly and work correctly on (mostly ancient or hypothetical) machines that use ones' complement or sign-and-magnitude to represent signed integers?
I think that's the wrong question to ask. What you really meant to ask (IIUC) is this: Should CPython be allowed to invoke behavior that is declared undefined by the C standard, but has a clear meaning when assuming two's complement?
No, the original question really was the question that I meant to ask. :) I absolutely agree that CPython shouldn't invoke undefined behaviour, precisely because of the risk of gcc (or some other compiler) optimizing based on the assumption that undefined behaviour never happens. This is why I opened issue 7406. So my question, and the listed assumptions, are about implementation- defined behaviour, not undefined behaviour. For the 3 assumptions listed, gcc obeys all those assumptions (see section 4.5 of the GCC manual). http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Integers-implementation.html Personally I can't think of any good reason not to make these assumptions. Mark
No, the original question really was the question that I meant to ask. :)
Ok. Then the reference to issue 7406 is really confusing, as this is about undefined behavior - why does the answer to your question affect the resolution of this issue?
I absolutely agree that CPython shouldn't invoke undefined behaviour, precisely because of the risk of gcc (or some other compiler) optimizing based on the assumption that undefined behaviour never happens. This is why I opened issue 7406.
So my question, and the listed assumptions, are about implementation- defined behaviour, not undefined behaviour. For the 3 assumptions listed, gcc obeys all those assumptions (see section 4.5 of the GCC manual).
http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Integers-implementation.html
I think gcc makes promises here beyond resolving implementation-defined behavior. For bitshift operators, C99 says (6.5.7) [#4] The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1×2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [#5] The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity, 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation- defined. Notice that only right-shift is implementation-defined. The left-shift of a negative value invokes undefined behavior, even though gcc guarantees that the sign bit will shift out the way it would under two's complement. So I'm still opposed to codifying your assumptions if that would mean that CPython could now start relying on left-shift to behave in a certain way. For right-shift, your assumptions won't help for speculation about the result: I think it's realistic that some implementations sign-extend, yet others perform the shift unsigned (i.e. zero-extend). I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then its fine with me. Regards, Martin
On Tue, Dec 1, 2009 at 3:32 PM, "Martin v. Löwis"
No, the original question really was the question that I meant to ask. :)
Ok. Then the reference to issue 7406 is really confusing, as this is about undefined behavior - why does the answer to your question affect the resolution of this issue?
Apologies for the lack of clarity. So in issue 7406 I'm complaining (amongst other things) that int_add uses the expression 'x+y', where x and y are longs, and expects this expression to wrap modulo 2**n on overflow. As you say, this is undefined behaviour. One obvious way to fix it is to write (long)((unsigned long)x + (unsigned long)y) instead. But *here's* the problem: this still isn't a portable solution! It no longer depends on undefined behaviour, but it *does* depend on implementation-defined behaviour: namely, what happens when an unsigned long that's greater than LONG_MAX is converted to long. (See C99 6.3.1.3., paragraph 3: "Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.") It's this implementation-defined behaviour that I'd like to assume.
I think gcc makes promises here beyond resolving implementation-defined behavior. For bitshift operators, C99 says (6.5.7) [...]
Yes, I'm very well aware of the issues with shifting signed integers; I'm not proposing making any assumptions here.
So I'm still opposed to codifying your assumptions if that would mean that CPython could now start relying on left-shift to behave in a certain way. For right-shift, your assumptions won't help for speculation about the result: I think it's realistic that some implementations sign-extend, yet others perform the shift unsigned (i.e. zero-extend).
I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then its fine with me.
I'm not even interested in going this far: I only want to make explicit the three assumptions I specified in my original post: - signed integers are represented using two's complement - for signed integers the bit pattern 100....000 is not a trap representation - conversion from an unsigned type to a signed type wraps modulo 2**(width of unsigned type). (Though I think these assumptions do in fact completely determine the behaviour of &, |, ^, ~.) As far as I know these are almost universally satisfied for current C implementations, and there's little reason not to assume them, but I didn't want to document and use these assumptions without consulting python-dev first. Mark
I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then its fine with me.
I'm not even interested in going this far:
I still am: with your list of assumptions, it is unclear (to me, at least) what the consequences are. So I'd rather see an explicit list of consequences, instead of buying a pig in a poke. Regards, Martin
On Dec 1, 2009, at 11:08 AM, Martin v. Löwis wrote:
I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then its fine with me.
I'm not even interested in going this far:
I still am: with your list of assumptions, it is unclear (to me, at least) what the consequences are. So I'd rather see an explicit list of consequences, instead of buying a pig in a poke.
I think all that needs to be defined is that conversion from unsigned to signed, and (negative) signed to unsigned integers have 2's complement wrapping semantics, and does not affect the bit pattern in memory. Stating it that way makes it clearer that all you're assuming is the operation of the cast operators, and it seems to me that it implies the other requirements. James
On Tue, Dec 1, 2009 at 4:17 PM, James Y Knight
I think all that needs to be defined is that conversion from unsigned to signed, and (negative) signed to unsigned integers have 2's complement wrapping semantics, and does not affect the bit pattern in memory.
Yes, I think everything does pretty much follow from this, since for ones' complement or sign-and-magnitude these wrapping semantics are impossible, because the signed type doesn't have enough distinct possible values.
Stating it that way makes it clearer that all you're assuming is the operation of the cast operators, and it seems to me that it implies the other requirements.
Agreed. Thanks, Mark
On Tue, Dec 1, 2009 at 11:24 AM, Mark Dickinson
On Tue, Dec 1, 2009 at 4:17 PM, James Y Knight
wrote: I think all that needs to be defined is that conversion from unsigned to signed, and (negative) signed to unsigned integers have 2's complement wrapping semantics, and does not affect the bit pattern in memory.
I don't know if this particular implementation defined behavior is safe to be relied upon. I just want to suggest that if any such assumption is made in the code, a test should be added to configure to complain loudly if a platform violating the assumption is found in the future.
On Tue, Dec 1, 2009 at 4:47 PM, Alexander Belopolsky
On Tue, Dec 1, 2009 at 4:17 PM, James Y Knight
wrote: I think all that needs to be defined is that conversion from unsigned to signed, and (negative) signed to unsigned integers have 2's complement wrapping semantics, and does not affect the bit pattern in memory.
I don't know if this particular implementation defined behavior is safe to be relied upon. I just want to suggest that if any such assumption is made in the code, a test should be added to configure to complain loudly if a platform violating the assumption is found in the future.
That sounds like a good idea. An extension of that would be to define an UNSIGNED_TO_SIGNED macro (insert better name here) which, depending on the result of the configure test, either used a direct cast or a workaround. E.g., for an unsigned long x, ((x) >= 0 ? (long)(x) : ~(long)~(x)) always gives the appropriate wraparound semantics (I think), assuming two's complement with no trap representation. Mark
On Tue, Dec 1, 2009 at 5:22 PM, Mark Dickinson
or a workaround. E.g., for an unsigned long x,
((x) >= 0 ? (long)(x) : ~(long)~(x))
always gives the appropriate wraparound semantics (I think), assuming
Sorry; should have tested. Try: ((x) <= LONG_MAX ? (long)(x) : ~(long)~(x)) Mark
On Tue, Dec 1, 2009 at 4:08 PM, "Martin v. Löwis"
I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then its fine with me.
I'm not even interested in going this far:
I still am: with your list of assumptions, it is unclear (to me, at least) what the consequences are. So I'd rather see an explicit list of consequences, instead of buying a pig in a poke.
Okay; though I think that my list of assumptions is easier to check directly for any given implementation: it corresponds exactly to items 2 and 4 in C99 J.3.5, and any conforming C implementation is required to explicitly document how it behaves with regard to these items. I'm not sure how to decide which particular consequences should be listed, but those for &, |, ^ and ~ could certainly be included. Mark
I'd rather prefer to explicitly list what CPython assumes about the outcome of specific operations. If this is just about &, |, ^, and ~, then its fine with me. I'm not even interested in going this far: I still am: with your list of assumptions, it is unclear (to me, at least) what the consequences are. So I'd rather see an explicit list of consequences, instead of buying a pig in a poke.
Okay; though I think that my list of assumptions is easier to check directly for any given implementation: it corresponds exactly to items 2 and 4 in C99 J.3.5, and any conforming C implementation is required to explicitly document how it behaves with regard to these items.
I'm in favor stating the assumptions the way you do (*), I just want to have an additional explicit statement what consequences you assume out of these assumptions.
I'm not sure how to decide which particular consequences should be listed, but those for &, |, ^ and ~ could certainly be included.
It should give the CPython contributors an indication what kind of code would be ok, and which would not. Perhaps it should include both a black list and a white list: some may assume that two's complement already provides guarantees on left-shift, when it actually does not (**). Regards, Martin (*) I wonder why you are not talking about padding bits (6.2.6.2p1) (**) I also wonder why C fails to make left-shift implementation-defined, perhaps with an even stronger binding to the options for the integer representation.
[Mark]
I'm not sure how to decide which particular consequences should be listed, but those for &, |, ^ and ~ could certainly be included.
[Martin]
It should give the CPython contributors an indication what kind of code would be ok, and which would not. Perhaps it should include both a black list and a white list: some may assume that two's complement already provides guarantees on left-shift, when it actually does not (**).
Okay. I'll have to think about this a bit; I'll try to come up with some suitable wording.
(*) I wonder why you are not talking about padding bits (6.2.6.2p1)
Good point. Mostly because I haven't recently encountered any code where it matters, I suppose. But there's certainly CPython source that assumes no padding bits: long_hash in longobject.c is one example that comes to mind: it assumes that the number of value bits in an unsigned long is 8*SIZEOF_LONG.
(**) I also wonder why C fails to make left-shift implementation-defined, perhaps with an even stronger binding to the options for the integer representation.
I wonder too. The C rationale document doesn't have anything to say on this subject. Mark
participants (4)
-
"Martin v. Löwis"
-
Alexander Belopolsky
-
James Y Knight
-
Mark Dickinson