Bug in floating point multiplication

Jason Swails jason.swails at gmail.com
Mon Jul 6 20:46:13 CEST 2015


On Mon, Jul 6, 2015 at 11:44 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com>
wrote:

> On Sat, 4 Jul 2015 at 02:12 Jason Swails <jason.swails at gmail.com> wrote:
>
>> On Fri, Jul 3, 2015 at 11:13 AM, Oscar Benjamin <
>> oscar.j.benjamin at gmail.com> wrote:
>>
>>> On 2 July 2015 at 18:29, Jason Swails <jason.swails at gmail.com> wrote:
>>>
>>> Whereas the 32 bit one looks like:
>>>
>>> $ objdump -d a.out.32 | less
>>> ...
>>>
>>>  804843e:  fildl  -0x14(%ebp)
>>>  8048441:  fmull  -0x10(%ebp)
>>>  8048444:  fnstcw -0x1a(%ebp)
>>>  8048447:  movzwl -0x1a(%ebp),%eax
>>>  804844b:  mov    $0xc,%ah
>>>  804844d:  mov    %ax,-0x1c(%ebp)
>>>  8048451:  fldcw  -0x1c(%ebp)
>>>  8048454:  fistpl -0x20(%ebp)
>>>  8048457:  fldcw  -0x1a(%ebp)
>>>  804845a:  mov    -0x20(%ebp),%eax
>>>  804845d:  cmp    -0x14(%ebp),%eax
>>>  8048460:  jne    8048477 <main+0x5c>
>>>  8048462:  sub    $0x8,%esp
>>>  8048465:  pushl  -0x14(%ebp)
>>>  8048468:  push   $0x8048520
>>>  804846d:  call   80482f0 <printf@plt>
>>>  8048472:  add    $0x10,%esp
>>>  8048475:  jmp    8048484 <main+0x69>
>>>  8048477:  addl   $0x1,-0x14(%ebp)
>>>  804847b:  cmpl   $0xf423f,-0x14(%ebp)
>>>  8048482:  jle    804843e <main+0x23>
>>> ...
>>>
>>> So the 64 bit one is using SSE instructions and the 32-bit one is
>>> using x87. That could explain the difference you see at the C level
>>> but I don't see it on this CPU (/proc/cpuinfo says Intel(R) Core(TM)
>>> i5-3427U CPU @ 1.80GHz).
>>>
>>
>> Hmm.  Well that could explain why you don't get the same results as me.
>> My CPU is an AMD FX(tm)-6100 Six-Core Processor (from /proc/cpuinfo).  My
>> objdump looks the same as yours for the 64-bit version, but for 32-bit it
>> looks like:
>>
>
> So if we have different generated machine instructions it suggests a
> difference in the way it was compiled rather than in the hardware itself.
> (Although it could be that the compilers were changed because the hardware
> was inconsistent in this particular usage).
>

I had assumed that the different compilations resulted from different
underlying hardware (i.e., the instructions used on the Intel chip were
either unavailable, or somehow deemed heuristically inferior on the AMD
chip).

>
>
>> However, I have no experience looking at raw assembler, so I can't
>> discern what it is I'm even looking at (nor do I know what explicit SSE
>> instructions look like in assembler).
>>
>
> The giveaway is that SSE instructions use the XMM registers, so wherever
> you see %xmm0 etc., data is being loaded for SSE instructions. I'll
> translate the important part of the 32 bit code below:
>

Oh of course.  I *have* seen/worked with code that uses routines from
xmmintrin.h, so I should've been able to piece the xmm together with the
SSE instructions.

> [snip]
> This means that the x87 register will be storing a higher precision result
> in its 80 bit format. This result will have to be rounded by the FSTPL
> instruction.
>
> If you look at the assembly output I showed you'll see the instructions
> FNSTCW/FLDCW (x87 store/load control word) which are used to manipulate the
> control word to tell the FPU how to perform this kind of rounding. The fact
> that we don't see it in your compiled output could indicate a C compiler
> bug which could in turn explain the different behaviour people see from
> Python.
>
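
As an aside, here is a rough Python sketch of what the FNSTCW / mov $0xc,%ah /
FLDCW sequence in the Intel listing does to the control word. The helper names
are made up for illustration, and the starting value 0x037F is an assumption
(the usual x87 default), not something read from the actual program. Bits
10-11 of the control word are the rounding-control field.

```python
# Rounding-control (RC) field of the x87 control word, bits 10-11.
ROUNDING_MODES = {
    0: "nearest (ties to even)",
    1: "toward -inf",
    2: "toward +inf",
    3: "toward zero (truncate)",
}

def rounding_control(control_word):
    """Extract the 2-bit RC field from a 16-bit x87 control word."""
    return (control_word >> 10) & 0b11

def set_high_byte(word, value):
    """Model `mov $value,%ah`: replace bits 8-15 of the 16-bit word in %ax."""
    return (word & 0x00FF) | ((value & 0xFF) << 8)

saved = 0x037F                        # assumed default control word
patched = set_high_byte(saved, 0x0C)  # what `mov $0xc,%ah` produces

print(hex(saved), ROUNDING_MODES[rounding_control(saved)])
print(hex(patched), ROUNDING_MODES[rounding_control(patched)])
```

So the compiler temporarily switches the FPU to truncation for the FISTPL
(which is what a C cast to int requires) and then restores the saved word.
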
> To understand exactly why 2049 is the number where it fails consider that
> it is the smallest integer that requires 12 bits of mantissa in floating
> point format. The number 1-.5**53 has a mantissa that is 53 ones:
>
> >>> x = 1-.5**53
> >>> x.hex()
> '0x1.fffffffffffffp-1'
>
> When extended to 80 bit real-extended format with a 64 bit mantissa it
> will have 11 trailing zeros. So I think multiplication of x with any
> integer less than 2049 can be performed exactly by the FMULL instruction. I
> haven't fully considered what impact that would have but it seems
> reasonable that this is why 2049 is the first number that fails.
>

Wow.  Great discussion and description -- I'm convinced.

Thanks a lot,
Jason
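
For anyone who wants to check the 2049 argument without an x87 build, below
is a minimal sketch using exact rational arithmetic. It is a model under
stated assumptions, not a reproduction of the compiled code: it assumes the
product is first rounded to a 64 bit mantissa, then rounded again to a 53 bit
double when stored, both with round-to-nearest-even, and finally truncated to
an integer. All function names are hypothetical.

```python
from fractions import Fraction

def round_to_precision(value, bits):
    """Round a positive Fraction to `bits` significant bits, ties to even."""
    # Initial guess for the exponent, then normalise so that
    # 2**(bits-1) <= scaled < 2**bits.
    e = value.numerator.bit_length() - value.denominator.bit_length() - bits
    scaled = value / Fraction(2) ** e
    while scaled >= 2 ** bits:
        e += 1
        scaled /= 2
    while scaled < 2 ** (bits - 1):
        e -= 1
        scaled *= 2
    # round() on a Fraction rounds half to even and returns an int.
    return round(scaled) * Fraction(2) ** e

X = Fraction(2**53 - 1, 2**53)  # exact value of 1 - .5**53

def truncated_product(n, double_rounding):
    """Model int(n * x), optionally with an intermediate 64 bit rounding."""
    product = n * X
    if double_rounding:
        product = round_to_precision(product, 64)  # assumed x87 register
    return int(round_to_precision(product, 53))    # store as double, truncate

def first_failure(limit, double_rounding):
    """Smallest n below `limit` with int(n * x) == n, or None."""
    for n in range(1, limit):
        if truncated_product(n, double_rounding) == n:
            return n
    return None

print(first_failure(5000, double_rounding=True))   # 2049
print(first_failure(5000, double_rounding=False))  # None
```

With a single rounding to double, the model never yields int(n * x) == n in
this range. With the intermediate 64 bit step, the first case is 2049, which
matches the number discussed above.
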