[ python-Feature Requests-1087418 ] long int bitwise ops speedup (patch included)

Fri Jan 7 07:54:49 CET 2005

Feature Requests item #1087418, was opened at 2004-12-18 00:22
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1087418&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
>Priority: 4
Submitted By: Gregory Smith (gregsmith)
>Assigned to: Nobody/Anonymous (nobody)
Summary: long int bitwise ops speedup (patch included)

Initial Comment:

The 'inner loop' for applying bitwise ops to longs is quite
inefficient.

The improvement in the attached diff is
 - 'a' is never shorter than 'b' (result: only test 1
   loop index condition instead of 3)
 - each operation ( & | ^ ) has its own loop, instead
   of switch inside loop
- I found that, when this is done, a lot
of things can be simplified, resulting in further speedup,
and the resulting code is not very much longer than
before (my libpython2.4.dll  .text got 140 bytes longer).

Operations on longs of a few thousand bits appear
to be 2 ... 2.5 times faster with this patch.
I'm not 100% sure the code is right, but it passes
test_long.py, anyway.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2005-01-07 01:54

Message:
Logged In: YES 
user_id=80475

Patch Review
------------

On Windows using MSC 6.0, I could only reproduce about a
small speedup at around 300 bits. 

While the patch is short, it adds quite a bit of complexity
to the routine.  Its correctness is not self-evident or
certain.  Even if correct, it is likely to encumber future
maintenance.

Unless you have important use cases and feel strongly about
it, I think this one should probably not go in.

An alternative to submit a patch that limits its scope to
factoring  out the innermost switch/case.  I tried that and
found that the speedup is microscopic.  I suspect that that
one unpredictable branch is not much of a bottleneck.  More
time is likely spent on creating z.

----------------------------------------------------------------------

Comment By: Gregory Smith (gregsmith)
Date: 2005-01-03 14:54

Message:
Logged In: YES 
user_id=292741

I originally timed this on a cygwin system, I've since found
that cygwin timings tend to be strange and possibly
misleading. On a RH8 system, I'm seeing speedup of x3.5 with
longs of ~1500 bits and larger, and x1.5 speedup with only
about 300 bits. Times were measured with timeit.Timer(
'a|b', 'a=...; b=...')
Increase in .text size is likewise about 120 bytes.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1087418&group_id=5470