Does Python allow access to some of the implementation details?

Claudio Grondi claudio.grondi at freenet.de
Sat Jan 7 19:53:24 EST 2006


Bengt Richter wrote:
> On Sat, 07 Jan 2006 14:05:18 +0100, Claudio Grondi <claudio.grondi at freenet.de> wrote:
> [...]
> 
>>What I am also looking for is a conversion to base 256 (i.e where the 
>>full byte is used and the string and the integer have the same actual 
>>content if on appropriate endian machine), which would make the bit 
>>extraction comparable easy and effective as the i.__hex__() based method.
>>Using digits() instead of the code you have provided speeds the whole 
>>thing up two times (see attached code for details), but still is 
>>i.__hex__() the best way to go and could be improved probably only by 
>>direct conversion to base 256 or even higher base as e.g. 2**16 or 2**32.
>>
> 
> (see Paul Rubin's post first).
> BTW, I'd bet that '%x'.__mod__(i) will save you time over i.__hex__() by the time
> you finish fiddling with 0x and L, even if you don't use the rest.
Yes, it seem to do, but not very significant, at least in case of the 
code provided (see attachment). Here the output of run of provided code 
on my Pentium 4, 3.0 GHz :
   Seconds for bit extraction from integer with 280 hex-digits when:
       looping over i>>1;i&0x01     = 0.001207
       ^-- but in 30 bit chunks     = 0.000904
       looping over i>>1;i%2        = 0.002409
       using i.__hex__() repr.      = 0.000186
       using '%02X'.__mod__(i)      = 0.000145
       using __hex__,array,binascii = 0.000117
       using __mod__,array,binascii = 0.000114
       using gmpy.digits(i,2)       = 0.000615
> 
> The question looming is, "What are you intending to do with your new representations
> of twos complement integers?"
As you see in the 'Subject' I am currently not primary after conversion 
of integers, but after understanding of Python implementation. The code 
posted is a kind of side effect required in order to be able to 
demonstrate progress if any.
What I have lately learned from this and some other threads in 
(de.)comp.lang.python is, that to be able to get the best out of Python 
in terms of speed and understanding what is or is not possible, it is 
vital and necessary to understand the implementation.
Another reason behind this all is, that I was so impressed as it had 
turned out, that it was possible to get the CPU time required for 
conversion of the largest known prime to decimal form down from six 
hours to six seconds (using BigDec), that I am now step by step checking 
if similar effects can't be achieved also elsewhere.

> 
> BTW2, an easy way to play with integers chopped into little-endian bit field sequences is
> a generator, e.g., (note that last field is sign bits, either all zeroes or all ones, so
> you can unambiguously reconstruct a signed integer from this variable width representation).
> (not very tested, and BTW as you see I hacked in using int when possible for return values)
> 
>  >>> def bitsof(n, width=8):
>  ...     m = (2**(width-1)-1)*2+1 # allow 2**width-1 == sys.maxint as int
>  ...     if type(m) is long:
>  ...         yield n&m
>  ...         while n>0 or n<-1:
>  ...             n>>=width
>  ...             yield n&m
>  ...     else:
>  ...         yield int(n&m)
>  ...         while n>0 or n<-1:
>  ...             n>>=width
>  ...             yield int(n&m)
>  ...
>  >>> list(bitsof(11,1))
>  [1, 1, 0, 1, 0]
>  >>> list(bitsof(5,1))
>  [1, 0, 1, 0]
>  >>> list(bitsof(-5,1))
>  [1, 1, 0, 1]
>  >>> hex(3**100)
>  '0x5A4653CA673768565B41F775D6947D55CF3813D1L'
>  >>> ''.join(map('%02X'.__mod__, bitsof(3**100, 8))[::-1])
>  '005A4653CA673768565B41F775D6947D55CF3813D1'
>  >>> ''.join(map('%02X'.__mod__, bitsof(-3**100, 8))[::-1])
>  'FFA5B9AC3598C897A9A4BE088A296B82AA30C7EC2F'
>  >>> hex(-3**100)
>  '-0x5A4653CA673768565B41F775D6947D55CF3813D1L'
> 
> <gentle rant>
> (I leave it to your judgement as to how useful our current hex() representation
> of negative numebers is ;-)
> </gentle rant>
> 
> Regards,
> Bengt Richter
I haven't yet run into negative integers, so thank's for making me aware 
of the potential problems with it. Yes, I would not expect the hex() to 
give me the minus sign - I see I should be very careful here (this is 
also the reason why I generally try to avoid usage of negative integers 
as good as I can).

Claudio

P.S. Attachment:

import time

# dctLstOfBitsVsCharOfNibble is a dictionary with a key beeing one 
character
# string representing hexadecimal value of a nibble and a value beeing a 
list
# of bits of the nibble where the lowest bit is stored at index 0 of the 
list.
# i.e.
dctLstOfBitsVsCharOfNibble = {
   '0':[0, 0, 0, 0],
   '1':[0, 0, 0, 1],
   '2':[0, 0, 1, 0],
   '3':[0, 0, 1, 1],
   '4':[0, 1, 0, 0],
   '5':[0, 1, 0, 1],
   '6':[0, 1, 1, 0],
   '7':[0, 1, 1, 1],
   '8':[1, 0, 0, 0],
   '9':[1, 0, 0, 1],
   'A':[1, 0, 1, 0],
   'B':[1, 0, 1, 1],
   'C':[1, 1, 0, 0],
   'D':[1, 1, 0, 1],
   'E':[1, 1, 1, 0],
   'F':[1, 1, 1, 1]
}
# The dctLstOfBitsVsCharOfNibble dictionary can be also generated by 
following code:
# dctLstOfBitsVsCharOfNibble = {}
# for intNibbleValue in range(0, 16):
#   lstBitReprOfCurrNibble=[]
#   for indxOfBit in range(0,4):
#     lstBitReprOfCurrNibble.append(intNibbleValue>>indxOfBit&0x01)
#   #:for
#   lstBitReprOfCurrNibble.reverse()
# 
dctLstOfBitsVsCharOfNibble['%X'%(intNibbleValue,)]=lstBitReprOfCurrNibble
# #:for

dctLstOfBitsVsOrdOfChar = {}
for ordOfChar in range(256):
   dctLstOfBitsVsOrdOfChar[ordOfChar] = 
dctLstOfBitsVsCharOfNibble['%X'%(ordOfChar>>4,)] + 
dctLstOfBitsVsCharOfNibble['%X'%(ordOfChar&0x0F,)]
#:for
# what gives: {
# 0x00:[0, 0, 0, 0, 0, 0, 0, 0],
# 0x01:[0, 0, 0, 0, 0, 0, 0, 1],
# 0x02:[0, 0, 0, 0, 0, 0, 1, 0],
#  ...
# 0xFD:[1, 1, 1, 1, 1, 1, 0, 1],
# 0xFE:[1, 1, 1, 1, 1, 1, 1, 0],
# 0xFF:[1, 1, 1, 1, 1, 1, 1, 1]
#              }
dctLstOfBitsVsHexOfChar = {}
for ordOfChar in range(256):
   dctLstOfBitsVsHexOfChar['%02X'%ordOfChar] = 
dctLstOfBitsVsOrdOfChar[ordOfChar]
#:for
# what gives: {
# '00':[0, 0, 0, 0, 0, 0, 0, 0],
# '01':[0, 0, 0, 0, 0, 0, 0, 1],
# '02':[0, 0, 0, 0, 0, 0, 1, 0],
#  ...
# 'FD':[1, 1, 1, 1, 1, 1, 0, 1],
# 'FE':[1, 1, 1, 1, 1, 1, 1, 0],
# 'FF':[1, 1, 1, 1, 1, 1, 1, 1]
#              }

n = 0
NoOf32bitChunks    = 0
lstBitsBitwiseAnd  = []
lstBitsModulo      = []
lstViaBitChunks    = []
lstBitsViaHex      = []
lstBitsViaMod      = []
lstBitsGMPY        = []
lstViaHexAndArray  = []
lstViaModAndArray  = []
timeBitwiseAnd     = -1
timeModulo         = -1
timeBitsViaHex     = -1
timeBitsViaMod     = -1
timeViaBitChunks   = -1
timeGMPY           = -1
timeViaHexAndArray = -1
timeViaModAndArray = -1

bitChunkSize      = 30 # must be <= 32

while timeBitwiseAnd <= timeBitsViaHex + 0.001:

   n = (n<<32) + 0x12345678

   NoOf32bitChunks += 1

   i = n
   lstBitsModulo = []
   start = time.clock()
   while i:
     lstBitsModulo.append(i%2)
     i=i>>1
   timeModulo = time.clock()-start

   i = n
   lstBitsBitwiseAnd = []
   start = time.clock()
   while i:
     lstBitsBitwiseAnd.append(i&0x01)
     i=i>>1
   timeBitwiseAnd = time.clock()-start

   i = n
   lstViaBitChunks = []
   bitFilter    = 0
   for dummy in range(bitChunkSize):
     bitFilter = (bitFilter<<1)+1
   #:for

   start = time.clock()
   done = False
   while i:
     i1 = int(i & bitFilter) # int() converts here a long-integer to integer
     i >>= bitChunkSize
     if i == 0: done = True
     for k in xrange(bitChunkSize):
       lstViaBitChunks.append(i1 & 1)
       i1 >>= 1
       if done and i1==0:
         # if this is the top word, and no bits are left, we are done
         break
       #:if
     #:for
   #:while
   timeViaBitChunks = time.clock()-start

   i = n
   # lstBitsViaHex = []
   start = time.clock()
   strHexOf_i = i.__hex__()[2:]
   if strHexOf_i[-1]=='L':
     strHexOf_i=strHexOf_i[0:-1]
   #:if
   intNoOfLeadingZeroBits = 0
   lstBitsOfFirstNibble = dctLstOfBitsVsCharOfNibble[strHexOf_i[0]]
   while not lstBitsOfFirstNibble[intNoOfLeadingZeroBits] and 
intNoOfLeadingZeroBits < 4:
     intNoOfLeadingZeroBits += 1
   #:while
   if intNoOfLeadingZeroBits == 4:
     lstBitsViaHex = []
   else:
     lstBitsViaHex = lstBitsOfFirstNibble[intNoOfLeadingZeroBits:]
   #:if
   for indxToStrHexOf_i in range(1,len(strHexOf_i)):
     lstBitsViaHex += 
dctLstOfBitsVsCharOfNibble[strHexOf_i[indxToStrHexOf_i]]
   #:for
   lstBitsViaHex.reverse()
   timeBitsViaHex = time.clock()-start

   i = n
   # lstBitsViaMod = []
   start = time.clock()
   strHexOf_i = '%X'.__mod__(i)
   intNoOfLeadingZeroBits = 0
   lstBitsOfFirstByte = dctLstOfBitsVsHexOfChar[strHexOf_i[0:2]]
   while not lstBitsOfFirstByte[intNoOfLeadingZeroBits] and 
intNoOfLeadingZeroBits < 8:
     intNoOfLeadingZeroBits += 1
   #:while
   if intNoOfLeadingZeroBits == 8:
     lstBitsViaMod = []
   else:
     lstBitsViaMod = lstBitsOfFirstByte[intNoOfLeadingZeroBits:]
   #:if
   for indxToStrHexOf_i in range(2,len(strHexOf_i),2):
     lstBitsViaMod += 
dctLstOfBitsVsHexOfChar[strHexOf_i[indxToStrHexOf_i:indxToStrHexOf_i+2]]
   #:for
   lstBitsViaMod.reverse()
   timeBitsViaMod = time.clock()-start

   i = n
   lstViaHexAndArray = []
   import array, binascii
   start = time.clock()
   strHexOf_i = i.__hex__()[2:]
   if strHexOf_i[-1]=='L':
     strHexOf_i=strHexOf_i[0:-1]
   #:if
   for item in array.array('B', binascii.unhexlify(strHexOf_i)):
     lstViaHexAndArray += dctLstOfBitsVsOrdOfChar[item]
   #:for
   intNoOfLeadingZeroBits = 0
   while not lstViaHexAndArray[intNoOfLeadingZeroBits] and 
intNoOfLeadingZeroBits < 4:
     intNoOfLeadingZeroBits += 1
   #:while
   if intNoOfLeadingZeroBits == 4:
     lstViaHexAndArray = []
   else:
     lstViaHexAndArray = lstViaHexAndArray[intNoOfLeadingZeroBits:]
   #:if
   lstViaHexAndArray.reverse()
   timeViaHexAndArray = time.clock()-start

   i = n
   lstViaModAndArray = []
   import array, binascii
   start = time.clock()
   strHexOf_i = '%X'.__mod__(i)
   for item in array.array('B', binascii.unhexlify(strHexOf_i)):
     lstViaModAndArray += dctLstOfBitsVsOrdOfChar[item]
   #:for
   intNoOfLeadingZeroBits = 0
   while not lstViaModAndArray[intNoOfLeadingZeroBits] and 
intNoOfLeadingZeroBits < 4:
     intNoOfLeadingZeroBits += 1
   #:while
   if intNoOfLeadingZeroBits == 4:
     lstViaModAndArray = []
   else:
     lstViaModAndArray = lstViaModAndArray[intNoOfLeadingZeroBits:]
   #:if
   lstViaModAndArray.reverse()
   timeViaModAndArray = time.clock()-start

   import gmpy
   lstBitsGMPY = []
   # start = time.clock()
   # i = gmpy.mpz(n)
   # f = gmpy.scan1(i,0)
   # if f==0:
   #   lstBitsGMPY.append(1)
   #   k = 1
   #   f = gmpy.scan1(i,1)
   # else:
   #   k = 0
   # while f:
   #   while k<f:
   #     lstBitsGMPY.append(0)
   #     k += 1
   #   lstBitsGMPY.append(1)
   #   k += 1
   #   f = gmpy.scan1(i,f+1)
   # timeGMPY = time.clock()-start
   # gmpy.binary(i) delivers base-256 little endian binary 
representation i.e. :
   #   gmpy.binary(0x12131415) == '\x15\x14\x13\x12'
   # so it can't be directly used here in the given context.
   start = time.clock()
   i = gmpy.mpz(n)
   strBitsOf_i = gmpy.digits(i,2)
   for char in strBitsOf_i:
     if char=='0':
       lstBitsGMPY.append(0)
     else:
       lstBitsGMPY.append(1)
   #:for
   lstBitsGMPY.reverse()
   timeGMPY = time.clock()-start

#:while

if(   lstBitsBitwiseAnd == lstBitsModulo
   and lstBitsBitwiseAnd == lstBitsViaHex
   and lstBitsBitwiseAnd == lstBitsViaMod
   and lstBitsBitwiseAnd == lstBitsViaHex
   and lstBitsBitwiseAnd == lstViaBitChunks
   and lstBitsBitwiseAnd == lstViaHexAndArray
   and lstBitsBitwiseAnd == lstViaModAndArray
   and lstBitsBitwiseAnd == lstBitsGMPY ):
   print
   print '  Seconds for bit extraction from integer with %i hex-digits 
when: '%(NoOf32bitChunks*8,)
   print
   print '      looping over i>>1;i&0x01     = %f '%(timeBitwiseAnd,)
   print '      ^-- but in %i bit chunks     = %f '%(bitChunkSize, 
timeViaBitChunks)
   print '      looping over i>>1;i%%2        = %f '%(timeModulo,)
   print '      using i.__hex__() repr.      = %f '%(timeBitsViaHex,)
   print '      using \'%%02X\'.__mod__(i)      = %f '%(timeBitsViaMod,)
   print '      using __hex__,array,binascii = %f '%(timeViaHexAndArray,)
   print '      using __mod__,array,binascii = %f '%(timeViaModAndArray,)
   print '      using gmpy.digits(i,2)       = %f '%(timeGMPY,)
   print
   print '  Bits extraction from long integers with number of 
hexadecimal digits '
   print '      > %i '%(NoOf32bitChunks*8,)
   print '  is at least 1 ms faster with i.__hex__() then with 
i>>1;i&0x01 method.'
   print
   print '  Modulo division (%2) is always slower than bitwise-and (&0x01).'



More information about the Python-list mailing list