Does Python allow access to some of the implementation details?
Claudio Grondi
claudio.grondi at freenet.de
Sat Jan 7 19:53:24 EST 2006
Bengt Richter wrote:
> On Sat, 07 Jan 2006 14:05:18 +0100, Claudio Grondi <claudio.grondi at freenet.de> wrote:
> [...]
>
>>What I am also looking for is a conversion to base 256 (i.e where the
>>full byte is used and the string and the integer have the same actual
>>content if on appropriate endian machine), which would make the bit
>>extraction comparable easy and effective as the i.__hex__() based method.
>>Using digits() instead of the code you have provided speeds the whole
>>thing up two times (see attached code for details), but still is
>>i.__hex__() the best way to go and could be improved probably only by
>>direct conversion to base 256 or even higher base as e.g. 2**16 or 2**32.
>>
>
> (see Paul Rubin's post first).
> BTW, I'd bet that '%x'.__mod__(i) will save you time over i.__hex__() by the time
> you finish fiddling with 0x and L, even if you don't use the rest.
Yes, it seem to do, but not very significant, at least in case of the
code provided (see attachment). Here the output of run of provided code
on my Pentium 4, 3.0 GHz :
Seconds for bit extraction from integer with 280 hex-digits when:
looping over i>>1;i&0x01 = 0.001207
^-- but in 30 bit chunks = 0.000904
looping over i>>1;i%2 = 0.002409
using i.__hex__() repr. = 0.000186
using '%02X'.__mod__(i) = 0.000145
using __hex__,array,binascii = 0.000117
using __mod__,array,binascii = 0.000114
using gmpy.digits(i,2) = 0.000615
>
> The question looming is, "What are you intending to do with your new representations
> of twos complement integers?"
As you see in the 'Subject' I am currently not primary after conversion
of integers, but after understanding of Python implementation. The code
posted is a kind of side effect required in order to be able to
demonstrate progress if any.
What I have lately learned from this and some other threads in
(de.)comp.lang.python is, that to be able to get the best out of Python
in terms of speed and understanding what is or is not possible, it is
vital and necessary to understand the implementation.
Another reason behind this all is, that I was so impressed as it had
turned out, that it was possible to get the CPU time required for
conversion of the largest known prime to decimal form down from six
hours to six seconds (using BigDec), that I am now step by step checking
if similar effects can't be achieved also elsewhere.
>
> BTW2, an easy way to play with integers chopped into little-endian bit field sequences is
> a generator, e.g., (note that last field is sign bits, either all zeroes or all ones, so
> you can unambiguously reconstruct a signed integer from this variable width representation).
> (not very tested, and BTW as you see I hacked in using int when possible for return values)
>
> >>> def bitsof(n, width=8):
> ... m = (2**(width-1)-1)*2+1 # allow 2**width-1 == sys.maxint as int
> ... if type(m) is long:
> ... yield n&m
> ... while n>0 or n<-1:
> ... n>>=width
> ... yield n&m
> ... else:
> ... yield int(n&m)
> ... while n>0 or n<-1:
> ... n>>=width
> ... yield int(n&m)
> ...
> >>> list(bitsof(11,1))
> [1, 1, 0, 1, 0]
> >>> list(bitsof(5,1))
> [1, 0, 1, 0]
> >>> list(bitsof(-5,1))
> [1, 1, 0, 1]
> >>> hex(3**100)
> '0x5A4653CA673768565B41F775D6947D55CF3813D1L'
> >>> ''.join(map('%02X'.__mod__, bitsof(3**100, 8))[::-1])
> '005A4653CA673768565B41F775D6947D55CF3813D1'
> >>> ''.join(map('%02X'.__mod__, bitsof(-3**100, 8))[::-1])
> 'FFA5B9AC3598C897A9A4BE088A296B82AA30C7EC2F'
> >>> hex(-3**100)
> '-0x5A4653CA673768565B41F775D6947D55CF3813D1L'
>
> <gentle rant>
> (I leave it to your judgement as to how useful our current hex() representation
> of negative numebers is ;-)
> </gentle rant>
>
> Regards,
> Bengt Richter
I haven't yet run into negative integers, so thank's for making me aware
of the potential problems with it. Yes, I would not expect the hex() to
give me the minus sign - I see I should be very careful here (this is
also the reason why I generally try to avoid usage of negative integers
as good as I can).
Claudio
P.S. Attachment:
import time
# dctLstOfBitsVsCharOfNibble is a dictionary with a key beeing one
character
# string representing hexadecimal value of a nibble and a value beeing a
list
# of bits of the nibble where the lowest bit is stored at index 0 of the
list.
# i.e.
dctLstOfBitsVsCharOfNibble = {
'0':[0, 0, 0, 0],
'1':[0, 0, 0, 1],
'2':[0, 0, 1, 0],
'3':[0, 0, 1, 1],
'4':[0, 1, 0, 0],
'5':[0, 1, 0, 1],
'6':[0, 1, 1, 0],
'7':[0, 1, 1, 1],
'8':[1, 0, 0, 0],
'9':[1, 0, 0, 1],
'A':[1, 0, 1, 0],
'B':[1, 0, 1, 1],
'C':[1, 1, 0, 0],
'D':[1, 1, 0, 1],
'E':[1, 1, 1, 0],
'F':[1, 1, 1, 1]
}
# The dctLstOfBitsVsCharOfNibble dictionary can be also generated by
following code:
# dctLstOfBitsVsCharOfNibble = {}
# for intNibbleValue in range(0, 16):
# lstBitReprOfCurrNibble=[]
# for indxOfBit in range(0,4):
# lstBitReprOfCurrNibble.append(intNibbleValue>>indxOfBit&0x01)
# #:for
# lstBitReprOfCurrNibble.reverse()
#
dctLstOfBitsVsCharOfNibble['%X'%(intNibbleValue,)]=lstBitReprOfCurrNibble
# #:for
dctLstOfBitsVsOrdOfChar = {}
for ordOfChar in range(256):
dctLstOfBitsVsOrdOfChar[ordOfChar] =
dctLstOfBitsVsCharOfNibble['%X'%(ordOfChar>>4,)] +
dctLstOfBitsVsCharOfNibble['%X'%(ordOfChar&0x0F,)]
#:for
# what gives: {
# 0x00:[0, 0, 0, 0, 0, 0, 0, 0],
# 0x01:[0, 0, 0, 0, 0, 0, 0, 1],
# 0x02:[0, 0, 0, 0, 0, 0, 1, 0],
# ...
# 0xFD:[1, 1, 1, 1, 1, 1, 0, 1],
# 0xFE:[1, 1, 1, 1, 1, 1, 1, 0],
# 0xFF:[1, 1, 1, 1, 1, 1, 1, 1]
# }
dctLstOfBitsVsHexOfChar = {}
for ordOfChar in range(256):
dctLstOfBitsVsHexOfChar['%02X'%ordOfChar] =
dctLstOfBitsVsOrdOfChar[ordOfChar]
#:for
# what gives: {
# '00':[0, 0, 0, 0, 0, 0, 0, 0],
# '01':[0, 0, 0, 0, 0, 0, 0, 1],
# '02':[0, 0, 0, 0, 0, 0, 1, 0],
# ...
# 'FD':[1, 1, 1, 1, 1, 1, 0, 1],
# 'FE':[1, 1, 1, 1, 1, 1, 1, 0],
# 'FF':[1, 1, 1, 1, 1, 1, 1, 1]
# }
n = 0
NoOf32bitChunks = 0
lstBitsBitwiseAnd = []
lstBitsModulo = []
lstViaBitChunks = []
lstBitsViaHex = []
lstBitsViaMod = []
lstBitsGMPY = []
lstViaHexAndArray = []
lstViaModAndArray = []
timeBitwiseAnd = -1
timeModulo = -1
timeBitsViaHex = -1
timeBitsViaMod = -1
timeViaBitChunks = -1
timeGMPY = -1
timeViaHexAndArray = -1
timeViaModAndArray = -1
bitChunkSize = 30 # must be <= 32
while timeBitwiseAnd <= timeBitsViaHex + 0.001:
n = (n<<32) + 0x12345678
NoOf32bitChunks += 1
i = n
lstBitsModulo = []
start = time.clock()
while i:
lstBitsModulo.append(i%2)
i=i>>1
timeModulo = time.clock()-start
i = n
lstBitsBitwiseAnd = []
start = time.clock()
while i:
lstBitsBitwiseAnd.append(i&0x01)
i=i>>1
timeBitwiseAnd = time.clock()-start
i = n
lstViaBitChunks = []
bitFilter = 0
for dummy in range(bitChunkSize):
bitFilter = (bitFilter<<1)+1
#:for
start = time.clock()
done = False
while i:
i1 = int(i & bitFilter) # int() converts here a long-integer to integer
i >>= bitChunkSize
if i == 0: done = True
for k in xrange(bitChunkSize):
lstViaBitChunks.append(i1 & 1)
i1 >>= 1
if done and i1==0:
# if this is the top word, and no bits are left, we are done
break
#:if
#:for
#:while
timeViaBitChunks = time.clock()-start
i = n
# lstBitsViaHex = []
start = time.clock()
strHexOf_i = i.__hex__()[2:]
if strHexOf_i[-1]=='L':
strHexOf_i=strHexOf_i[0:-1]
#:if
intNoOfLeadingZeroBits = 0
lstBitsOfFirstNibble = dctLstOfBitsVsCharOfNibble[strHexOf_i[0]]
while not lstBitsOfFirstNibble[intNoOfLeadingZeroBits] and
intNoOfLeadingZeroBits < 4:
intNoOfLeadingZeroBits += 1
#:while
if intNoOfLeadingZeroBits == 4:
lstBitsViaHex = []
else:
lstBitsViaHex = lstBitsOfFirstNibble[intNoOfLeadingZeroBits:]
#:if
for indxToStrHexOf_i in range(1,len(strHexOf_i)):
lstBitsViaHex +=
dctLstOfBitsVsCharOfNibble[strHexOf_i[indxToStrHexOf_i]]
#:for
lstBitsViaHex.reverse()
timeBitsViaHex = time.clock()-start
i = n
# lstBitsViaMod = []
start = time.clock()
strHexOf_i = '%X'.__mod__(i)
intNoOfLeadingZeroBits = 0
lstBitsOfFirstByte = dctLstOfBitsVsHexOfChar[strHexOf_i[0:2]]
while not lstBitsOfFirstByte[intNoOfLeadingZeroBits] and
intNoOfLeadingZeroBits < 8:
intNoOfLeadingZeroBits += 1
#:while
if intNoOfLeadingZeroBits == 8:
lstBitsViaMod = []
else:
lstBitsViaMod = lstBitsOfFirstByte[intNoOfLeadingZeroBits:]
#:if
for indxToStrHexOf_i in range(2,len(strHexOf_i),2):
lstBitsViaMod +=
dctLstOfBitsVsHexOfChar[strHexOf_i[indxToStrHexOf_i:indxToStrHexOf_i+2]]
#:for
lstBitsViaMod.reverse()
timeBitsViaMod = time.clock()-start
i = n
lstViaHexAndArray = []
import array, binascii
start = time.clock()
strHexOf_i = i.__hex__()[2:]
if strHexOf_i[-1]=='L':
strHexOf_i=strHexOf_i[0:-1]
#:if
for item in array.array('B', binascii.unhexlify(strHexOf_i)):
lstViaHexAndArray += dctLstOfBitsVsOrdOfChar[item]
#:for
intNoOfLeadingZeroBits = 0
while not lstViaHexAndArray[intNoOfLeadingZeroBits] and
intNoOfLeadingZeroBits < 4:
intNoOfLeadingZeroBits += 1
#:while
if intNoOfLeadingZeroBits == 4:
lstViaHexAndArray = []
else:
lstViaHexAndArray = lstViaHexAndArray[intNoOfLeadingZeroBits:]
#:if
lstViaHexAndArray.reverse()
timeViaHexAndArray = time.clock()-start
i = n
lstViaModAndArray = []
import array, binascii
start = time.clock()
strHexOf_i = '%X'.__mod__(i)
for item in array.array('B', binascii.unhexlify(strHexOf_i)):
lstViaModAndArray += dctLstOfBitsVsOrdOfChar[item]
#:for
intNoOfLeadingZeroBits = 0
while not lstViaModAndArray[intNoOfLeadingZeroBits] and
intNoOfLeadingZeroBits < 4:
intNoOfLeadingZeroBits += 1
#:while
if intNoOfLeadingZeroBits == 4:
lstViaModAndArray = []
else:
lstViaModAndArray = lstViaModAndArray[intNoOfLeadingZeroBits:]
#:if
lstViaModAndArray.reverse()
timeViaModAndArray = time.clock()-start
import gmpy
lstBitsGMPY = []
# start = time.clock()
# i = gmpy.mpz(n)
# f = gmpy.scan1(i,0)
# if f==0:
# lstBitsGMPY.append(1)
# k = 1
# f = gmpy.scan1(i,1)
# else:
# k = 0
# while f:
# while k<f:
# lstBitsGMPY.append(0)
# k += 1
# lstBitsGMPY.append(1)
# k += 1
# f = gmpy.scan1(i,f+1)
# timeGMPY = time.clock()-start
# gmpy.binary(i) delivers base-256 little endian binary
representation i.e. :
# gmpy.binary(0x12131415) == '\x15\x14\x13\x12'
# so it can't be directly used here in the given context.
start = time.clock()
i = gmpy.mpz(n)
strBitsOf_i = gmpy.digits(i,2)
for char in strBitsOf_i:
if char=='0':
lstBitsGMPY.append(0)
else:
lstBitsGMPY.append(1)
#:for
lstBitsGMPY.reverse()
timeGMPY = time.clock()-start
#:while
if( lstBitsBitwiseAnd == lstBitsModulo
and lstBitsBitwiseAnd == lstBitsViaHex
and lstBitsBitwiseAnd == lstBitsViaMod
and lstBitsBitwiseAnd == lstBitsViaHex
and lstBitsBitwiseAnd == lstViaBitChunks
and lstBitsBitwiseAnd == lstViaHexAndArray
and lstBitsBitwiseAnd == lstViaModAndArray
and lstBitsBitwiseAnd == lstBitsGMPY ):
print
print ' Seconds for bit extraction from integer with %i hex-digits
when: '%(NoOf32bitChunks*8,)
print
print ' looping over i>>1;i&0x01 = %f '%(timeBitwiseAnd,)
print ' ^-- but in %i bit chunks = %f '%(bitChunkSize,
timeViaBitChunks)
print ' looping over i>>1;i%%2 = %f '%(timeModulo,)
print ' using i.__hex__() repr. = %f '%(timeBitsViaHex,)
print ' using \'%%02X\'.__mod__(i) = %f '%(timeBitsViaMod,)
print ' using __hex__,array,binascii = %f '%(timeViaHexAndArray,)
print ' using __mod__,array,binascii = %f '%(timeViaModAndArray,)
print ' using gmpy.digits(i,2) = %f '%(timeGMPY,)
print
print ' Bits extraction from long integers with number of
hexadecimal digits '
print ' > %i '%(NoOf32bitChunks*8,)
print ' is at least 1 ms faster with i.__hex__() then with
i>>1;i&0x01 method.'
print
print ' Modulo division (%2) is always slower than bitwise-and (&0x01).'
More information about the Python-list
mailing list