Dumb python questions

Tue Aug 21 16:33:09 EDT 2001

On 19 Aug 2001 00:25:07 -0700, Paul Rubin <phr-n2001 at nightsong.com> wrote:

>bokr at accessone.com (Bengt Richter) writes:
>> >Here's one: say I have a long int, like x = 7**77777L.  How can I tell
>> >how many bits it takes to express that number in binary?  How can I
>> >convert it into a packed character string (i.e. represent it in string
>> >form with 8 bits/char so I can write it to a file)?  The answers I can
>> >think of involve weird kludges and/or are unacceptably slow.
>> >
>> >Thanks.
>> Brute force, but easy to understand:
>>  >>> x = 7**77777L
>>  >>> cl = []
>>  >>> while x:
>>  ...     cl += chr(x&0xff)
>>  ...     x = x>>8
>>  ...
>>  >>> s = ''.join(cl)
>> 
>> Which gives you a little-endian result.
>
>This is too slow to be reasonable.
>
>> To get the string length required quickly, we can use
>> the builtin hex conversion which is very fast.
>> Note the '0x' in front and 'L' at the end (3 extra)
>>  >>> hex(x)[:5]
>>  '0xC7F'
>
>Thanks, I didn't know about the hex function.  I tried ('%x' % x) and
>it gave me an overflow error message (Python 1.5.2).  It works in 2.1.1.
>
>To convert to binary, I think using hex(x) on the long int and then
>using  the binascii package to convert hex to binary is likely
>to be faster than the division loop above.
>
Definitely.

>> I grant you, the while loop takes a few seconds, and so does the
>> 7**77777L.  How fast do you need? We could write faster algorithms
>> for the exponentiations, and also the conversion to string, both in
>> Python. We could also obviously write a fast extension to do it in
>> C/C++ and return a string of 8-bit chars as fast as the hex()
>> function.  How fast do you need?
>
>I just think something is terribly wrong if converting the number
>to a binary representation (which it's already in, after all) is
>much slower than computing the number in the first place.  How
As Tim pointed out, marshal provides a fast way to serialize the
internal representation of long (and other things). If you can
live with that representation (a little-endian string made from
the internal int chunk count and the following n chunks of two
bytes each holding a little-endian short 15-bit number), it's
going to be the fastest. If you need all the bits of the number packed
8bits/byte, you'll have to repack either the marshal.dumps() or hex()
format. What are you going to do with the string buffer once you have it?
BTW, the marshaled long format is apparently platform-dependent, whereas
a packing into a string might not be, depending on how you use it.

>would you do either the exponentiation or the conversion faster
>in Python?
> 
It seems like a contradiction, but sometimes when numbers are huge
and time of internal processes grows fast (n^2 or more) with size,
you can beat it with an algorithm that computes the end result in
a different way. Being Python, obviously a new algorithm also uses
the internal processes, so it's a matter of coming up with a new
combination with less overall cost. For small numbers the logic of
a new algorithm is typically going to cost much more than it gains.

>The numbers I actually want to use are closer in magnitude to 7**777
>rather than 7**77777, if that matters.  I'd prefer to avoid using
It does matter. I couldn't beat the internal 7L**777 but I could beat
7L**77777 by 10-15%, which proves a point, but is not worth while.

>a C library.  If I use one, gmpy should do the job.  I've timed
>Python's arithmetic as around 10x slower than gmpy, which is
>quite respectable since gmpy's inner loops are carefully tuned
>assembly code, and this kind of arithmetic is hard to do efficiently
>in portable C.  Python's arithmetic is around 100x faster than
>perl's Math::BigInt.

I think a C-implemented object that lets you view and manipulate a chunk
of memory as an ordered bit list would solve a lot of problems like yours.
Assuming efficient obvious conversions from other types and back, these
things could be easy. The same mechanisms would also support a set type
pretty naturally. I'd anticipate something like
    bl = bitlist(7L**777)
    sbuf = bl.bytes()
where bytes() could be a utility method for doing quickly what
    sbuf = ''.join(map(chr,[int(bl[i:i+8]) for i in xrange(0,len(bl),8)])))
might do the long way. Maybe it could be the default str() representation.

If there weren't an internal conversion from long to bitlist, but only from string
bytes to n*8-bit bitlist, then you could access the marshalled long something like

    mbl = bitlist(marshal.dumps(7L**777)[5:])    #'l\x92\x00\x00\x00Gwx... => Gwx...
    bl = bitlist()	# empty bit list to accumulate 15-bit chunks from long
    for i in xrange(0,len(mbl),16):              # advance by pairs of bytes
        bl += mbl[i:i+8]       # first byte has 8 bits
        bl += mbl[i+8:i+15]    # 7 bits for 15 total

and then convert the final result to a string buffer as before
    sbuf = bl.bytes()
or
    sbuf = ''.join(map(chr,[int(bl[i:i+8]) for i in xrange(0,len(bl),8)])))

Note that a bit list can be viewed as an abstract type. It just happens to
have a convenient little-endian implementation.

To repack the marshal.dumps string currently, you could use
the following (which effectively uses the absolute value, BTW)

It's messy enough that something based on hex(), reverse, etc.
will probably be faster. (Yup, just did one, and it is over twice as
fast (about 2.25x) for 7L**777 :)
______________________
import string
tt = string.maketrans('0123456789ABCDEF',''.join([chr(x) for x in xrange(16)]))
def hexpack(n):
    h = map(ord,string.translate(hex(long(n))[2:-1],tt))
    h.reverse()
    if 1&len(h): h.append(0)
    return ''.join([chr(h[i+1]<<4|h[i]) for i in xrange(0,len(h),2)])
______________________

import marshal
def longpack(n):
    if not n: return '\x00'
    sm = marshal.dumps(long(n))
    print `sm`
    bacc=0
    bits=0
    bl = []
    for i in xrange(5,len(sm)):
        bacc |= (ord(sm[i])<<bits)
        if i&1:
            bits +=8
        else:
            bits +=7
        while bits>=15:
            bl.append(chr(bacc&0xff))
            bacc >>= 8
            bits -= 8
    while bits>0 and bacc:
        bl.append(chr(bacc&0xff))
        bacc >>= 8
        bits -= 8
    return ''.join(bl)
_________________________