[Tutor] Re:Base 207 compression algorithm

cino hilliard hillcino368@hotmail.com
Thu Jun 26 19:58:01 2003


This is a multi-part message in MIME format.

------=_NextPart_000_459d_2be6_204a
Content-Type: text/plain; format=flowed

Hi Jeff,
You are not understanding what my program does. Have you tried it? This bas 
converter is my
unique design allowing characters from ascii 48 - 255. So you will get ?? 
for 255  base 10 to base 16.
It is a program by Declaration.

Here is a simple run
>>>practical.testpi(10,207,100)
31415926535897932384626433832795028841971693993751058209749445923078164062862089
98628034825342117067
Length before compression     100

╪ç?▌_àdr^GΓ■5█nxΓ7PGY÷⌡_¡╪ƒ└¥h:ò1╧rVuy5s║≡≡
Length after compression      43

31415926535897932384626433832795028841971693993751058209749445923078164062862089
98628034825342117067
Length after converting back  100



>cino hilliard wrote:
>
>>To:Jeff Shannon and all other Python users
>>
>>Cino Hilliard wrote:
>>
>>>># we can determine the compression ratio for various bases. # Eg., base 
>>>>2 = 332%,
>>>># base 8 =111%, base 10 =100%, base 100 = 50%, base 207 = 43.2%.
>>>
>>>
>>>
>>Jeff Shannon wrote:
>>
>>>This is mistaken, because you're only changing how many characters are 
>>>used to display it on the screen.  No matter what base it's displayed in, 
>>>it is still *stored* in binary, and this will not compress anything. 
>>>Whether you see 'FF' or '255' or '@' (or whatever other character might 
>>>be used to represent that number in whatever base you try to use), it 
>>>still must occupy one byte of memory / hard drive space.  Once again, 
>>>you're confusing the display with internals.  Changing the base only 
>>>affects display.
>>
>>
>>Does anyone else in the python community agree with this?
>>
>>
>>Attached is a script that can be used to zip numbers. It is a base 2-207 
>>compression algorithm that I
>>developed using python's arbitrary integer precision feature. Base 207 was 
>>used for the output in this
>>example.
>>
>>Here is the output for 5000 digits of Pi stored to two files pib10.txt for 
>>the decimal expansion and
>>pib207.txt for the base 207 conversion. I included Win Zip and pkzip files 
>>also. You will notice that the
>>compression in base 207 gives a better ratio than the Zip file 
>>compression. The compression ratio
>>improves logorithmically with increasing bases.
>
>
>That is because you are comparing the size of strings used to represent a 
>number, rather than demonstrating any real compression.
How do you get the size of s = 12345678987654321 in bytes?
len(str(s)) = 17. Is that correct? how about the size of
pi=31415926535897932384626433832795028841971693993751058209749445923078164062862089
98628034825342117067
len(str(s)) = 17. Is that correct?

>However, you're not going to have any luck in actually doing any math with 
>either of these strings.
Sure you can. You convert back to decimal. The whole idea is to be able to 
store a string of a very large
numbers in in as little disk space as possible.
Since python is slow running this algorithm, for very large numbers you 
would be wise to do 1000 digit
chunks.

In either case, you'd have >better "compression" by simply expressing Pi as 
a large binary float

show me Pi to 100 places as a large binary float.
My program outputs the following for pi to 100 places in binary.
>>>practical.testpi(10,2,100)
31415926535897932384626433832795028841971693993751058209749445923078164062862089
98628034825342117067
Length before compression     100

10110111110110010101100000011001010001011110001101001110010111001111011011110000
00000011100000111001101101001011001110001010011111001111000000011111101100011100
10101111101111010010101101101100100100111101010010001111110011000101011110101001
10000111011100110100100010111011011101000010100101000110111110111101110111101100
10011001011
Length after compression      331

31415926535897932384626433832795028841971693993751058209749445923078164062862089
98628034825342117067
Length after converting back  100

>-- you could probably get that much precision in less than a hundred bytes 
>(probably *much* less), compared to your 2100.
Show me for just 100 digits.
So you admit I have reduced the file size of 5000 digits of pi to 2100 bytes 
of which I could
read back into my program and convert back to 5000 digits decimal?
  (I'm not about  to try to do the math to determine how many floating-point 
bits

What are you talking about? Floating point goes to 16 digits or so
>>>355/113.
3.1415929203539825

>would be needed to achieve 5000 digits of precision, but I'm quite 
>confident that it's at *least* an order of magnitude less than your

Show it to me for just 100 places if you say it is < 43 bytes as my 
algorithm does.

>string representation.  Of course, note that '5000 digits of precision' 
>inherently implies that one is speaking of a particular
Yes we are going from base 10 to 207
>base, which we presume to be decimal...)
>
>The real issue here, of course, is that this is not truly compression in 
>the general sense, because it only applies to string representations of 
>numbers.

So? I admitted it is only good for numbers (actually integers).

>If you really want to show compression, then take an arbitrary string (say, 
>the contents of your Windows' autoexec.bat or the contents of your *nix 
>/etc/passwd, or any generic logfile) and show how that can be expressed in 
>fewer

Hello. Why can't I compress strings of numbers? In the world what is there 
are a lot of numbers.
The latest record for Pi is 1.24 trillion digits. I will bet these digits 
are compressed and called from a decompressor.

>characters by converting to a higher-radix number. Don't forget to show how 
>to re-create the original file afterwards.
Yes.
>
>And of course, even as far as representing numbers, this is not good 
>compression, because expressing a binary integer or float will take
Show me for 
pi=31415926535897932384626433832795028841971693993751058209749445923078164062862089
98628034825342117067
I want to see the floating point or binary integer for this number.

Do you have a better compression scheme for numbers in python?

>much less space than representing a string of characters in any base.  For 
>a computer (where each character is represented by a 1-byte value), it is 
>not possible to have more characters that can be represented by a single 
>byte than there are integers that can be

Not so. Indeed, this is how compression algorithms work

>expressed by that single  example, this is thebyte. If you look at 
>sys.maxint as an largest number that can be expressed in a (four-byte) 
>integer.  But it takes ten characters to represent it in decimal, and it'd 
>take 6 or 7 in base 207. You could claim that it
No. my algorithm takes 5. for small numbers the log(10)/log(b) is not 
accurate. Here we get a 50%
compression ratio.
maxint = 2^31 - 1
import sys
>>>sys.maxint
2147483647
Now my algorithm gives
>>>practical.testany(10,207,2**31-1)
2147483647
Length before compression     10

1SGÆL
Length after compression      5

2147483647
Length after converting back  10
>>>
for much larger numbers the 43% rule kicks in as this example shows.
>>>practical.testany(10,207,2**256-1)
115792089237316195423570985008687907853269984665640564039457584007913129639935
Length before compression     78

4t&#9557;&#9474;&#9560;ylm\&#9564;&#9612;â{ú2&#9561;&#945;>T&#931;l;&#966;L&#9516;&#9492;ÅwH&#9567;&#9556;fÿ&#9578;
Length after compression      34

115792089237316195423570985008687907853269984665640564039457584007913129639935
Length after converting back  78
>>>
4t&#9557;&#9474;&#9560;ylm\&#9564;&#9612;â{ú2&#9561;&#945;>T&#931;l;&#966;L&#9516;&#9492;ÅwH&#9567;&#9556;fÿ&#9578;
copy and paste this ugly string into your >>> session where you imported 
practical
practical.testany(207,10,'4t&#9557;&#9474;&#9560;ylm\&#9564;&#9612;â{ú2&#9561;&#945;>T&#931;l;&#966;L&#9516;&#9492;ÅwH&#9567;&#9556;fÿ&#9578;')
You will get
4t&#9557;&#9474;&#9560;ylm\&#9564;&#9612;â{ú2&#9561;&#945;>T&#931;l;&#966;L&#9516;&#9492;ÅwH&#9567;&#9556;fÿ&#9578;
Length before compression     34

115792089237316195423570985008687907853269984665640564039457584007913129639935
Length after compression      78

4t&#9557;&#9474;&#9560;ylm\&#9564;&#9612;â{ú2&#9561;&#945;>T&#931;l;&#966;L&#9516;&#9492;ÅwH&#9567;&#9556;fÿ&#9578;
Length after converting back  34


>only takes four bytes in base 256 -- but at that point, the computer 
>representation is *exactly* the same as for a binary integer.  And you 
>cannot improve this on a standard computer by using a base higher than 256, 
>because each character will require multiple bytes, so your 'three 
>character' string in base 1000 (and who could keep track of a thousand 
>different symbols, anyhow?) will require at least six bytes to represent 
>it.
>
>Jeff Shannon
>Technician/Programmer
>Credit International
>
>

_________________________________________________________________
Add photos to your e-mail with MSN 8. Get 2 months FREE*.  
http://join.msn.com/?page=features/featuredemail

------=_NextPart_000_459d_2be6_204a
Content-Type: text/plain; name="practical.py"; format=flowed
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="practical.py"

#                     A practical application of base conversion.
#                                 By Cino Hilliard
#                                    6/24/2003
# This little program demonstrates a practical use of base conversion to 
compress
# base 10 numbers using the ascii set 48 - 255 allowing bases 2 - 207. With 
a little work,
# it can be changed to compress text also. Using the testpi function for 
1000 digits,
# we can determine the compression ratio for various bases. # Eg., base 2 = 
332%,
# base 8 =111%, base 10 =100%, base 100 = 50%, base 207 = 43.2%.
# Perhaps others in the list can tweek to get better compression. It may be 
possible
# use another character set to say super ascii 511. Processing gets slow as 
we increase
# the number of digits to say 10000. This may be improved by doing 1000 
characters at a
# time getting 10 packets of base 207 to be converted back 1000 at a time. 
Also this could
# be used as an encryption scheme for sensitive data. If you are a Mystic, 
you cal look
# for words or messages in the characters of Pi. Go out far enough and you 
will read the
# Bible word for word. with this. You will have to place the spaces and 
punctuation in
# though. Prime number enthusiasts can use the base converter to find prime 
words or
# phrases.
def testany(r1,r2,n):
    f1 = open('any10.txt','w')
    f2 = open('any207.txt','w')
    print n
    n=str(n)
    print "Length before compression    ", len(n)
    print ""
    x = base(r1,r2,n)
    print x
    print "Length after compression     ", len(x)
    print ""
    y = base(r2,r1,x)
    print y
    print "Length after converting back ", len(y)
    f1.write(n)
    f2.write(x)
    f1.close()
    f2.close()

def testpi(r1,r2,n):
    f1 = open('pib10.txt','w')
    f2 = open('pib207.txt','w')
    pi = piasn(n)
    print pi
    print "Length before compression    ", len(pi)
    print ""
    x = base(r1,r2,pi)
    print x
    print "Length after compression     ", len(x)
    print ""
    y = base(r2,r1,x)
    print y
    print "Length after converting back ", len(y)
    f1.write(pi)
    f2.write(x)
    f1.close()
    f2.close()

def testfact(r1,r2,n):
    f1 = open('fact10.txt','w')
    f2 = open('fact207.txt','w')
    fact1 = fact(n)
    print fact1
    print ""
    x = base(r1,r2,fact1)
    print x
    print len(x)
    y = base(r2,r1,x)
    print y
    f1.write(fact1)
    f2.write(x)
    f1.close()
    f2.close()

def base(r1,r2,num):
    import math
    digits=""
    for j in range(48,255):
          digits = digits + chr(j)
    num = str(num)
    ln  = len(num)
    dec = 0
    for j in range(ln):
          asci = ord(num[j])
          temp   = r1**(ln-j-1)
          ascii2 = asci-48
          dec += ascii2*temp
    RDX = ""
    PWR = math.log(dec)/math.log(r2)
    j = int(PWR)
    while j >= 0:
          Q   = dec/(r2**j)
          dec = dec%(r2**j)
          RDX = RDX + digits[Q]
          j-=1
    return RDX

def pix(n):  # Compute the digits of Pi
    n1 = n*34/10
    m = n+5
    p=d =10**m
    k=1
    while k < n1:
           d = d*k/(k+k+1)
           p = p+d
           k+=1
    p*=2
    p=str(p)
    return p[:n]

def piasn(n):  # My faster version to compute Pi.
    n1=n*7/2+5
    n2=n/2 + 5
    m=n+5
    p=x=10**(m)
    d =1
    while d <= n1:
         x=x*d/(d+1)/4
         p=p+x/(d+2)
         d += 2
    p*=3
    p=str(p)
    return p[:n]

def fact(n):
    f = j = 1
    while j <= n:
            f*=j
            j+=1
    f=str(f)
    return f



------=_NextPart_000_459d_2be6_204a--