[Tutor] Re:Base 207 compression algorithm

Jeff Shannon jeff@ccvcorp.com
Thu Jun 26 20:40:02 2003


cino hilliard wrote:

> Hi Jeff,
> You are not understanding what my program does. Have you tried it? 
> This base converter is my
> unique design allowing characters from ascii 48 - 255. So you will get 
> ?? for 255  base 10 to base 16.
> It is a program by Declaration. 


I understand very well what your program does.  You are, however, 
ascribing far more magic to unusual numerical bases than they actually 
possess.  Perhaps if you were to spend a bit of time studying assembly 
language, you'd get a better feel for what's going on here.  I don't 
advocate actually using assembly language to program anything, but some 
exposure to it will give you a much better idea of how your computer 
actually works, even if you just look at 8086 assembler.  For that 
matter, a good exposition of C's variable types and how they work would 
probably benefit you greatly.  You seem to have no grasp of the 
distinction between an integer and a character, and I've tried every 
explanation I can think of with no effect.

> How do you get the size of s = 12345678987654321 in bytes?
> len(str(s)) = 17. Is that correct?


No, it's not.  That's the length of the string of decimal digits that 
represents s, which should be obvious since you explicitly convert s 
into a string before taking its length.  It is *not* the size of s in 
bytes, because s is a (long) integer.  I don't know the details of 
Python longs well enough to calculate the number of bytes that that 
particular number will require;  I do know that, on a 32-bit platform, 
every number up to sys.maxint (2147483647, or 0x7fffffff -- the high bit 
is reserved as a sign bit) is represented using a C long, i.e. four 
bytes.  I suspect that a Python long representing s, above, will require 
either 8 or 12 bytes.
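
To make the digits-versus-bytes distinction concrete, here is a minimal 
sketch.  It assumes a present-day Python 3 interpreter (where int and 
long are one type and int.bit_length() exists), not the Python of this 
thread, and min_bytes is a made-up helper name used only for 
illustration.

```python
import sys

s = 12345678987654321


def min_bytes(n):
    """Smallest number of whole bytes that can hold non-negative n in binary."""
    return max(1, (n.bit_length() + 7) // 8)


decimal_digits = len(str(s))      # characters in the decimal *text* form
binary_bytes = min_bytes(s)       # bytes needed for the raw binary magnitude
object_bytes = sys.getsizeof(s)   # whole Python object incl. header; varies by build

print("decimal digits:", decimal_digits)
print("raw binary bytes:", binary_bytes)
print("Python object bytes:", object_bytes)
```

The first figure counts characters in one particular representation; the 
second counts the bytes the value itself needs.  The third includes 
interpreter bookkeeping and differs between Python versions and 
platforms, so it should not be read as a fixed number.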

> how about the size of
> pi=3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067
> len(str(s)) = 17. Is that correct?
>
>> However, you're not going to have any luck in actually doing any math 
>> with either of these strings.
>
> Sure you can. You convert back to decimal. 


No, because your computer can't do math directly on a string of decimal 
digits.  It needs to convert that into a binary number somehow before it 
can do math.  And it can *store* it as a binary number a lot more 
efficiently than it can store it as a string of digit characters, no 
matter what encoding scheme you use for those characters.
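
As a rough illustration of the storage point, the sketch below takes the 
100 digits quoted above, converts them to a binary integer, and packs 
that integer into raw bytes.  It assumes Python 3 (int.to_bytes and 
int.from_bytes); the names PI_DIGITS and digits_in_base are invented for 
this example and are not taken from the base 207 program under 
discussion.

```python
# The 100 decimal digits quoted earlier in this message.
PI_DIGITS = ("31415926535897932384626433832795028841971693993751"
             "05820974944592307816406286208998628034825342117067")

n = int(PI_DIGITS)                         # digit text -> binary integer
size = (n.bit_length() + 7) // 8           # whole bytes needed for the value
packed = n.to_bytes(size, "big")           # raw binary storage
restored = int.from_bytes(packed, "big")   # and back again, losslessly


def digits_in_base(value, base):
    """Count the digits needed to write a non-negative value in a given base."""
    count = 0
    while value:
        value //= base
        count += 1
    return max(count, 1)


print("decimal characters:", len(PI_DIGITS))
print("raw binary bytes:  ", len(packed))
print("base-207 characters:", digits_in_base(n, 207))

doubled = n * 2   # arithmetic is done on the binary integer, not on the text
```

Raw bytes are simply base 256, so a one-character-per-byte scheme in any 
smaller base, 207 included, cannot come out shorter than the packed 
form.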

Like I said, learn how your computer works at the level of registers, 
and how the floating-point unit operates, and you'll understand this a 
bit better.

>> -- you could probably get that much precision in less than a hundred 
>> bytes (probably *much* less), compared to your 2100.
>
> Show me for just 100 digits.
> So you admit I have reduced the file size of 5000 digits of pi to 2100 
> bytes of which I could
> read back into my program and convert back to 5000 digits decimal?
>> (I'm not about to try to do the math to determine how many 
>> floating-point bits
>
> What are you talking about? Floating point goes to 16 digits or so
>
> >>> 355/113.
> 3.1415929203539825 


Read up on the mathematics behind floating point numbers -- a decent C 
compiler reference should have a fair bit in it about how the compiler 
implements floats.  You're *still* mistaking the representation that 
Python is showing you for the number itself.  I don't remember specifics 
on the numbers of significant digits that are expressible with a 
standard C float or double (I believe that Python floats are implemented 
using C doubles), but that has *nothing* to do with how many digits 
Python shows you.
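
The difference between a float and the text Python prints for it can be 
sketched as follows.  This assumes Python 3 and uses the struct module's 
"<d" format, which packs a value as an 8-byte IEEE 754 double; the 
variable names are only for illustration.

```python
import struct
import sys

x = 355 / 113

raw = struct.pack("<d", x)        # the 8 bytes that actually hold the value
short_text = "%.3f" % x           # one way of displaying it
long_text = "%.25f" % x           # another display of the very same float

print("bytes stored:     ", len(raw))
print("significand bits: ", sys.float_info.mant_dig)
print("short display:    ", short_text)
print("long display:     ", long_text)
print("repr:             ", repr(x))

# Unpacking the bytes gives back exactly the same number, however it was shown.
same = struct.unpack("<d", raw)[0] == x
```

However many digits are requested for display, the stored value stays 
the same eight bytes; only the text rendering changes.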

>> If you really want to show compression, then take an arbitrary string 
>> (say, the contents of your Windows' autoexec.bat or the contents of 
>> your *nix /etc/passwd, or any generic logfile) and show how that can 
>> be expressed in fewer
>
>
> Hello. Why can't I compress strings of numbers? In the world what is 
> there are a lot of numbers.
> The latest record for Pi is 1.24 trillion digits. I will bet these 
> digits are compressed and called from a decompressor. 


Sure, you can "compress" strings of numbers, but if you want to do so in 
a way that is reversible, you'll essentially have to encode each byte 
(which is a number from 0-255) as a separate number, and there is *no 
way* that a computer can represent an arbitrary byte value in less than 
one byte.  Compression algorithms are tricky things -- they look for 
patterns in 
the arrangement of bytes, and then describe those patterns.  This is a 
far more complex task than simply converting a number into a different 
radix.  
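
A small sketch of pattern-based compression, as opposed to radix 
conversion, using the standard zlib module.  zlib is swapped in here as 
a convenient general-purpose compressor and is not something used 
earlier in this thread; the sample data is invented, and the exact 
output sizes depend on the zlib version, so none are quoted.

```python
import random
import zlib

# 3000 bytes with an obvious repeating pattern.
patterned = b"0123456789" * 300

# 3000 pseudo-random bytes with no pattern to exploit (fixed seed for repeatability).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(patterned)))

packed_patterned = zlib.compress(patterned)
packed_noise = zlib.compress(noise)

print("patterned:", len(patterned), "->", len(packed_patterned))
print("noise:    ", len(noise), "->", len(packed_noise))

# Both are fully reversible.
round_trip_ok = (zlib.decompress(packed_patterned) == patterned
                 and zlib.decompress(packed_noise) == noise)
```

The patterned input shrinks dramatically because the compressor can 
describe the repetition; the patternless input does not shrink to any 
useful degree, since every byte still has to be recorded.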

And I bet that calculations of Pi *don't* use compression, except 
possibly to store the final result.  But calculations of Pi are a rather 
specialized thing, and I can't recall any program that I've written that 
needed to do that for a practical reason.  For almost all of those, 
math.pi (3.14159265359) is close enough, and if it's not, then I need 
far more precise mathematical capabilities than what I'll get using 
standard Python (or C) math routines.  

At this point, I see no reason to continue this discussion.  I've tried 
explaining, as clearly as I can without breaking out the technical 
manuals, how your computer handles numbers.  Obviously, my explanations 
aren't getting through to you.  I can do no more, so I will not be 
replying further in this thread.

Jeff Shannon
Technician/Programmer
Credit International