[Tutor] built in functions int(),long()+convert.base(r1,r2,num)

Jeff Shannon jeff@ccvcorp.com
Wed Jun 25 21:20:03 2003


cino hilliard wrote:

> Are these statements true or false?
>
> 1. May be any integer in the range [2, 36], or zero.
>
> 2. If radix is zero, the proper radix is guessed based on the contents
> of string;
>
> The first is true. What about 2.?


The second is true, though definitions of "proper" are subjective.  It
would, perhaps, be more technically correct to say "one of three
possible radixes (octal, decimal, hexadecimal) is selected based on the
contents of the string."  Considering the overwhelming lack of interest
by 99.99% of programmers in bothering with anything other than those
three bases plus binary, it's hardly surprising (to me) that the
documentation writers felt no need to explain it further than by
pointing out (as they do) that the radix is selected in the same way
that it is for integer constants.  (Python does not allow integer
constants to be written in any bases other than those three.)
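
To illustrate, here's roughly what that guessing looks like in practice
(a quick Python 2.x session; the prefixes are the same ones integer
literals use in source code):

    >>> int('0x1f', 0)   # '0x' prefix, so hexadecimal
    31
    >>> int('017', 0)    # leading zero, so octal
    15
    >>> int('17', 0)     # no prefix, so decimal
    17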


>>>> and there *is* a conversion step in between.  The conversion is 
>>>> especially apparent when dealing with floats, because the same 
>>>> float will display differently depending on whether you use str() 
>>>> or repr() (the interpreter uses repr() by default) --
>>>>
>>>> >>> repr(0.1)
>>>> '0.10000000000000001'
>>>
>>>
>>> Can't this be fixed?
>>
>>
>> Which, that 0.1 can't be represented in binary?  No, that can't be
>
> Try this.
>
> >>> 1./3
> 0.33333333333333331
>
> Why this bogus display?
> Here is how Pari solves the problem.
>
> ? \p 17
>   realprecision = 19 significant digits (17 digits displayed)
> ? 1./3
> 0.33333333333333333 


Sure, if you use full rational numbers, that works.  However, full
rational numbers can't be handled by the floating-point hardware on any
current processor, which means that the math must be emulated in
software, which is rather slow.  But it can be done, as (apparently)
Pari does, and indeed there have been a few packages proposed and/or
written that do it for Python.  However, because of speed/efficiency
issues, it's probably not going to become standard anytime soon.
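
Python's own long integers are a small-scale example of the same
trade-off: exact arithmetic emulated in software, slower than hardware
floats but never losing digits.  For instance (Python 2.x):

    >>> 1./3          # hardware float: rounded to 53 bits
    0.33333333333333331
    >>> 10**17 / 3    # long-integer math: every digit exact
    33333333333333333L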

>> Note that, by your logic of how these guesses should be done,
>
> My logic? Is this a true statement, yes or no?
> May be any integer in the range [2, 36], or zero. If radix is zero,
> the proper radix is guessed based on the contents of string; 


That statement, as I said before, is true given the understanding that
"the proper radix" is one of octal, decimal, or hexadecimal.  Your
apparent reading of it (that the proper radix should be one higher than
the highest digit in the string) is remarkably impractical, because the
cases in which one actually intends a string to be interpreted in
base 7 or base 19 are vanishingly rare compared to cases where one is
using those three major radixes.  If you *have* one of those extremely
rare cases, you can always specify that that is what you want.
However, Python is more interested in being a practical language than
in being a mathematically pure one.  Besides, the docs then go on to
point out exactly how the proper radix is guessed -- by using the same
rules that apply to integer literals in source code.
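
(And in the rare case where you really do want base 7, the explicit
form is short enough -- a quick sketch:)

    >>> int('16', 7)    # explicitly ask for base 7
    13
    >>> int('16', 0)    # with radix 0, the guess is decimal
    16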

>> that would be the intent, since there is virtually *never* any use 
>> for numbers in a nonstandard base.
>
> Maybe not for you.
>
>>
>> When 64-bit, 128-bit, and higher processor chips hit the mainstream
>> you may change your opinion
>
> This was just a hunch based on encoding the processor. With higher 
> radix, the instructions could
> be crunched in a smaller space. 


No, they couldn't, because the instructions are *stored* in binary.
You can display a byte on the screen with one or two characters instead
of eight binary digits, but it still takes up the same eight bits of
storage.
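
A quick sketch of the distinction (Python 2.x; struct.pack shows the
bytes actually stored):

    >>> n = 255
    >>> hex(n), oct(n), str(n)   # three different displays...
    ('0xff', '0377', '255')
    >>> import struct
    >>> struct.pack('B', n)      # ...but one byte of storage either way
    '\xff'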

>> The number of bits that processors can use has almost zero 
>> correlation with the usefulness of numbers represented in different 
>> bases.  We currently have 32-bit processors (in most cases), but that 
>> doesn't mean we're using base 32 numbers for anything.  We've used 
>> base 16 numbers since long before 16-bit processors were standard.  
>> When 64-bit processors become standard, humans will *not* learn to 
>> read base-64 numbers; we'll simply represent processor words with a 
>> longer string of hexadecimal digits.
>
> Isn't this backward evolution? Why didn't we just use longer strings
> of octal when we went to 16-bit processors? Anyway here is a practical
> example that uses up to base 207.
>
> [...]
> # It can be changed to compress text also. Using the testpi function
> # for 1000 digits, we can determine the compression ratio for various
> # bases. E.g., base 2 = 332%, base 8 = 111%, base 10 = 100%,
> # base 100 = 50%, base 207 = 43.2%.


This is mistaken, because you're only changing how many characters are
used to display the number on the screen.  No matter what base it's
displayed in, it is still *stored* in binary, and this will not
compress anything.  Whether you see 'FF' or '255' or '@' (or whatever
other character might be used to represent that number in whatever base
you try to use), it still occupies one byte of memory / hard drive
space.  Once again, you're confusing the display with the internals.
Changing the base only affects the display.
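
Concretely (Python 2.x): the *displays* differ in length, but the
stored value is one byte in every case:

    >>> len('11111111'), len('ff'), len(chr(255))   # binary, hex, raw byte
    (8, 2, 1)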

> [...] Also this could
> # be used as an encryption scheme for sensitive data.


I sure wouldn't trust any sensitive data to it.  If it's worth 
encrypting, it's worth using a *real* encryption scheme; this is only 
slightly more secure than rot13.

>> I say *almost* zero correlation, because the reason that hexadecimal 
>> is so popular is that a standard byte (8 bits) can be exactly 
>> represented using two hex digits.  Every possible 8-bit value can be 
>> shown in two hex digits, and every 2-hex-digit value can be shown in 
>> 8 binary digits (bits).  Humans typically find '0xE4' easier to read
>
>
> What is so unappealing about reading 1024 decimal as 100 base 32 or
> 80 base 128? Isn't there economy here, say, from an encoding
> standpoint? Sure, type it in decimal but let the converter encode it
> in base 128. How about 18446744073709551617 = 2**64+1 = 2000000001?
> It just seems natural this trend will continue. 


Because usually when we represent data in anything other than decimal,
it's because we're interested in the bit patterns.  Looking at bit
patterns in binary is too diffuse, and it's easy to get lost.  But
trying to read bit patterns in anything higher than hexadecimal is far
too dense and opaque.  Note that even when hex and octal are used,
programmers rarely use them for normal math; if we're interested in the
numeric value, we're far more likely to use decimal.  But when we're
interested in the bit pattern, and are using bitwise operators (and,
or, xor, not), it's easier to follow what's happening if we use a
representation that maps cleanly onto the bits involved.  One octal
digit exactly represents three bits; one hex digit exactly represents
four bits.  Thus, hex is useful for representing bit patterns on
machines whose word size is a multiple of four bits, and octal is
useful on machines whose word size is a multiple of three bits.
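
A small sketch (Python 2.x) of why that digit/bit alignment helps:

    >>> 0xE4 == int('11100100', 2)   # each hex digit is exactly 4 bits: E=1110, 4=0100
    True
    >>> 0xE4 & 0x0F                  # masking one hex digit == masking four bits
    4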

>> The point of this diversion is simply to show that unusual bases are 
>> extremely rare, and serve very little practical purpose, which is
>
> Well that means they are extremely valuable. We will see. 


Rare does not necessarily imply valuable, especially in terms of ideas.
These bases are rare exactly *because* there is so little use for them.
They're understood quite well, and make interesting theoretical models,
but they're simply not very useful in practice.

>> *why* the Python interpreter is biased towards a few specific bases.
>
> Not really. It allows decimal up to base 36 conversion. 


The int() function will convert strings of up to base 36, yes, but the
interpreter itself only allows integer constants to be specified in one
of three bases, and the int() function itself will only guess a base if
it's one of those same three bases.  That sounds like a bias to me, and
a well-justified one too.  I strongly suspect that the main reason
int() will convert other bases is that it's just as easy to write it to
be entirely general as to make it specific to those three bases -- in
fact, checking for one of those bases would probably add complexity.
You'll note that in the reverse direction, converting an integer to a
string, Python only provides a way to do so for those same three bases.
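
For example (Python 2.x; hex(), oct(), and str() are the only built-in
reverse conversions):

    >>> int('zz', 36)                  # int() will parse up to base 36
    1295
    >>> hex(255), oct(255), str(255)   # going the other way: just these three
    ('0xff', '0377', '255')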

> Will python ever become a compiler capable of compiling itself? 


Probably not, because that's not one of the design goals, and it's not
a direction that the Python developers are interested in taking.

> Will python ever have arbitrary precision floating point built in like 
> the Pari, mathematica, Maple
> interpreters? 


Built in?  Probably not.  Available?  Definitely -- I'm pretty sure that 
third-party packages are already available.

Jeff Shannon
Technician/Programmer
Credit International