[Tutor] built in functions int(),long()+convert.base(r1,r2,num)

Jeff Shannon jeff@ccvcorp.com
Mon Jun 23 17:33:01 2003


cino hilliard wrote:

>> You cannot convert to any other base than 2 (a series of charges in a
>> set of transistors), but every time that Python shows you the number,
>> it'll automatically convert it to base 10 for you, because
>
> How can this be? You just said "You cannot convert to any other base
> than 2 (a series of charges in a set of transistors)"


Right.  The computer stores integers in base 2, by creating a pattern of 
charges.  When it comes time to display that integer on the screen, 
however, the display functions convert that pattern of charges into a 
string of characters representing that number in base 10.

>> But all of this is beside the point, because, as I said before, an 
>> integer doesn't care what base you think it is[...] 
>

> This is quite vague. If you type
>
> >>> print 12345
> 12345
>
> you get a base 10 number. The print command gives output in decimal
> or base 10.
>
> >>> print 0xFFFF
> 65535
>
> Even using hex notation, print still outputs base 10.


Yes, because in both cases, the interpreter converts the number you've 
typed into an internal integer (which it stores, however briefly, in 
binary format), and then sends that integer to the display routines, 
which automatically convert it to a string in base 10 for display purposes.
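The storage/display distinction can be made concrete with a small sketch. It is written for modern Python 3 rather than the Python 2.x used in this thread: `print` is now a function, and octal literals are spelled `0o177777` instead of `0177777`. Every literal below produces an equal integer. Only the string built for display differs.

```python
# Four source-code spellings of one integer value.
literals = [65535, 0xFFFF, 0o177777, 0b1111111111111111]

# They all compare equal: the base mattered only while the parser read them.
assert all(x == 65535 for x in literals)

n = 0xFFFF
# Display is a separate step: each call below builds a new *string* from n.
print(str(n))          # 65535  (what print() shows by default)
print(hex(n))          # 0xffff
print(oct(n))          # 0o177777
print(bin(n))          # 0b1111111111111111
print(format(n, "X"))  # FFFF
```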

> You are conflating the inner workings of python and the output as the
> same unified thing. Again, the key word is output, not the electrical
> capacitance of transistors. BTW, I don't think a transistor is a
> capacitor but rather a semiconductor amplifier and switch. A dynamic RAM
> memory cell has a transistor AND a capacitor. The capacitor holds the
> charge (bit=1), and the transistor, controlled by the memory circuitry,
> governs the release of the charge. I picked this up with a Google
> search. The processor probably has both also.


The key is that the output is a separate thing from the storage of (and
the existence of) the number.  I may have used the wrong terms for the
various electronic components -- I'm not an electrical engineer, nor do
I desire to become one, and my understanding of the electronics involved
is very abstract.  The point is that the internal representation of a
number is different from the string that's shown on the screen when the
number is displayed.  Yet they both represent the same number, even
though each representation uses a different base.

> May be any integer in the range [2, 36], or zero. If radix is zero,
> the proper radix is guessed based on the contents of string;
> Should the parser not guess that the radix is 29 or higher for 
> int('JEFFSHANNON',0)? 


It guesses by the same rules that it uses to parse numeric literals in 
any code, as the docs for int() describe.  If you were to start the 
interpreter and type JEFFSHANNON, what would you expect?  Since there 
are no numeric characters in that, and it doesn't use any of the special 
indications for octal or hex numbers, this is an error.  The parser will 
decide that it must be an identifier, and will try to resolve it, giving 
a NameError when it finds nothing with that name.  Since int() knows 
that it's supposed to be a number, but doesn't see any indications that 
it *is* a number, it gives an error.  It doesn't try to guess that the 
radix is the lowest number that can represent the highest-ordinal 
character in the string, because the odds that someone really wants that 
are insignificant.  There are *very* few uses for numbers in an 
arbitrary radix; 99.9% of the time, a displayed number should be
represented in binary, octal, decimal, or hexadecimal.  Of the tiny
percentage of times that someone actually wants something in some other 
radix, almost all of those occurrences are simply examples showing how 
different numeric bases work.  I have never seen, and can't imagine, a 
project for which it's truly useful and practical to represent something 
in base 29.
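Here is a short sketch of the behaviour described above, assuming modern Python 3. With base 0, `int()` applies numeric-literal rules and rejects letters-only text. An arbitrary radix has to be requested explicitly.

```python
# Base 0 applies the rules for numeric literals, so letters-only text
# such as "JEFFSHANNON" is rejected rather than given a guessed radix.
try:
    int("JEFFSHANNON", 0)
except ValueError as exc:
    print("base 0 rejected it:", exc)

# An explicit radix in [2, 36] works; A-Z (either case) are digits 10-35.
print(int("JEFFSHANNON", 36))
# 'S' (digit value 28) is the largest digit, so 29 is the smallest radix that fits.
print(int("JEFFSHANNON", 29))
```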

> Internal manipulation is a separate issue from display,
> I am not questioning the internals. It is the display or output I am 
> interested in. 


So why are you having such a difficult time with the idea that being
shown a base 10 number is just an attribute of the display?

>> and there *is* a conversion step in between.  The conversion is 
>> especially apparent when dealing with floats, because the same float 
>> will display differently depending on whether you use str() or repr() 
>> (the interpreter uses repr() by default) --
>>
>> >>> repr(0.1)
>> '0.10000000000000001'
>
> Can't this be fixed? 


Fix what, the fact that 0.1 can't be represented exactly in binary?  No,
that can't be fixed -- it's a well-known limitation of binary floating
point numbers, and is inherent in the mathematics.  The only "fix" would
be to completely redesign every bit of computer floating-point code and
hardware in use, following entirely different principles (true decimal
floating point, or full rational numbers, instead of binary floating
point).  And even then, it would only partially solve the problem,
because some numbers simply cannot be represented with a finite (and
very limited) number of digits: 1/3 has no finite decimal expansion, and
irrationals such as pi have none in any base.
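The following sketch, assuming the modern Python 3 standard library (`decimal` and `fractions`), shows the exact value actually stored for the literal `0.1`. It also shows how decimal and rational arithmetic each cure part of the problem, but not all of it.

```python
from decimal import Decimal
from fractions import Fraction

# The float literal 0.1 is stored as the nearest binary fraction, not 1/10.
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625
print(Fraction(0.1))     # 3602879701896397/36028797018963968  (denominator is 2**55)
print(0.1 + 0.2 == 0.3)  # False

# Decimal floating point stores 0.1 exactly...
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
# ...but 1/3 has no finite decimal expansion, so it is rounded (28 digits by default).
print(Decimal(1) / Decimal(3))  # 0.3333333333333333333333333333

# Exact rationals handle 1/3, but still cannot hold an irrational such as pi.
print(Fraction(1, 3) * 3 == 1)  # True
```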

> Then why this?
>
> >>> print 0.1
> 0.1
>
> if (the interpreter uses repr() by default)


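This is because 'print' uses str() rather than repr(), and for floats
str() rounds to fewer significant digits (12 rather than 17 in this
version of Python).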
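As a rough sketch, Python 2's `repr()` and `str()` for floats behaved like formatting with 17 and 12 significant digits respectively. The code below is for modern Python 3, where `repr()` instead picks the shortest string that reads back as the same float, so the two no longer differ for `0.1`.

```python
x = 0.1

# Approximately what Python 2's repr() and str() produced for floats.
print("%.17g" % x)  # 0.10000000000000001
print("%.12g" % x)  # 0.1

# Both strings read back as the very same float: only the display differs.
assert float("%.17g" % x) == float("%.12g" % x) == x

# Python 3's repr() chooses the shortest round-tripping string.
print(repr(x))  # 0.1
```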

>> FFFF = "15"
>> base3.convert(16, 2, FFFF)
>>
>> Now, is this intended to convert 15 to binary (0x15 is 10101), or FFFF
>> to binary (11111111 11111111) ??  There's no way to tell, and Python
>> certainly shouldn't be trying to guess.
>
> Oh no? The Book says
>
>> May be any integer in the range [2, 36], or zero. If radix is zero,
>> the proper radix is guessed based on the contents of string;
>
> However, it doesn't work.


That's the case once a string has already been passed into the function.
The important point is that Python must be able to tell whether it's a
string or a numeric literal *before* it's passed into the function, and
before it has any knowledge of *what* function this parameter is
intended for.  And anyhow, the way that Python "guesses" the proper
radix is a much simpler and more straightforward procedure than what
you're imagining.  Such a "guess" will result in one of three possible
radixes (radices?) -- hexadecimal if the string starts with '0x', octal
if the string starts with '0' and the second character is *not* 'x', or
decimal otherwise.  Whichever of these three options it guesses, it will
give an error (invalid literal) if there are characters that are not
appropriate for digits in that particular base.
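The three-way guess described above can be sketched in a few lines. `guess_radix_py2` and `parse_like_py2` are hypothetical helpers that mimic the Python 2 rules for `int(s, 0)`. They are not CPython's actual implementation. Modern Python 3 changed these rules: `int(s, 0)` accepts `0x`, `0o` and `0b` prefixes and rejects a bare leading zero such as `'017'`.

```python
def guess_radix_py2(literal):
    """Hypothetical helper: the Python 2-era three-way radix guess."""
    s = literal.strip().lstrip("+-")
    if s[:2].lower() == "0x":
        return 16          # '0x...' -> hexadecimal
    if s.startswith("0") and len(s) > 1:
        return 8           # leading '0' not followed by 'x' -> octal
    return 10              # everything else -> decimal


def parse_like_py2(literal):
    # int() with an explicit radix still rejects digits invalid for that radix.
    return int(literal, guess_radix_py2(literal))


print(parse_like_py2("0xFFFF"))  # 65535
print(parse_like_py2("017"))     # 15
print(parse_like_py2("12345"))   # 12345
for bad in ("JEFFSHANNON", "08"):
    try:
        parse_like_py2(bad)
    except ValueError:
        print(bad, "-> invalid literal")
```

Note that `"08"` fails for the reason given above: the guess is octal, and 8 is not an octal digit.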

Note that, by your logic of how these guesses should be done, 
int('21',0) would be interpreted as being in trinary (base 3), and would 
be equal to 7 decimal.  It seems *extremely* unlikely that that would be 
the intent, since there is virtually *never* any use for numbers in a 
nonstandard base.
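To see why a "smallest radix that fits" rule would be surprising, here is a sketch of it. `min_radix_guess` is a hypothetical function, not anything `int()` does. It assumes the input contains only the characters 0-9 and A-Z/a-z.

```python
def min_radix_guess(text):
    """Hypothetical 'smallest radix that fits' rule (NOT what int() does)."""
    # int(ch, 36) maps 0-9 to 0-9 and A-Z/a-z to 10-35.
    highest = max(int(ch, 36) for ch in text)
    return max(2, highest + 1)


for text in ("21", "10", "JEFFSHANNON"):
    radix = min_radix_guess(text)
    print(text, "-> radix", radix, "->", int(text, radix))
```

Under this rule, `'21'` would mean seven and `'10'` would mean two, while `'JEFFSHANNON'` lands on base 29. Those are exactly the surprises the fixed hex/octal/decimal rules avoid.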

>>>>>> convert.base(10,16,2**256-1)                 See Mom, No quotes 
>>>>>> here either!
>>>>>
>>>>>
>>> 0FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
>>
>>
>> That's because you're using an integer constant, which the parser 
>> *can* distinguish from an identifier -- indeed, being able to 
>> distinguish between those is why identifiers cannot start with a 
>> numeric character.
>
> some more no quotes.
>
> >>> convert.base(10,16,e**e*tan(1)*4)
> 888E53DBD2D not correct value but it parsed!
>
> >>> convert.base(10,16,convert.numToBase(100,16))
> 40


That's because, in each of these cases, you're using an expression which
Python evaluates before the call is made.  The parser separates
'e**e*tan(1)*4' (BTW, maybe that's the "wrong" value because operator
precedence isn't what you're expecting/intending?) and
'convert.numToBase(100,16)', and evaluates each of those subexpressions
on its own.  In both cases, convert.base() receives only the resulting
value (a float, not an integer, in the first case), never the
expression itself.
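A small sketch of this point follows, for modern Python 3 and assuming `e` and `tan` come from the `math` module. `report()` is a hypothetical stand-in for `convert.base()`, whose real code is not shown in this thread. The sketch shows the precedence of `**` over `*` and that a function only ever sees the evaluated value.

```python
import math


def report(value):
    # Hypothetical stand-in for convert.base(): a function receives only the
    # already-evaluated argument, never the source text that produced it.
    return type(value).__name__


# ** binds tighter than *, so this is ((e ** e) * tan(1)) * 4, a float.
expr = math.e ** math.e * math.tan(1) * 4
assert expr == ((math.e ** math.e) * math.tan(1)) * 4
assert expr != math.e ** (math.e * math.tan(1) * 4)  # a different grouping

print(report(expr))  # float
print(int(expr))     # 94  (int() truncates toward zero)
```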

As another point, the first one only works if you've imported e (and
tan) from math, and then it *is* interpreting e as an identifier.

 >>> e
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
NameError: name 'e' is not defined
 >>> from math import e
 >>> e
2.7182818284590451
 >>>

This is precisely why quotes are necessary -- to separate the
mathematical constant e from the hexadecimal digit e from the base-36
digit e.

Indeed, the same applies in your second example -- the lack of quotes is 
how Python knows that 'convert.numToBase' represents a function to be 
called instead of a numeric constant in some strange radix (that nobody 
ever actually uses).
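The parser's decision can be observed directly with the standard `ast` module. This is a sketch for Python 3.8 or later, where literals parse to `ast.Constant` nodes. `node_kind()` is a hypothetical helper. The last lines revisit the earlier `FFFF = "15"` example: a variable and a quoted string with the same spelling can hold entirely different digits.

```python
import ast


def node_kind(source):
    # Parse a single expression and report what kind of node it became.
    return type(ast.parse(source, mode="eval").body).__name__


print(node_kind("0xFFFF"))  # Constant: a numeric literal
print(node_kind("FFFF"))    # Name: an identifier to be looked up at run time
print(node_kind("'FFFF'"))  # Constant: a string, whatever base you mean
print(node_kind("e"))       # Name: works only if e (e.g. math.e) is bound

FFFF = "15"
print(int(FFFF, 16))    # 21     (the *value* of the variable, "15")
print(int("FFFF", 16))  # 65535  (the literal characters F, F, F, F)
```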

[consolidating a bit, from your other email]

>> probably not of any practical value.
>
> When 64-bit, 128-bit, and higher processor chips hit the mainstream,
> you may change your opinion if you want to get a base 64
> representation. My convert.base would do this quickly if we used
> { and | as the values for 62 and 63 base 10.

The number of bits that processors can use has almost zero correlation
with the usefulness of numbers represented in different bases.  We
currently have 32-bit processors (in most cases), but that doesn't mean
we're using base 32 numbers for anything.  We've used base 16 numbers
since long before 16-bit processors were standard.  When 64-bit
processors become standard, humans will *not* learn to read base-64
numbers; we'll simply represent processor words with a longer string of
hexadecimal digits.
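Whatever its practical value, the base-64 conversion proposed in the quoted message is short to write. The sketch below does not reproduce `convert.base`, whose code does not appear here. `DIGITS` and `to_base()` are hypothetical: the digits run 0-9, A-Z, a-z (values 0-61), then `{` and `|` for 62 and 63. Python itself offers `bin()`, `oct()` and `hex()` but no general base-N formatter.

```python
# Hypothetical digit set: 62 alphanumerics, then '{' = 62 and '|' = 63.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz{|"


def to_base(n, radix):
    """Return the non-negative integer n written in the given radix (2-64)."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 64")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, radix)   # peel off the least significant digit
        out.append(DIGITS[rem])
    return "".join(reversed(out))


print(to_base(2**256 - 1, 16))  # 64 'F' digits
print(to_base(2**64 - 1, 64))   # F||||||||||
```

This is a positional numeral, not the Base64 encoding of RFC 4648. That encoding uses a different alphabet (A-Z, a-z, 0-9, +, /) to encode bytes.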

I say *almost* zero correlation, because the reason that hexadecimal is
so popular is that a standard byte (8 bits) can be exactly represented
using two hex digits.  Every possible 8-bit value can be shown in two 
hex digits, and every 2-hex-digit value can be shown in 8 binary digits 
(bits).  Humans typically find '0xE4' easier to read than '11100100', so 
hex makes a convenient shorthand for looking at bit patterns.  Note that 
this means that 32 bits are equivalent to 8 hex digits, and 64 bits to 
16 hex digits.  Once upon a time, many mainframes/minicomputers used 
9-bit, or 18-bit, or 27-bit words.  9 bits have that same mapping to 
three octal digits, so for these machines, octal is the convenient 
shorthand.  As that type of machine passes out of favor, octal is 
passing out of favor now, too.
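The digit-to-bit mapping described above can be checked with a short sketch in modern Python 3. `bit_groups()` is a hypothetical helper. It assumes the word width is a multiple of the group size: 4 bits per hex digit, 3 bits per octal digit.

```python
def bit_groups(value, width, group):
    """Split value's width-bit binary form into chunks of `group` bits.

    Assumes width is a multiple of group (e.g. 8 bits / 4, 9 bits / 3).
    """
    bits = format(value, "0{}b".format(width))
    return [bits[i:i + group] for i in range(0, width, group)]


byte = 0xE4
print(format(byte, "08b"))     # 11100100
print(bit_groups(byte, 8, 4))  # ['1110', '0100'] -> hex digits E and 4

word9 = 0o754                   # a 9-bit value
print(bit_groups(word9, 9, 3))  # ['111', '101', '100'] -> octal digits 7, 5, 4

# 32 bits need 8 hex digits; 64 bits need 16.
print(len(format(2**32 - 1, "X")), len(format(2**64 - 1, "X")))  # 8 16
```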

The point of this diversion is simply to show that unusual bases are 
extremely rare, and serve very little practical purpose, which is *why* 
the Python interpreter is biased towards a few specific bases.

Jeff Shannon
Technician/Programmer
Credit International