[Python-ideas] Add a builtin method to 'int' for base/radix conversion

Tue Sep 15 18:48:11 CEST 2009

Mark Dickinson wrote:
> On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel at gmail.com> wrote:
>> Btw, when you say translation table, do you mean just a string? Because a
>> translation table would need to be continuous from 0 to the base so a real
>> dictionary-esque table may be overkill. The only advantage of a table might
>> be to convert certain digits into multiple bytes (some sort of ad-hoc
>> unicode use case?).
> 
> Yes, sorry, I just meant a string (or possibly some other iterable of
> characters).
> Something like (3.x code):
> 
> def encode_int(n, alphabet):
>     if n < 0:
>         raise ValueError("nonnegative integers only, please")
>     base = len(alphabet)
>     cs = []
>     while True:
>         n, c = divmod(n, base)
>         cs.append(alphabet[c])
>         if not n:
>             break
>     return ''.join(reversed(cs))
> 
> def decode_int(s, alphabet):
>     base = len(alphabet)
>     char_to_int = {c: i for i, c in enumerate(alphabet)}
>     n = 0
>     for c in s:
>         n = n * base + char_to_int[c]
>     return n
> 
> >>> alphabet = '1ilI|:'
> >>> encode_int(10**10, alphabet)
> '|IIli|l|ili||'
> >>> decode_int(_, alphabet)
> 10000000000
> 
> This doesn't allow negative numbers.  If negative numbers should be
> permitted, there are some decisions to be made there too.  How are
> they represented?  With a leading '-'?  If so, then '-' should not be
> permitted in the alphabet.  Should the negative sign character be
> user-configurable?
> 
> One problem with allowing multi-character digits in encoding is that it
> complicates the decoding:  parsing the digit string is no longer trivial.
> I don't see how to make this a viable option.
> 
> I'm still only +0 (now leaning towards -0, having seen how easy this
> is to implement, and thinking about how much possible variation
> there might be in what's actually needed) on adding something like this.
> 
I'd prefer the arguments to be: value, base, and an optional translation
table. The table would default to 0-9 followed by A-Z/a-z. When decoding,
several characters could map to the same numeric value (e.g. 'A' => 10
and 'a' => 10), hence the ability to use a dict. The default table would
work up to base 36; higher bases would raise a ValueError with the
message "translation table too small for base".
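
A rough sketch of that signature, building on Mark's functions above (3.x
code). The names DEFAULT_DIGITS and DEFAULT_VALUES are made up for
illustration, and using a string for encoding but a dict for decoding is
just one possible arrangement, not a settled design:

```python
import string

# Hypothetical defaults: a string for encoding, a dict for decoding.
DEFAULT_DIGITS = string.digits + string.ascii_uppercase  # 36 symbols
DEFAULT_VALUES = {c: i for i, c in enumerate(DEFAULT_DIGITS)}
# Lowercase letters decode to the same values as uppercase ones.
DEFAULT_VALUES.update(
    {c: i + 10 for i, c in enumerate(string.ascii_lowercase)})


def encode_int(value, base, table=DEFAULT_DIGITS):
    if value < 0:
        raise ValueError("nonnegative integers only, please")
    if base < 2:
        raise ValueError("base must be at least 2")
    if base > len(table):
        raise ValueError("translation table too small for base")
    cs = []
    while True:
        value, d = divmod(value, base)
        cs.append(table[d])
        if not value:
            break
    return ''.join(reversed(cs))


def decode_int(text, base, table=DEFAULT_VALUES):
    if base < 2:
        raise ValueError("base must be at least 2")
    if base > max(table.values()) + 1:
        raise ValueError("translation table too small for base")
    n = 0
    for c in text:
        d = table[c]  # KeyError if c is not in the table at all
        if d >= base:
            raise ValueError("digit %r not valid in base %d" % (c, base))
        n = n * base + d
    return n
```

With these defaults, encode_int(255, 16) gives 'FF', and both
decode_int('FF', 16) and decode_int('ff', 16) give 255.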

Could a single translation table work both ways? A dict for decoding
could contain {'A': 10, 'a': 10}, but how could you reverse that for
encoding?
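
To make the difficulty concrete: reversing such a dict needs some rule for
choosing among the characters that share a value. One possible rule, and it
is only an assumption on my part, is "the first character listed wins",
which relies on dicts preserving insertion order (guaranteed only from
Python 3.7 on). invert_table is a made-up name:

```python
def invert_table(decode_table):
    """Build an encoding string from a many-to-one decoding dict."""
    first_char = {}
    for char, value in decode_table.items():
        # Keep only the first character seen for each digit value.
        first_char.setdefault(value, char)
    size = len(first_char)
    if sorted(first_char) != list(range(size)):
        raise ValueError("digit values must run from 0 to base - 1, no gaps")
    return ''.join(first_char[v] for v in range(size))
```

So {'0': 0, '1': 1, 'A': 2, 'a': 2} inverts to '01A', but listing 'a' before
'A' would give '01a' instead. The result depends on ordering, which is
exactly the ambiguity in question.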