[Python-ideas] Add a builtin method to 'int' for base/radix conversion

Mark Dickinson dickinsm at gmail.com
Tue Sep 15 11:38:13 CEST 2009


On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel at gmail.com> wrote:
> Btw, when you say translation table, do you mean just a string? Because a
> translation table would need to be continuous from 0 to the base so a real
> dictionary-esque table may be overkill. The only advantage of a table might
> be to convert certain digits into multiple bytes (some sort of ad-hoc
> unicode use case?).

Yes, sorry, I just meant a string (or possibly some other sequence of
characters; the code below needs len() and indexing).  Something like
(3.x code):

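def encode_int(n, alphabet):
    if n < 0:
        raise ValueError("nonnegative integers only, please")
    base = len(alphabet)
    if base < 2:
        # base 1 would loop forever for n > 0; base 0 would divide by zero
        raise ValueError("alphabet needs at least two characters")
    cs = []
    while True:
        # peel off the least significant digit
        n, c = divmod(n, base)
        cs.append(alphabet[c])
        if not n:
            break
    # digits were collected least significant first
    return ''.join(reversed(cs))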

def decode_int(s, alphabet):
    base = len(alphabet)
    char_to_int = {c: i for i, c in enumerate(alphabet)}
    n = 0
    for c in s:
        n = n * base + char_to_int[c]
    return n

>>> alphabet = '1ilI|:'
>>> encode_int(10**10, alphabet)
'|IIli|l|ili||'
>>> decode_int(_, alphabet)
10000000000

This doesn't allow negative numbers.  If negative numbers should be
permitted, there are some decisions to be made there too.  How are
they represented?  With a leading '-'?  If so, then '-' should not be
permitted in the alphabet.  Should the negative sign character be
user-configurable?
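
To make the leading-'-' option concrete, here is a rough sketch rather than
a proposal.  The names encode_signed and decode_signed and the sign
parameter are hypothetical; the sketch assumes a single-character sign that
is rejected if it also appears in the alphabet.

```python
def _check(alphabet, sign):
    # Hypothetical helper: validate the assumed sign/alphabet constraints.
    if len(alphabet) < 2:
        raise ValueError("alphabet needs at least two characters")
    if len(sign) != 1 or sign in alphabet:
        raise ValueError("sign must be one character not in the alphabet")

def encode_signed(n, alphabet, sign='-'):
    _check(alphabet, sign)
    base = len(alphabet)
    negative = n < 0
    n = abs(n)
    cs = []
    while True:
        n, c = divmod(n, base)
        cs.append(alphabet[c])
        if not n:
            break
    if negative:
        # digits are reversed below, so the sign ends up in front
        cs.append(sign)
    return ''.join(reversed(cs))

def decode_signed(s, alphabet, sign='-'):
    _check(alphabet, sign)
    negative = s.startswith(sign)
    digits = s[1:] if negative else s
    if not digits:
        raise ValueError("no digits to decode")
    base = len(alphabet)
    char_to_int = {c: i for i, c in enumerate(alphabet)}
    n = 0
    for c in digits:
        n = n * base + char_to_int[c]
    return -n if negative else n
```

>>> encode_signed(-255, '0123456789abcdef')
'-ff'
>>> decode_signed(_, '0123456789abcdef')
-255

Even this small version has to pick answers to the questions above: it
forbids the sign in the alphabet and never emits a sign for zero.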

One problem with allowing multi-character digits in encoding is that it
complicates the decoding:  parsing the digit string is no longer trivial.
I don't see how to make this a viable option.
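
A small sketch of why the parsing stops being trivial: with multi-character
digits, one string can split into digits in more than one way, and each
split decodes to a different integer.  The function name decodings and the
digit list used with it are made up for illustration.

```python
def decodings(s, digits):
    """Return every integer s could decode to, given multi-character digits."""
    base = len(digits)
    results = set()

    def walk(pos, n):
        # Reached the end of the string: one complete way of splitting it.
        if pos == len(s):
            results.add(n)
            return
        # Try every digit that matches at the current position.
        for value, d in enumerate(digits):
            if d and s.startswith(d, pos):
                walk(pos + len(d), n * base + value)

    if s:
        walk(0, 0)
    return sorted(results)
```

>>> decodings('bab', ['a', 'b', 'ab'])
[5, 10]

Here 'bab' reads either as 'b', 'ab' (1*3 + 2) or as 'b', 'a', 'b'
(1*9 + 0*3 + 1), so the decoder cannot know which integer was meant.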

I'm still only +0 (now leaning towards -0, having seen how easy this
is to implement, and thinking about how much possible variation
there might be in what's actually needed) on adding something like this.

Mark


