[Python-ideas] Add a builtin method to 'int' for base/radix conversion
MRAB
python at mrabarnett.plus.com
Tue Sep 15 18:48:11 CEST 2009
Mark Dickinson wrote:
> On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel at gmail.com> wrote:
>> Btw, when you say translation table, do you mean just a string? Because a
>> translation table would need to be continuous from 0 to the base so a real
>> dictionary-esque table may be overkill. The only advantage of a table might
>> be to convert certain digits into multiple bytes (some sort of ad-hoc
>> unicode use case?).
>
> Yes, sorry, I just meant a string (or possibly some other iterable of
> characters).
> Something like (3.x code):
>
> def encode_int(n, alphabet):
>     if n < 0:
>         raise ValueError("nonnegative integers only, please")
>     base = len(alphabet)
>     cs = []
>     while True:
>         n, c = divmod(n, base)
>         cs.append(alphabet[c])
>         if not n:
>             break
>     return ''.join(reversed(cs))
>
> def decode_int(s, alphabet):
>     base = len(alphabet)
>     char_to_int = {c: i for i, c in enumerate(alphabet)}
>     n = 0
>     for c in s:
>         n = n * base + char_to_int[c]
>     return n
>
> >>> alphabet = '1ilI|:'
> >>> encode_int(10**10, alphabet)
> '|IIli|l|ili||'
> >>> decode_int(_, alphabet)
> 10000000000
>
> This doesn't allow negative numbers. If negative numbers should be
> permitted, there are some decisions to be made there too. How are
> they represented? With a leading '-'? If so, then '-' should not be
> permitted in the alphabet. Should the negative sign character be
> user-configurable?
>
> One problem with allowing multi-character digits in encoding is that it
> complicates the decoding: parsing the digit string is no longer trivial.
> I don't see how to make this a viable option.
>
> I'm still only +0 (now leaning towards -0, having seen how easy this
> is to implement, and thinking about how much possible variation
> there might be in what's actually needed) on adding something like this.
>
I'd prefer the arguments to be: value, base, optional translation table.
The translation table would default to 0-9, A-Z/a-z. When decoding,
multiple characters could map to the same numeric value, e.g. 'A' => 10
and 'a' => 10, hence the ability to use a dict. The default translation
table would work up to base 36; higher bases would raise a ValueError
exception ("translation table too small for base").
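
A minimal sketch of what such an interface might look like, for
illustration only. The names int_to_str, str_to_int, DEFAULT_DIGITS and
DEFAULT_VALUES are hypothetical and not part of any proposal in this
thread; the sketch also assumes that encoding takes a string of digit
characters while decoding takes a dict from character to value.

```python
import string

# Hypothetical defaults: 36 digit characters for encoding, and a dict for
# decoding in which both letter cases map to the same value.
DEFAULT_DIGITS = string.digits + string.ascii_uppercase
DEFAULT_VALUES = {c: i for i, c in enumerate(DEFAULT_DIGITS)}
DEFAULT_VALUES.update({c.lower(): i for c, i in DEFAULT_VALUES.items()})


def int_to_str(value, base, table=DEFAULT_DIGITS):
    """Encode a nonnegative int using the first `base` characters of `table`."""
    if value < 0:
        raise ValueError("nonnegative integers only")
    if base < 2:
        raise ValueError("base must be at least 2")
    if base > len(table):
        raise ValueError("translation table too small for base")
    digits = []
    while True:
        value, d = divmod(value, base)
        digits.append(table[d])
        if not value:
            break
    return ''.join(reversed(digits))


def str_to_int(s, base, table=DEFAULT_VALUES):
    """Decode `s` using `table`, a dict mapping character -> digit value."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if base > max(table.values()) + 1:
        raise ValueError("translation table too small for base")
    if not s:
        raise ValueError("empty digit string")
    n = 0
    for c in s:
        d = table[c]  # KeyError for characters missing from the table
        if d >= base:
            raise ValueError("digit %r out of range for base %d" % (c, base))
        n = n * base + d
    return n
```

With these defaults, int_to_str(255, 16) gives 'FF', and both
str_to_int('ff', 16) and str_to_int('FF', 16) give 255.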
Could a single translation table work both ways? A dict for decoding
could contain {'A': 10, 'a': 10}, but how could you reverse that for
encoding?
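
One possible answer, sketched under an assumption the thread does not
settle: when several characters share a value, the first one listed in
the dict is treated as the canonical digit for encoding. This relies on
dicts preserving insertion order (guaranteed from Python 3.7 on), and
the helper name encoding_table is hypothetical.

```python
def encoding_table(decode_table):
    """Derive a digit string from a dict mapping character -> value.

    Assumption: for each value, the first character listed in the dict is
    the one used when encoding.
    """
    chosen = {}
    for char, value in decode_table.items():
        # setdefault keeps the first character seen for each value.
        chosen.setdefault(value, char)
    size = len(chosen)
    if sorted(chosen) != list(range(size)):
        raise ValueError("digit values must run contiguously from 0")
    return ''.join(chosen[v] for v in range(size))
```

For example, encoding_table({'0': 0, '1': 1, 'A': 2, 'a': 2}) returns
'01A', so 'a' is still accepted when decoding but never produced when
encoding.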