
On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:
Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dictionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?).
Yes, sorry, I just meant a string (or possibly some other iterable of characters). Something like (3.x code):

    def encode_int(n, alphabet):
        if n < 0:
            raise ValueError("nonnegative integers only, please")
        base = len(alphabet)
        cs = []
        while True:
            n, c = divmod(n, base)
            cs.append(alphabet[c])
            if not n:
                break
        return ''.join(reversed(cs))

    def decode_int(s, alphabet):
        base = len(alphabet)
        char_to_int = {c: i for i, c in enumerate(alphabet)}
        n = 0
        for c in s:
            n = n * base + char_to_int[c]
        return n

    >>> alphabet = '1ilI|:'
    >>> encode_int(10**10, alphabet)
    '|IIli|l|ili||'
    >>> decode_int(_, alphabet)
    10000000000

This doesn't allow negative numbers. If negative numbers should be permitted, there are some decisions to be made there too. How are they represented? With a leading '-'? If so, then '-' should not be permitted in the alphabet. Should the negative sign character be user-configurable?

One problem with allowing multi-character digits in encoding is that it complicates the decoding: parsing the digit string is no longer trivial. I don't see how to make this a viable option.

I'm still only +0 (now leaning towards -0, having seen how easy this is to implement, and thinking about how much possible variation there might be in what's actually needed) on adding something like this.

Mark
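
For illustration, here is a minimal sketch of the leading '-' option discussed above. It assumes a single-character sign that must not occur in the alphabet. The names encode_signed and decode_signed and the sign parameter are hypothetical, chosen only for this example, and are not a proposed API.

```python
def encode_signed(n, alphabet, sign='-'):
    """Encode any integer; negatives get a leading sign character.

    Assumption: `sign` is a single character that is not in `alphabet`.
    """
    if sign in alphabet:
        raise ValueError("sign character must not appear in the alphabet")
    base = len(alphabet)
    if base < 2:
        raise ValueError("alphabet needs at least two characters")
    magnitude = abs(n)
    cs = []
    while True:
        # Peel off the least significant digit first.
        magnitude, c = divmod(magnitude, base)
        cs.append(alphabet[c])
        if not magnitude:
            break
    digits = ''.join(reversed(cs))
    return sign + digits if n < 0 else digits


def decode_signed(s, alphabet, sign='-'):
    """Inverse of encode_signed under the same assumptions."""
    if sign in alphabet:
        raise ValueError("sign character must not appear in the alphabet")
    negative = s.startswith(sign)
    if negative:
        s = s[len(sign):]
    if not s:
        raise ValueError("no digits to decode")
    base = len(alphabet)
    char_to_int = {c: i for i, c in enumerate(alphabet)}
    n = 0
    for c in s:
        n = n * base + char_to_int[c]
    return -n if negative else n
```

With the alphabet '1ilI|:' from the session above, encode_signed(-10**10, alphabet) gives '-|IIli|l|ili||', and decode_signed maps that string back to -10000000000. Rejecting an alphabet that contains the sign keeps decoding unambiguous, at the cost of one reserved character.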