Mark Dickinson wrote:
On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:
Btw, when you say translation table, do you mean just a string? A translation table would need to be contiguous, covering every digit value from 0 to base - 1, so a real dictionary-style table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc Unicode use case?).
Yes, sorry, I just meant a string (or possibly some other sequence of characters). Something like (3.x code):
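    def encode_int(n, alphabet):
        """Return the digits of nonnegative n, using alphabet[i] for digit value i."""
        if n < 0:
            raise ValueError("nonnegative integers only, please")
        base = len(alphabet)
        if base < 2:
            # base 1 would loop forever (divmod(n, 1) leaves n unchanged)
            raise ValueError("alphabet must contain at least two characters")
        cs = []
        while True:
            n, c = divmod(n, base)   # peel off the least significant digit
            cs.append(alphabet[c])
            if not n:
                break
        return ''.join(reversed(cs))  # digits were collected least significant first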
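    def decode_int(s, alphabet):
        """Inverse of encode_int: map each character back to its digit value."""
        base = len(alphabet)
        char_to_int = {c: i for i, c in enumerate(alphabet)}
        n = 0
        for c in s:
            # Horner's rule; a character not in the alphabet raises KeyError
            n = n * base + char_to_int[c]
        return n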
    >>> alphabet = '1ilI|:'
    >>> encode_int(10**10, alphabet)
    '|IIli|l|ili||'
    >>> decode_int(_, alphabet)
    10000000000
This doesn't allow negative numbers. If negative numbers should be permitted, there are some decisions to be made there too. How are they represented? With a leading '-'? If so, then '-' should not be permitted in the alphabet. Should the negative sign character be user-configurable?
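One possible answer to those questions, sketched under my own assumptions (not settled in this thread): negatives get a leading sign character, the sign is user-configurable via a hypothetical `sign` parameter (a single character, default '-'), and it is rejected if it also appears in the alphabet. The names `encode_signed` and `decode_signed` are hypothetical:

```python
def encode_signed(n, alphabet, sign='-'):
    """Hypothetical helper: encode any integer, marking negatives with a leading sign."""
    if sign in alphabet:
        raise ValueError("sign character must not appear in the alphabet")
    base = len(alphabet)
    if base < 2:
        raise ValueError("alphabet must contain at least two characters")
    magnitude = -n if n < 0 else n
    cs = []
    while True:
        magnitude, c = divmod(magnitude, base)
        cs.append(alphabet[c])
        if not magnitude:
            break
    digits = ''.join(reversed(cs))
    return sign + digits if n < 0 else digits


def decode_signed(s, alphabet, sign='-'):
    """Hypothetical inverse of encode_signed."""
    if sign in alphabet:
        raise ValueError("sign character must not appear in the alphabet")
    negative = s.startswith(sign)
    if negative:
        s = s[len(sign):]          # strip the sign before reading digits
    if not s:
        raise ValueError("no digits to decode")
    base = len(alphabet)
    char_to_int = {c: i for i, c in enumerate(alphabet)}
    n = 0
    for c in s:
        n = n * base + char_to_int[c]
    return -n if negative else n
```

With alphabet '1ilI|:' and sign='~', -10**10 encodes as '~|IIli|l|ili||'. Because the sign is checked against the alphabet on both sides, a string like '-1' can never be read as two ordinary digits.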
One problem with allowing multi-character digits in encoding is that it complicates the decoding: parsing the digit string is no longer trivial. For example, if 'a' and 'aa' are both digits, 'aaa' could be split as 'a|aa', 'aa|a' or 'a|a|a'. I don't see how to make this a viable option.
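The ambiguity is real: with 'a' = 0 and 'aa' = 1 in base 2, 'aaa' could decode to 1, 2 or 0. One way out, which is my assumption and not something proposed in the thread, is to require the digit strings to form a prefix-free code (no digit string is a prefix of another); then a left-to-right scan finds exactly one match at each position. A minimal sketch with hypothetical names `encode_multichar` and `decode_multichar`:

```python
def _check_prefix_free(digits):
    """Reject digit lists where one digit string is a prefix of another (or a duplicate)."""
    if len(digits) < 2:
        raise ValueError("need at least two digit strings")
    for i, a in enumerate(digits):
        for j, b in enumerate(digits):
            if i != j and b.startswith(a):
                raise ValueError(f"{a!r} is a prefix of {b!r}; parsing would be ambiguous")


def encode_multichar(n, digits):
    """Encode nonnegative n, using the string digits[i] for digit value i."""
    if n < 0:
        raise ValueError("nonnegative integers only")
    _check_prefix_free(digits)
    base = len(digits)
    parts = []
    while True:
        n, c = divmod(n, base)
        parts.append(digits[c])
        if not n:
            break
    return ''.join(reversed(parts))


def decode_multichar(s, digits):
    """Decode s by greedily matching digit strings; safe only because they are prefix-free."""
    _check_prefix_free(digits)
    base = len(digits)
    n = 0
    i = 0
    while i < len(s):
        for value, d in enumerate(digits):
            if s.startswith(d, i):      # at most one digit string can match here
                n = n * base + value
                i += len(d)
                break
        else:
            raise ValueError(f"no digit matches at position {i}")
    return n
```

For example, with digits ['zero', 'one', 'two'], 5 encodes as 'onetwo', while the list ['a', 'aa'] is rejected up front. The quadratic prefix check is fine for small alphabets; a trie would scale better.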
I'm still only +0 on adding something like this, and now leaning towards -0, having seen how easy it is to implement and thinking about how much variation there might be in what's actually needed.
I'd prefer the arguments to be: value, base, optional translation table. The translation table would default to 0-9, A-Z/a-z (when decoding, multiple characters could map to the same numeric value, e.g. 'A' => 10 and 'a' => 10, hence the ability to use a dict). The default translation table would work up to base 36; higher bases would raise a ValueError, "translation table too small for base". Could a single translation table work both ways? A dict for decoding could contain {'A': 10, 'a': 10}, but how could you reverse that for encoding?
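One possible answer to the "both ways" question, sketched under my own assumption: make the table a sequence where entry i is a string of every character accepted for digit value i, and treat the first character of each entry as the canonical one emitted when encoding. Decoding inverts the whole table into a dict; encoding just takes entry[0]. The names `to_base`, `from_base` and `DEFAULT_TABLE` are hypothetical:

```python
import string

# Hypothetical default table: '0'..'9', then 'Aa', 'Bb', ..., 'Zz' (36 entries).
# The first character of each entry is the one used for encoding.
DEFAULT_TABLE = list(string.digits) + [
    u + l for u, l in zip(string.ascii_uppercase, string.ascii_lowercase)
]


def _check(base, table):
    if base < 2:
        raise ValueError("base must be at least 2")
    if base > len(table):
        raise ValueError("translation table too small for base")


def to_base(value, base, table=None):
    """Encode nonnegative value in the given base, emitting canonical characters."""
    table = DEFAULT_TABLE if table is None else table
    _check(base, table)
    if value < 0:
        raise ValueError("nonnegative integers only")
    out = []
    while True:
        value, digit = divmod(value, base)
        out.append(table[digit][0])     # canonical character for this digit value
        if not value:
            break
    return ''.join(reversed(out))


def from_base(s, base, table=None):
    """Decode s, accepting any character listed for a digit value below base."""
    table = DEFAULT_TABLE if table is None else table
    _check(base, table)
    char_to_value = {}
    for value, chars in enumerate(table[:base]):   # only digits valid in this base
        for ch in chars:
            char_to_value[ch] = value
    n = 0
    for ch in s:
        try:
            n = n * base + char_to_value[ch]
        except KeyError:
            raise ValueError(f"invalid digit {ch!r} for base {base}") from None
    return n
```

Here to_base(255, 16) gives 'FF', and from_base('ff', 16) and from_base('Ff', 16) both give 255. A plain string such as '01' also works as a table, since each character is its own one-element entry. For bases up to 36 the decoding agrees with the built-in int(s, base).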