Mailman 3 Add a builtin method to 'int' for base/radix conversion - Python-ideas

newer
IDEA: do not alter default SIGINT...

Add a builtin method to 'int' for base/radix conversion

Yuvgoog Greenle

31 Aug 2009 31 Aug '09

12:45 a.m.

I believe int(s, base) needs an inverse function to allow string representation with different bases. An example use case is 'hashing' a counter like video ID's on youtube, you could use a regular int internally and publish a shorter base-62 id for links. This subject was discussed 2.5 years ago: http://mail.python.org/pipermail/python-dev/2006-January/059789.html I opened a feature request ticket: http://bugs.python.org/issue6783 Some of the questions that remain: 1. Whether this should be a method for int or a regular function in a standard library module like math. 2. What should the method/function be called? (base_convert, radix, etc) What do you guys think? RunThePun

Attachments:

attachment.htm (text/html — 1.2 KB)

Show replies by date

Carl Johnson

31 Aug 31 Aug

4:49 a.m.

How would it work for bases beyond 10? OK, so base-16 has a well known encoding, but what about base-100? Base-16 is case insensitive, but would Base-37 be case insensitive?… ISTM that the common encodings of bin, oct, and hex are already covered by built-in functions and for bases beyond that, you have decide what system to use for encoding symbols that are out of range. Do people really need trinary through septary that often? Is there a single, best-practices way of doing base-1013? Feels like YAGNI to me, but I could be wrong. — Carl Johnson

Carl Johnson

4:55 a.m.

Doing some testing with int, I found: ValueError: int() base must be >= 2 and <= 36 That resolves a lot of questions (it has to be case insensitive), but takes away ability to do the motivating use case: writing out video ids in Base-62. —Carl On Sun, Aug 30, 2009 at 6:49 PM, Carl Johnson<cmjohnson.mailinglist@gmail.com> wrote:

...

How would it work for bases beyond 10? OK, so base-16 has a well known encoding, but what about base-100? Base-16 is case insensitive, but would Base-37 be case insensitive?…

ISTM that the common encodings of bin, oct, and hex are already covered by built-in functions and for bases beyond that, you have decide what system to use for encoding symbols that are out of range. Do people really need trinary through septary that often? Is there a single, best-practices way of doing base-1013? Feels like YAGNI to me, but I could be wrong.

— Carl Johnson

Gregory P. Smith

6:53 a.m.

On Sun, Aug 30, 2009 at 9:55 PM, Carl Johnson < cmjohnson.mailinglist@gmail.com> wrote:

...

Doing some testing with int, I found:

ValueError: int() base must be >= 2 and <= 36

That resolves a lot of questions (it has to be case insensitive), but takes away ability to do the motivating use case: writing out video ids in Base-62.

If there are widely accepted and used non-controversial standard character -> digit mappings for bases greater than that I don't immediately see any reasons not to support them for ints but I do not think we should go beyond base 255. If someone wants base 256 they should use struct.pack() and beyond that I expect to find people fighting over encoding schemes. ;) Since this is the ideas list... Consider adding a digit character sequence or map passed to the conversion function defining which characters map to which digit number.

...

—Carl

On Sun, Aug 30, 2009 at 6:49 PM, Carl Johnson<cmjohnson.mailinglist@gmail.com> wrote:

...
How would it work for bases beyond 10? OK, so base-16 has a well known encoding, but what about base-100? Base-16 is case insensitive, but would Base-37 be case insensitive?…

ISTM that the common encodings of bin, oct, and hex are already covered by built-in functions and for bases beyond that, you have decide what system to use for encoding symbols that are out of range. Do people really need trinary through septary that often? Is there a single, best-practices way of doing base-1013? Feels like YAGNI to me, but I could be wrong.

— Carl Johnson

_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

Greg Ewing

8:21 a.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Carl Johnson wrote:

...

How would it work for bases beyond 10?

If you keep going through the alphabet you can get up to base 36. If anyone cares beyond that, it could take a string of characters to use for digits. With Unicode that would provide quite a wide range. :-) -- Greg

Masklinn

8:32 a.m.

On 31 Aug 2009, at 02:45 , Yuvgoog Greenle wrote:

...

Some of the questions that remain: 1. Whether this should be a method for int or a regular function in a standard library module like math. 2. What should the method/function be called? (base_convert, radix, etc)

What do you guys think? The format mini-language already has presentation types for binary, octal, dec and hex outputs. Maybe this could be expanded in some way to get arbitrary base output (between 2 and 36 to match int())

Nick Coghlan

1 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Yuvgoog Greenle wrote:

...

I believe int(s, base) needs an inverse function to allow string representation with different bases. An example use case is 'hashing' a counter like video ID's on youtube, you could use a regular int internally and publish a shorter base-62 id for links.

This subject was discussed 2.5 years ago: http://mail.python.org/pipermail/python-dev/2006-January/059789.html

I opened a feature request ticket: http://bugs.python.org/issue6783

Some of the questions that remain: 1. Whether this should be a method for int or a regular function in a standard library module like math. 2. What should the method/function be called? (base_convert, radix, etc)

What do you guys think?

This has been coming up for years and always gets bogged down in a spelling argument (a method on int, a function in the math module and an update to the str.format mini language would be the current contenders). However, most of the actual real use cases for bases between 2 and 36 were dealt with by the addition of binary and octal output to string formatting so the impetus to do anything about it is now a lot lower. As far as bases between 37 and 62 go, that would involve first getting agreement on extending int() to handle those bases by allowing case sensitive digit parsing. Presumably that would use string lexical ordering so that int('a', 37) > int('A', 37) and int('b', 37) would raise an exception. That would only be intuitive to someone that knows how ASCII based alphanumeric ordering works though. Probably not an impossible sell, but you would want a compelling use case (converting long strings of digits into shorter link URLs is a pretty good one though - I've seen that in places other than youtube). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Masklinn

1:58 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

On 31 Aug 2009, at 15:00 , Nick Coghlan wrote: Yuvgoog Greenle wrote:

...

...
I believe int(s, base) needs an inverse function to allow string representation with different bases. An example use case is 'hashing' a counter like video ID's on youtube, you could use a regular int internally and publish a shorter base-62 id for links.

This subject was discussed 2.5 years ago: http://mail.python.org/pipermail/python-dev/2006-January/059789.html

I opened a feature request ticket: http://bugs.python.org/issue6783

Some of the questions that remain: 1. Whether this should be a method for int or a regular function in a standard library module like math. 2. What should the method/function be called? (base_convert, radix, etc)

What do you guys think?

This has been coming up for years and always gets bogged down in a spelling argument (a method on int, a function in the math module and an update to the str.format mini language would be the current contenders).

However, most of the actual real use cases for bases between 2 and 36 were dealt with by the addition of binary and octal output to string formatting so the impetus to do anything about it is now a lot lower.

As far as bases between 37 and 62 go, that would involve first getting agreement on extending int() to handle those bases by allowing case sensitive digit parsing. Presumably that would use string lexical ordering so that int('a', 37) > int('A', 37) and int('b', 37) would raise an exception.

That would only be intuitive to someone that knows how ASCII based alphanumeric ordering works though. Or it could be handled via a translation table (needed both ways of course) mapping n indexes to n characters (with n the base you're working with), defaulting to something sane.

Though I'm not sure this is of much interest really: even Erlang (which provides pretty good base conversion tools: it supports literal integers of any base between 2 and 36) doesn't natively support bases beyond 36. A library would probably be better for those more conflictual (or less intuitive) ranges.

MRAB

3:47 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Masklinn wrote:

...

On 31 Aug 2009, at 15:00 , Nick Coghlan wrote: Yuvgoog Greenle wrote:

...
...
I believe int(s, base) needs an inverse function to allow string representation with different bases. An example use case is 'hashing' a counter like video ID's on youtube, you could use a regular int internally and publish a shorter base-62 id for links.

This subject was discussed 2.5 years ago: http://mail.python.org/pipermail/python-dev/2006-January/059789.html

I opened a feature request ticket: http://bugs.python.org/issue6783

Some of the questions that remain: 1. Whether this should be a method for int or a regular function in a standard library module like math. 2. What should the method/function be called? (base_convert, radix, etc)

What do you guys think?

This has been coming up for years and always gets bogged down in a spelling argument (a method on int, a function in the math module and an update to the str.format mini language would be the current contenders).

However, most of the actual real use cases for bases between 2 and 36 were dealt with by the addition of binary and octal output to string formatting so the impetus to do anything about it is now a lot lower.

As far as bases between 37 and 62 go, that would involve first getting agreement on extending int() to handle those bases by allowing case sensitive digit parsing. Presumably that would use string lexical ordering so that int('a', 37) > int('A', 37) and int('b', 37) would raise an exception.

That would only be intuitive to someone that knows how ASCII based alphanumeric ordering works though.

ASCII? Surely it should be Unicode! :-)

...

Or it could be handled via a translation table (needed both ways of course) mapping n indexes to n characters (with n the base you're working with), defaulting to something sane.

The default could cover only bases 2 to 36. Any base > 36 would require a user-supplied translation table.

...

Though I'm not sure this is of much interest really: even Erlang (which provides pretty good base conversion tools: it supports literal integers of any base between 2 and 36) doesn't natively support bases beyond 36. A library would probably be better for those more conflictual (or less intuitive) ranges.

It could permit a dict as the translation table when 'decoding' so that both 'A' and 'a' could be mapped to 10, if necessary.

Yuvgoog Greenle

11 Sep 11 Sep

3:16 p.m.

Does anybody have any more use cases, ideas or suggestions? I'm getting the feeling this suggestion is +0 to most people and +1 for the rest. I'm pretty new to these mailing lists so does that mean a yes or a no? On Mon, Aug 31, 2009 at 6:47 PM, MRAB <python@mrabarnett.plus.com> wrote:

...

Masklinn wrote:

...
On 31 Aug 2009, at 15:00 , Nick Coghlan wrote: Yuvgoog Greenle wrote:

...
I believe int(s, base) needs an inverse function to allow string

...
representation with different bases. An example use case is 'hashing' a counter like video ID's on youtube, you could use a regular int internally and publish a shorter base-62 id for links.

This subject was discussed 2.5 years ago: http://mail.python.org/pipermail/python-dev/2006-January/059789.html

I opened a feature request ticket: http://bugs.python.org/issue6783

Some of the questions that remain: 1. Whether this should be a method for int or a regular function in a standard library module like math. 2. What should the method/function be called? (base_convert, radix, etc)

What do you guys think?

This has been coming up for years and always gets bogged down in a spelling argument (a method on int, a function in the math module and an update to the str.format mini language would be the current contenders).

However, most of the actual real use cases for bases between 2 and 36 were dealt with by the addition of binary and octal output to string formatting so the impetus to do anything about it is now a lot lower.

As far as bases between 37 and 62 go, that would involve first getting agreement on extending int() to handle those bases by allowing case sensitive digit parsing. Presumably that would use string lexical ordering so that int('a', 37) > int('A', 37) and int('b', 37) would raise an exception.

That would only be intuitive to someone that knows how ASCII based alphanumeric ordering works though.

ASCII? Surely it should be Unicode! :-)

Or it could be handled via a translation table (needed both ways of

...
course) mapping n indexes to n characters (with n the base you're working with), defaulting to something sane.

The default could cover only bases 2 to 36. Any base > 36 would require a user-supplied translation table.

Though I'm not sure this is of much interest really: even Erlang (which

...
provides pretty good base conversion tools: it supports literal integers of any base between 2 and 36) doesn't natively support bases beyond 36. A library would probably be better for those more conflictual (or less intuitive) ranges.

It could permit a dict as the translation table when 'decoding' so that both 'A' and 'a' could be mapped to 10, if necessary.

_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

Nick Coghlan

4:08 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Yuvgoog Greenle wrote:

...

Does anybody have any more use cases, ideas or suggestions? I'm getting the feeling this suggestion is +0 to most people and +1 for the rest. I'm pretty new to these mailing lists so does that mean a yes or a no?

A generally lukewarm response means a maybe :) A positive response on python-ideas is still a maybe until the idea has subsequently also run the gauntlet of python-dev with actual code to back it up. In this case, the status quo is: str -> int (arbitrary base up to 36) via int() constructor (base "0" meaning Python literal format). int -> str via str() (for decimal output), hex(), oct(), bin() and string formatting So the currently unsupported use cases are limited to outputting numbers in bases between 3 and 36 that are not 8, 10 or 16. You're probably going to have a hard time convincing anyone that those additional use cases are worth putting much effort into supporting (and even then, they're probably better off as a 3rd party library that can add things like support for integers in bases up to 62). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

MRAB

7:51 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Nick Coghlan wrote:

...

Yuvgoog Greenle wrote:

...
Does anybody have any more use cases, ideas or suggestions? I'm getting the feeling this suggestion is +0 to most people and +1 for the rest. I'm pretty new to these mailing lists so does that mean a yes or a no?

A generally lukewarm response means a maybe :)

A positive response on python-ideas is still a maybe until the idea has subsequently also run the gauntlet of python-dev with actual code to back it up.

In this case, the status quo is:

str -> int (arbitrary base up to 36) via int() constructor (base "0" meaning Python literal format).

int -> str via str() (for decimal output), hex(), oct(), bin() and string formatting

So the currently unsupported use cases are limited to outputting numbers in bases between 3 and 36 that are not 8, 10 or 16.

You're probably going to have a hard time convincing anyone that those additional use cases are worth putting much effort into supporting (and even then, they're probably better off as a 3rd party library that can add things like support for integers in bases up to 62).

It's one of those ideas: Q: Do you think it's a good idea? A: Yes. Q: Do you think you'd use it? A: Probably not. :-)

Mark Dickinson

13 Sep 13 Sep

6:06 p.m.

On Fri, Sep 11, 2009 at 4:16 PM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:

...

Does anybody have any more use cases, ideas or suggestions? I'm getting the feeling this suggestion is +0 to most people and +1 for the rest. I'm pretty new to these mailing lists so does that mean a yes or a no?

Just out of curiosity, I did a Google code search[*] for uses of the inverse operation: int(<some_string> ,n). I found a good handful of uses of int(s, 36), almost all apparently to do with turning integers into suitable id strings; there was also evidence that people have implemented the reverse 'integer -> base 36 string' conversion at least twice. I found no meaningful uses of any bases other than 2, 8, 10, 16, and 36. So the main use case seems to be serialization and deserialization of integers into some 'suitably nice' alphabet, and that alphabet is likely to be application-dependent. -0 for int.to_base(n) (2 <= n <= 36) or equivalent functionality in the core. +0 for a pair of library functions converting to and from base n, with explicitly given translation table. I agree with MRAB that an implicit digit set should only be allowed for 2 <= base <= 36, if at all. By the way, _PyLong_Format in Objects/longobject.c *does* contain code for general integer -> base b conversions, 2 <= b <= 36, but that code is currently unused (as far as I can tell). Mark [*] http://www.google.com/codesearch?hl=en&lr=&q=%5CWint%5Cs*%5C%28.*%5C%2C%5Cs*36%5Cs*%5C%29+lang%3Apython&sbtn=Search

Yuvgoog Greenle

14 Sep 14 Sep

2:51 a.m.

Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dicitionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?). --yuv On Sun, Sep 13, 2009 at 9:06 PM, Mark Dickinson <dickinsm@gmail.com> wrote:

...

...
Does anybody have any more use cases, ideas or suggestions? I'm getting

...
feeling this suggestion is +0 to most people and +1 for the rest. I'm

On Fri, Sep 11, 2009 at 4:16 PM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote: the pretty

...
new to these mailing lists so does that mean a yes or a no?

Just out of curiosity, I did a Google code search[*] for uses of the inverse operation: int(<some_string> ,n). I found a good handful of uses of int(s, 36), almost all apparently to do with turning integers into suitable id strings; there was also evidence that people have implemented the reverse 'integer -> base 36 string' conversion at least twice. I found no meaningful uses of any bases other than 2, 8, 10, 16, and 36. So the main use case seems to be serialization and deserialization of integers into some 'suitably nice' alphabet, and that alphabet is likely to be application-dependent.

-0 for int.to_base(n) (2 <= n <= 36) or equivalent functionality in the core.

+0 for a pair of library functions converting to and from base n, with explicitly given translation table. I agree with MRAB that an implicit digit set should only be allowed for 2 <= base <= 36, if at all.

By the way, _PyLong_Format in Objects/longobject.c *does* contain code for general integer -> base b conversions, 2 <= b <= 36, but that code is currently unused (as far as I can tell).

Mark

[*] http://www.google.com/codesearch?hl=en&lr=&q=%5CWint%5Cs*%5C%28.*%5C%2C%5Cs*36%5Cs*%5C%29+lang%3Apython&sbtn=Search

Lie Ryan

12:36 p.m.

Yuvgoog Greenle wrote:

...

Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dicitionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?).

If the translation table is limited to a string, the function would be very limited. For example, it might be useful to use base-change function to convert between an IPv4 address and integer. The common representation of IPv4 address uses base-255 "number" (0.0.0.0-255.255.255.255)

MRAB

1:37 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Lie Ryan wrote:

...

Yuvgoog Greenle wrote:

...
Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dicitionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?).

If the translation table is limited to a string, the function would be very limited. For example, it might be useful to use base-change function to convert between an IPv4 address and integer. The common representation of IPv4 address uses base-255 "number" (0.0.0.0-255.255.255.255)

If the translation table is a dict then you can decode both 'A' and 'a' to 10. You could, of course, permit either a string or a dict, with a string being converted to a dict.

Mark Dickinson

15 Sep 15 Sep

9:38 a.m.

On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:

...

Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dicitionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?).

Yes, sorry, I just meant a string (or possibly some other iterable of characters). Something like (3.x code): def encode_int(n, alphabet): if n < 0: raise ValueError("nonnegative integers only, please") base = len(alphabet) cs = [] while True: n, c = divmod(n, base) cs.append(alphabet[c]) if not n: break return ''.join(reversed(cs)) def decode_int(s, alphabet): base = len(alphabet) char_to_int = {c: i for i, c in enumerate(alphabet)} n = 0 for c in s: n = n * base + char_to_int[c] return n

...

...
...
alphabet = '1ilI|:' encode_int(10**10, alphabet) '|IIli|l|ili||' decode_int(_, alphabet) 10000000000

This doesn't allow negative numbers. If negative numbers should be permitted, there are some decisions to be made there too. How are they represented? With a leading '-'? If so, then '-' should not be permitted in the alphabet. Should the negative sign character be user-configurable? One problem with allowing multi-character digits in encoding is that it complicates the decoding: parsing the digit string is no longer trivial. I don't see how to make this a viable option. I'm still only +0 (now leaning towards -0, having seen how easy this is to implement, and thinking about how much possible variation there might be in what's actually needed) on adding something like this. Mark

MRAB

4:48 p.m.

New subject: Add a builtin method to 'int' for base/radix conversion

Mark Dickinson wrote:

...

On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:

...
Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dicitionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?).

Yes, sorry, I just meant a string (or possibly some other iterable of characters). Something like (3.x code):

def encode_int(n, alphabet): if n < 0: raise ValueError("nonnegative integers only, please") base = len(alphabet) cs = [] while True: n, c = divmod(n, base) cs.append(alphabet[c]) if not n: break return ''.join(reversed(cs))

def decode_int(s, alphabet): base = len(alphabet) char_to_int = {c: i for i, c in enumerate(alphabet)} n = 0 for c in s: n = n * base + char_to_int[c] return n

...
...
...
alphabet = '1ilI|:' encode_int(10**10, alphabet) '|IIli|l|ili||' decode_int(_, alphabet) 10000000000

This doesn't allow negative numbers. If negative numbers should be permitted, there are some decisions to be made there too. How are they represented? With a leading '-'? If so, then '-' should not be permitted in the alphabet. Should the negative sign character be user-configurable?

One problem with allowing multi-character digits in encoding is that it complicates the decoding: parsing the digit string is no longer trivial. I don't see how to make this a viable option.

I'm still only +0 (now leaning towards -0, having seen how easy this is to implement, and thinking about how much possible variation there might be in what's actually needed) on adding something like this.

I'd prefer the arguments to be: value, base, optional translation table. The translation table would default to 0-9, A-Z/a-z (when decoding, multiple characters could map to the same numeric value, eg 'A' => 10 and 'a' => 10, hence the ability to use a dict). The default translation table would work up to base 36; higher bases would raise a ValueError exception "translation table too small for base". Could a single translation table work both ways? A dict for decoding could contain {'A': 10, 'a': 10}, but how could you reverse that for encoding?

Yuvgoog Greenle

16 Sep 16 Sep

2:39 a.m.

also, negative numbers aren't taboo. FTFY import string def mirror_dict(dict): # list because iteration doesn't support changing of the dict for key, val in list(dict.items()): dict[val] = key _lows = string.digits + string.ascii_lowercase default_alphabet = dict(zip(range(len(_lows)), _lows)) mirror_dict(default_alphabet) def encode_int(value, base, alphabet=default_alphabet): if value < 0: is_negative = True value = value*(-1) else: is_negative = False cs = [] while True: value, index = divmod(value, base) cs.insert(0, alphabet[index]) if not value: break if is_negative: cs.insert(0, '-') return ''.join(cs) def decode_int(str_num, base, alphabet=default_alphabet): char_to_int = {c: i for i, c in enumerate(alphabet)} value = 0 for c in str_num: value = value * base + char_to_int[c] return value alphabet = '1ilI|:' enc = encode_int(10**10, len(alphabet), alphabet) print('|IIli|l|ili||') print(enc) dec = decode_int(enc, len(alphabet), alphabet) print(dec) print(10000000000) print(encode_int(12345, 32)) print(encode_int(-12345, 32)) ----- The only problem with {'A': 10, 'a': 10} is that it's not reversible. If we wantted to encode, 10, what should be used, A or a? On Tue, Sep 15, 2009 at 7:48 PM, MRAB <python@mrabarnett.plus.com> wrote:

...

Mark Dickinson wrote:

...
On Mon, Sep 14, 2009 at 3:51 AM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:

...
Btw, when you say translation table, do you mean just a string? Because a translation table would need to be continuous from 0 to the base so a real dicitionary-esque table may be overkill. The only advantage of a table might be to convert certain digits into multiple bytes (some sort of ad-hoc unicode use case?).

Yes, sorry, I just meant a string (or possibly some other iterable of characters). Something like (3.x code):

def encode_int(n, alphabet): if n < 0: raise ValueError("nonnegative integers only, please") base = len(alphabet) cs = [] while True: n, c = divmod(n, base) cs.append(alphabet[c]) if not n: break return ''.join(reversed(cs))

def decode_int(s, alphabet): base = len(alphabet) char_to_int = {c: i for i, c in enumerate(alphabet)} n = 0 for c in s: n = n * base + char_to_int[c] return n

alphabet = '1ilI|:'

...
...
...
encode_int(10**10, alphabet)

'|IIli|l|ili||'

...
decode_int(_, alphabet)

...
...
10000000000

This doesn't allow negative numbers. If negative numbers should be permitted, there are some decisions to be made there too. How are they represented? With a leading '-'? If so, then '-' should not be permitted in the alphabet. Should the negative sign character be user-configurable?

One problem with allowing multi-character digits in encoding is that it complicates the decoding: parsing the digit string is no longer trivial. I don't see how to make this a viable option.

I'm still only +0 (now leaning towards -0, having seen how easy this is to implement, and thinking about how much possible variation there might be in what's actually needed) on adding something like this.

I'd prefer the arguments to be: value, base, optional translation table. The translation table would default to 0-9, A-Z/a-z (when decoding, multiple characters could map to the same numeric value, eg 'A' => 10 and 'a' => 10, hence the ability to use a dict). The default translation table would work up to base 36; higher bases would raise a ValueError exception "translation table too small for base".

Could a single translation table work both ways? A dict for decoding could contain {'A': 10, 'a': 10}, but how could you reverse that for encoding?

_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

Arnaud Delobelle

7:34 a.m.

2009/9/16 Yuvgoog Greenle <ubershmekel@gmail.com>: [...]

...

The only problem with {'A': 10, 'a': 10} is that it's not reversible. If we wantted to encode, 10, what should be used, A or a?

Obviously, either :) -- Arnaud

Yuvgoog Greenle

17 Sep 17 Sep

1:12 a.m.

Excuse my 'obviously', let me clarify. The following alphabet is reversable: for encode: {0 : 'a', 1, 'b'} for encode/decode: {0: 'a', 1: 'b', 'a':0, 'b':1} You can have 'a' and 'A' coexist, but the encode will have only one option. for encode: :{0 : 'a', 1, 'b'} for encode/decode: {0: 'a', 1: 'b', 'a': 0, 'b': 1, 'A': 0, 'B': 1} So one could say the encoding alphabet is the canonical representation. The encoding alphabet is actually alphabet_dict[0],..., alphabet_dict[base - 1]. On Wed, Sep 16, 2009 at 10:34 AM, Arnaud Delobelle <arnodel@googlemail.com>wrote:

...

2009/9/16 Yuvgoog Greenle <ubershmekel@gmail.com>: [...]

...
The only problem with {'A': 10, 'a': 10} is that it's not reversible. If we wantted to encode, 10, what should be used, A or a?

Obviously, either :)

-- Arnaud

5583

Age (days ago)

5600

Last active (days ago)

List overview

Download

20 comments

10 participants

participants (10)

Arnaud Delobelle
Carl Johnson
Greg Ewing
Gregory P. Smith
Lie Ryan
Mark Dickinson
Masklinn
MRAB
Nick Coghlan
Yuvgoog Greenle

Add a builtin method to 'int' for base/radix conversion

Carl Johnson

Carl Johnson

Arnaud Delobelle

tags

participants (10)