
Is this an oversight, or by design?
It appears easy to support Unicode: there is already an explicit StringType check in these functions, and each of them simply delegates to int(), which already _does_ work for Unicode. A patch would leave the following behaviour:
IMO, this is better than what we have now. I'll put together a patch if one is wanted... Mark.
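A minimal sketch of the patch described above. It is written for Python 3 rather than the Python version discussed in this thread, so it is an illustration, not the actual string.py change. The explicit type check stays, but it is widened to cover Unicode text before delegating to int(). Because Python 3's str is already Unicode, the hypothetical _TEXT_TYPES tuple stands in for Python 2's (StringType, UnicodeType); that name is an assumption, not something from string.py.

```python
# Hypothetical sketch of the proposed patch: keep the explicit type check,
# but accept every text type, then delegate to int() exactly as before.
# _TEXT_TYPES is an assumed name; in Python 3 str is the only text type,
# so it stands in for Python 2's (StringType, UnicodeType).
_TEXT_TYPES = (str,)


def atoi(*args):
    """atoi(s [,base]) -> int, accepting any text (Unicode) string."""
    if not args:
        raise TypeError('function requires at least 1 argument: 0 given')
    s = args[0]
    if isinstance(s, _TEXT_TYPES):
        # Same delegation as string.py's _apply(_int, args).
        return int(*args)
    raise TypeError('argument 1: expected string, %s found'
                    % type(s).__name__)


print(atoi('42'))            # 42
print(atoi('\u0664\u0662'))  # 42 (ARABIC-INDIC DIGITS FOUR and TWO)
print(atoi('ff', 16))        # 255
```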

Mark Hammond wrote:
Probably an oversight... and it may well not be the only one: there are many explicit string checks in the code which might need fixing for Unicode support. As for string.ato?, I'm not sure: these functions are made obsolete by int(), float() and long().
Right. I fixed the above three APIs to support Unicode.
BTW, the code in string.py for atoi() et al. looks really complicated:

"""
def atoi(*args):
    """atoi(s [,base]) -> int

    Return the integer represented by the string s in the given
    base, which defaults to 10.  The string s must consist of one
    or more digits, possibly preceded by a sign.  If base is 0, it
    is chosen from the leading characters of s, 0 for octal, 0x or
    0X for hexadecimal.  If base is 16, a preceding 0x or 0X is
    accepted.

    """
    try:
        s = args[0]
    except IndexError:
        raise TypeError('function requires at least 1 argument: %d given' %
                        len(args))
    # Don't catch type error resulting from too many arguments to int().  The
    # error message isn't compatible but the error type is, and this function
    # is complicated enough already.
    if type(s) == _StringType:
        return _apply(_int, args)
    else:
        raise TypeError('argument 1: expected string, %s found' %
                        type(s).__name__)
"""

Why not simply...

def atoi(s, base=10):
    return int(s, base)

ditto for atol() and atof()... ?!

This would not only give us better performance, but also Unicode support for free. (I'll fix int() and long() to accept Unicode when using an explicit base too.)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
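A rough sketch of the one-line wrappers suggested above, again in Python 3 and not the code that actually went into string.py. Two adaptations are assumptions on my part: Python 3 merged long into int, so atol() here just calls int(); and float() takes no base argument, so atof() takes only s.

```python
# Sketch of the "why not simply..." versions: thin wrappers over the
# builtins, which already handle Unicode text themselves.

def atoi(s, base=10):
    """Return the integer represented by s in the given base."""
    return int(s, base)


def atol(s, base=10):
    """Python 3 has no separate long type, so this mirrors atoi()."""
    return int(s, base)


def atof(s):
    """Return the float represented by s; float() takes no base."""
    return float(s)


print(atoi('\u0664\u0662'))  # 42 (Arabic-Indic digits)
print(atoi('0x1f', 16))      # 31 (a 0x prefix is accepted for base 16)
print(atof('2.5'))           # 2.5
```

One side effect worth noting: with an explicit base, int() rejects non-string arguments with a TypeError, so atoi(5) still fails much like the original type check would. atof() has no such guard, since float(5) simply returns 5.0.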
participants (2)
-
M.-A. Lemburg
-
Mark Hammond