Marc-Andre Lemburg wrote:
Since these are numbers, it would be convenient if there were some way to create them in the form of literals, much like 123L creates longs instead of integers or u"abc" gives you Unicode instead of an 8-bit string.
I was wondering whether it would be worth adding something like a registry of literal modifiers to Python, so that extensions can register new modifiers with the compiler, e.g.
sitecustomize.py:

    import sys

    def create_I_literal(literal_string):
        return 'mx.Number.Integer(%s)' % literal_string

    sys.register_numberlitmod('I', create_I_literal)

test.py:

    x = 123I * 456I
    print x, 234I
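To make the suffix-registry idea concrete, here is a minimal sketch that swaps in a different technique: a regex pass that rewrites the source text before it is compiled, rather than a hook inside the compiler. `register_numberlitmod` echoes the name in the quoted message, but here it is a hypothetical module-level function, not a real `sys` API, and `Integer` is a local stand-in for mx.Number.Integer, which is not assumed to be installed.

```python
import re

# Hypothetical registry: one-character suffix -> function that turns the
# digits of a literal into replacement source text. Not a real sys API.
_number_suffixes = {}


def register_numberlitmod(suffix, factory):
    """Register `factory` to rewrite literals of the form <digits><suffix>."""
    _number_suffixes[suffix] = factory


def expand_number_literals(source):
    """Rewrite suffixed integer literals before the source is compiled.

    A toy regex pass: unlike a tokenizer-level hook, it does not know about
    strings or comments, and it only handles plain decimal digits.
    """
    if not _number_suffixes:
        return source
    suffixes = "".join(re.escape(s) for s in _number_suffixes)
    pattern = re.compile(r"\b(\d+)([%s])\b" % suffixes)
    return pattern.sub(lambda m: _number_suffixes[m.group(2)](m.group(1)), source)


class Integer(int):
    """Stand-in for mx.Number.Integer (hypothetical here)."""


register_numberlitmod("I", lambda digits: "Integer(%s)" % digits)
code = expand_number_literals("x = 123I * 456I")
# code == "x = Integer(123) * Integer(456)"
namespace = {"Integer": Integer}
exec(code, namespace)
# namespace["x"] == 56088
```

The word-boundary anchors keep identifiers such as `abc123I` and hex literals such as `0x1F` from being rewritten. A real implementation would live in the tokenizer, where string and comment boundaries are already known.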
Too limiting. You'd only be able to do this for numbers, and it doesn't seem worth the pain just for numbers. Better would be user-definable *prefixes*.

Common Lisp, for instance, makes it easy to customize the reader to recognize tokens of the form <hash> <character> <anything>. So you can arrange that #Q123,234,456:a(b)c turns into, erm, something terribly useful :-). Some of these characters are already taken for things like arrays [#(1 2 3), #2A((1 2) (3 4))], "logical pathnames" (lightly abstracted filenames) [#P"foo/bar/baz"], bit vectors [#*0001101011001], and so on. As perceptive readers will have noticed, you can splice a number between "#" and the magic character for special effects.

Python could do something similar, though obviously "#" isn't a suitable character :-). Letting the user hijack the reader as completely as can be done in CL would probably be un-Pythonic, too.

Here's a strawman suggestion. For any character "x" in some set I can't be bothered to specify, the Python tokenizer/parser will subject input of the form

    $x<string-literal>

to special processing. The string-literal can be formed using any of {',",''',"""}.

When I say "tokenizer/parser", I mean: the tokenizer will produce a special token encoding the character "x" and the contents of the string-literal. The parser will perform "special processing" in an attempt to turn it into a more normal token.

The default "special processing" is to raise a SyntaxError. The user can define the special processing appropriate for a particular character "x" by writing a function that interprets the string and passing it to sys.register_dollar_handler. (In fact, anything callable will do.) The function will be passed two arguments: the character "x" and the string. Its return value will replace the $x"..." combination in the token stream, as a literal token.
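The tokenizer half of this strawman can be approximated outside the interpreter. The sketch below finds `$x<string-literal>` in source text with a regex, accepting all four quote styles, and hands each (character, contents) pair to a handler table, raising SyntaxError by default. The allowed character set (letters and "/") is an assumption, since the proposal leaves it open; `scan_dollar_literals` and `special_processing` are hypothetical names, and `fractions.Fraction` stands in for the post's Rational class.

```python
import ast
import re
from fractions import Fraction  # stand-in for the post's Rational class

# Hypothetical scanner for $x<string-literal>. Assumption: x is a letter or
# "/". Triple-quoted forms are tried first so '''...''' is not read as ''.
# A backslash-escaped quote inside the literal is not supported.
DOLLAR_LITERAL = re.compile(
    r"""\$(?P<char>[A-Za-z/])(?P<body>"{3}.*?"{3}|'{3}.*?'{3}|"[^"\n]*"|'[^'\n]*')""",
    re.DOTALL,
)


def scan_dollar_literals(source):
    """Yield (char, contents, start, end) for each $x"..." found in source."""
    for match in DOLLAR_LITERAL.finditer(source):
        # Evaluate the quoted body as an ordinary Python string literal.
        contents = ast.literal_eval(match.group("body"))
        yield match.group("char"), contents, match.start(), match.end()


def special_processing(char, contents, handlers):
    """Default behaviour is a SyntaxError; a registered handler overrides it."""
    handler = handlers.get(char)
    if handler is None:
        raise SyntaxError("no handler for $%s literals" % char)
    return handler(char, contents)


handlers = {"r": lambda char, s: Fraction(s)}
values = [special_processing(c, s, handlers)
          for c, s, _, _ in scan_dollar_literals('$r"1/2" + $r"3/4"')]
# values == [Fraction(1, 2), Fraction(3, 4)]
```

The start and end offsets are what a source-rewriting pass would need to splice each handler's result back in; inside the real tokenizer that bookkeeping would come for free.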
If an exception other than a SyntaxError is raised and not caught in the handler function, it will be silently replaced by a SyntaxError whose parameter has the form "ill-formed <xxx> literal". The value of "xxx" is supplied as the third argument when registering the handler. Handler functions are permitted to call "eval".

Example:

    >>> def handle_rational(char, s):
    ...     assert char == 'r'
    ...     components = s.split('/')
    ...     numerator, denominator = map(int, components)
    ...     return Rational(numerator, denominator)
    ...
    >>> sys.register_dollar_handler('r', handle_rational, 'rational')
    >>> print $r"1/2" + $r"3/4"
    $r"5/4"
    >>> print $r"12345"
      File "<string>", line 1
        print $r"12345"
              ^
    SyntaxError: ill-formed rational literal
    >>>

Alternatively:

    >>> class Rational:
    ...     def __init__(self, x, y):
    ...         if isinstance(x, str):
    ...             x, y = map(int, y.split("/"))
    ...         self._numerator, self._denominator = x, y
    ...     [etc]
    ...
    >>> sys.register_dollar_handler('r', Rational, 'rational')

Some dollar-syntax characters may be handled by Python itself or the standard library, or may be reserved for their use. It is possible for users to override them, but this should be considered bad practice. Registering a handler when one is already in place will produce a warning. To un-register a handler, pass None instead of the handler function.

Possible applications:

  - Rational numbers.       $r"123/234"
  - Regular expressions.    $/"foo.*bar"
  - Dates and times.        $t"2002-09-27 11:38"
  - Hostnames and ports.    $h"www.google.com:80"

Questions:

  - Is this insane?
  - Is "$" the best character?
  - Should there be a way to return tokens other than literal ones?
    For instance, identifiers or keywords?
  - Is the behaviour with exceptions correct?

-- g
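The registration and error rules above can be modelled as plain Python. This is a minimal sketch under stated assumptions: `register_dollar_handler` here is a hypothetical module-level function (no such function exists in `sys`), `expand_dollar_literal` is a hypothetical name for the step that computes the replacement token, and `fractions.Fraction` stands in for the Rational class.

```python
import warnings
from fractions import Fraction

# Hypothetical registry standing in for the proposed sys.register_dollar_handler.
# Maps a character to (handler, name used in "ill-formed <name> literal").
_dollar_handlers = {}


def register_dollar_handler(char, handler, name=None):
    """Install a handler, warn when replacing one, or remove it if handler is None."""
    if handler is None:
        _dollar_handlers.pop(char, None)
        return
    if char in _dollar_handlers:
        warnings.warn("replacing existing handler for $%s literals" % char)
    # Assumption: fall back to the character itself when no name is given.
    _dollar_handlers[char] = (handler, name or char)


def expand_dollar_literal(char, contents):
    """Return the value that replaces $<char>"<contents>" in the token stream."""
    if char not in _dollar_handlers:
        # Default special processing: a SyntaxError.
        raise SyntaxError("no handler registered for $%s literals" % char)
    handler, name = _dollar_handlers[char]
    try:
        return handler(char, contents)
    except SyntaxError:
        raise  # a handler's own SyntaxError passes through unchanged
    except Exception:
        # Any other uncaught exception is silently replaced.
        raise SyntaxError("ill-formed %s literal" % name) from None


def handle_rational(char, s):
    """The post's example handler, with Fraction standing in for Rational."""
    numerator, denominator = map(int, s.split("/"))
    return Fraction(numerator, denominator)


register_dollar_handler("r", handle_rational, "rational")
total = expand_dollar_literal("r", "1/2") + expand_dollar_literal("r", "3/4")
# total == Fraction(5, 4); expand_dollar_literal("r", "12345") raises
# SyntaxError("ill-formed rational literal")
```

Catching `Exception` rather than `BaseException` means a KeyboardInterrupt inside a handler is not turned into a SyntaxError; the proposal doesn't say which is intended, so treat that as a design choice of this sketch.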