user-defined operators: a very modest proposal

Tue Nov 22 23:52:10 CET 2005

If your proposal is implemented, what does this code mean?
	if [1,2]+[3,4] != [1,2,3,4]: raise TestFailed, 'list concatenation'
Since it contains ']+[' I assume it must now be parsed as a user-defined
operator, but this code currently has a meaning in Python.

(This is the first example I found, in Python 2.3's test/ directory, so it
is real code.)
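
To make the ambiguity concrete, here is a small sketch using the standard
tokenize module.  It assumes Python 3 (the quoted test line is Python 2 code),
and the helper name operator_tokens is mine, not part of any proposal.  It
shows that today's tokenizer reads ']+[' as three separate operator tokens,
and that the expression already evaluates to the concatenated list.

```python
import io
import tokenize

SOURCE = "[1,2]+[3,4]"


def operator_tokens(source):
    """Return the operator/delimiter tokens the tokenizer finds in source."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.OP]


ops = operator_tokens(SOURCE)
print(ops)            # ']', '+' and '[' show up as separate tokens
print(eval(SOURCE))   # the expression is ordinary list concatenation
```

Any rule that treats ']+[' as a single new operator would have to change how
this existing expression is tokenized.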

I don't believe that Python needs user-defined operators, but let me share my
terrible proposal anyway:  Each Unicode character in the class 'Sm' (Symbol,
Math) whose code point is greater than 127 may be used as a user-defined
operator.  The special method called depends on the ord() of the Unicode
character, so that __u2044__ is called when the source code contains
u'\N{FRACTION SLASH}'.  Whatever alternate syntax is adopted to allow Unicode
identifier characters to be typed in pure ASCII will also apply to typing
user-defined operators.  "r" and "i" versions of the operators will of course
exist, as in __ru2044__ and __iu2044__.
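
The naming rule above can be sketched with the standard unicodedata module.
This is only an illustration under my own assumptions: the function name
operator_method_names and the dictionary keys are hypothetical, and I assume
at least four lowercase hex digits in the method name, as in __u2044__.

```python
import unicodedata


def operator_method_names(char):
    """Map an operator character to its hypothetical special method names."""
    # Only non-ASCII characters of category 'Sm' (Symbol, Math) qualify.
    if unicodedata.category(char) != 'Sm' or ord(char) <= 127:
        raise ValueError('%r is not usable as a user-defined operator' % char)
    code = 'u%04x' % ord(char)
    return {
        'binary': '__%s__' % code,
        'reflected': '__r%s__' % code,
        'in-place': '__i%s__' % code,
    }


names = operator_method_names('\N{FRACTION SLASH}')
print(names)

# ASCII '+' is also category 'Sm', but its code point is below 128.
try:
    operator_method_names('+')
    plus_rejected = False
except ValueError:
    plus_rejected = True
```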

Also, to accommodate operators such as u'\N{DOUBLE INTEGRAL}', which are not
simple unary or binary operators, the character u'\N{NO-BREAK SPACE}' will be
used to separate arguments.  When necessary, parentheses will be added to
remove ambiguity.  This leads naturally to expressions that correspond to
calls such as (y*x**2).__u222c__(dx, dy), which are clearly easy to love,
except for the small issue that many inferior editors will not clearly
display the \N{NO-BREAK SPACE} characters.
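
As a rough sketch of the dispatch described above: the class Integrand, the
helper apply_operator, and the choice to pass the arguments as plain strings
are all my own assumptions, not part of the proposal.  The only things taken
from the text are the method name __u222c__ and the use of U+00A0 as the
argument separator.

```python
NBSP = '\u00a0'  # U+00A0 NO-BREAK SPACE, the proposed argument separator


class Integrand:
    """Hypothetical operand type that implements the double-integral method."""

    def __init__(self, label):
        self.label = label

    def __u222c__(self, *variables):
        # A real implementation would integrate; this only records the call.
        return ('double integral', self.label, variables)


def apply_operator(opchar, operand, argument_text):
    """Look up the method named after opchar and call it with the
    NO-BREAK SPACE separated arguments."""
    method = getattr(operand, '__u%04x__' % ord(opchar))
    return method(*argument_text.split(NBSP))


result = apply_operator('\N{DOUBLE INTEGRAL}', Integrand('y*x**2'),
                        'dx' + NBSP + 'dy')
print(result)
```

Note that the separator in 'dx' + NBSP + 'dy' looks like an ordinary space in
most editors, which is exactly the display problem mentioned above.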

Some items on which I think I'd like to hear the community's ideas are:
    * Do we give special meaning to comparison characters like
      \N{NEITHER LESS-THAN NOR GREATER-THAN}, or let users define them in new
      ways?  We could just provide, on object,
	def __u2278__(self, other): return not self.__gt__(other) and not other.__gt__(self)
      which would in effect satisfy all users.

    * Do we immediately implement the combination of operators with nonspacing
      marks, or defer it?  If we implement it, do we allow the combination with
      pure ASCII operators, as in u'+\u20e1',
      or treat it as a syntax error?  (BTW the method name for this would be
      __u20e1u002b__, even though it might be tempting to support __u20e1x2b__,
      __u20e1add__ and similar method names)  How and when do we normalize
      operators combined with more than one nonspacing mark?

    * Which Unicode operator methods should be supported by built-in types?
      Implementing __u222a__ and __iu222a__ for sets is a no-brainer,
      obviously, but what about __iu2206__ for ints and longs?

    * Should some of the unicode mathematical symbols be reserved for literals?
      It would be greatly preferable to write \u2205 instead of the other proposed
      empty-set literal notation, {-}.  Perhaps nullary operators could be defined,
      so that writing \u2205 alone is the same as __u2205__() i.e., calling the
      nullary function, whether it is defined at the local, lexical, module, or
      built-in scope.

    * Do we support characters from the category 'So' (symbol, other)?  Not
      doing so means preventing programmers from using operators like
      make those kinds of choices for our users?
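
For the question above about built-in types, here is a minimal sketch of what
__u222a__ and __iu222a__ might look like for sets.  It uses a subclass (the
name UnionSet is hypothetical) because the real set type has no such methods,
and the methods must be called by name since no syntax for them exists.

```python
class UnionSet(set):
    """Hypothetical set subclass with the proposed U+222A UNION methods."""

    def __u222a__(self, other):
        # Binary form: return a new set, like the | operator.
        return UnionSet(self | set(other))

    def __iu222a__(self, other):
        # In-place form: update this set and return it, like |=.
        self.update(other)
        return self


a = UnionSet({1, 2})
b = a.__u222a__({3})      # a is unchanged
c = a.__iu222a__({9})     # a is updated in place
print(sorted(a), sorted(b), c is a)
```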

