[Python-ideas] PEP: Dict addition and subtraction

Thu Mar 21 12:14:05 EDT 2019

On Thu, Mar 21, 2019 at 03:16:44PM +0200, Serhiy Storchaka wrote:
> 21.03.19 14:51, Chris Angelico wrote:
> >... then, in the interests of productive discussion, could you please
> >explain? What is it about dict addition that makes it harder to
> >understand than other addition?
> 
> Currently the + operator has 2 meanings for builtin types (both are 
> widely used), after adding it for dicts it will have 3 meanings.

Just two meanings? I get at least eight among the builtins:

- int addition;
- float addition;
- complex addition;
- string concatenation;
- list concatenation;
- tuple concatenation;
- bytes concatenation;
- bytearray concatenation.

I suppose if you cover one eye and focus on the "big picture", ignoring 
vital factors like "you can't add a list to a string" and "float 
addition and int addition aren't precisely the same", we might pretend 
that this is just two operations:

- numeric addition;
- sequence concatenation.

But in practice, when reading code, it's usually not enough to know that 
some use of the + operator means "concatenation"; you need to know 
*what* is being concatenated. There's no point trying to add a tuple if 
a bytearray is required.
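
To make that concrete, here is a small sketch (the helper try_add and the 
variable names are mine, purely for illustration). It shows that each 
builtin sequence type concatenates with its own kind, and that mixing 
them generally fails:

```python
# Each builtin sequence type concatenates with operands of its own kind.
results = {
    "list": [1, 2] + [3],
    "tuple": (1, 2) + (3,),
    "str": "ab" + "c",
    "bytes": b"ab" + b"c",
}


def try_add(left, right):
    """Return left + right, or the TypeError raised by the attempt."""
    try:
        return left + right
    except TypeError as exc:
        return exc


# "Concatenation" alone is not enough information: the operand types matter.
mixed_list_str = try_add([1, 2], "ab")
mixed_tuple_bytearray = try_add((1, 2), bytearray(b"ab"))
```

Both mixed cases come back as TypeError instances, so knowing that + 
means "concatenation" does not tell you whether the expression works.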

> 3 > 2, is not?

Okay, but how does this make it harder to determine what a piece of code 
using + does? Antoine insists that *if we allow dict addition*, then we 
won't be able to tell what 

    spam + eggs  # for example

does unless we know what spam and eggs are. This is very true. But it is 
*equally true today*, right now, and it's been equally true going back to 
Python 1.0 or before.

This proposed change doesn't add any uncertainty that doesn't already 
exist, nor will it make code that is clear today less clear tomorrow.^1

And don't forget that Python allows us to create non-builtin types that 
overload operators. If you don't know what spam and eggs are, you can't 
assume they are builtins. With operator overloading, any operator can 
mean literally anything at all. In practice though, this rarely becomes 
a serious problem.
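
As a reminder of how open-ended this already is, here is a minimal 
sketch of a user-defined type that overloads +. The class Segment is 
hypothetical, invented only for this example:

```python
class Segment:
    """Hypothetical type whose + joins two pieces with a single slash."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        if not isinstance(other, Segment):
            return NotImplemented  # let Python try the other operand
        return Segment(self.text.rstrip("/") + "/" + other.text.lstrip("/"))


spam = Segment("usr/")
eggs = Segment("/bin")
joined = spam + eggs  # neither numeric addition nor plain concatenation
```

Here spam + eggs is neither addition nor simple concatenation, and 
nothing about the expression itself tells you that.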

Is there a significant increase in difficulty between the current 
situation:

    # Is this addition or concatenation or something else?
    spam + eggs

versus the proposed:

    # Is this addition or concatenation or merge or something else?
    spam + eggs

Obviously there's *one more builtin* to consider, but I don't think that 
changes the process of understanding the meaning of the operation.

I think that the problem you and Antoine fear ("dict.__add__ will make 
it harder to read code") requires a process that goes something like 
this:

1. Here's a mysterious "spam + eggs" operation we need to understand.
2. For each operation in ("numeric addition", "concatenation"):
3.     assume + represents that operation;
4.     if we understand the spam+eggs expression now, break

If that's how we read code, then adding one more operation would make it 
harder to understand. We'd have to loop three times, not twice:

2. For each operation in ("numeric addition", "concatenation", "dict merging"):

Three is greater than two, so we may have to do more work to understand 
the code. But I don't think that's how people actually read code. I 
think they do this:

1. Here's a mysterious "spam + eggs" operation we need to understand.
2. Read the code to find out what spam and eggs are.
3. Knowing what they are (tuples, lists, floats, etc) immediately tells 
   you what the plus operator does; at worst, a programmer unfamiliar 
   with the type may need to read the docs.
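
That second process can be sketched roughly as follows. This is only an 
illustration of the reasoning, not a claim about how anyone's tooling 
works, and describe_plus is a hypothetical helper:

```python
def describe_plus(operand):
    """Guess what + means, going only by the type of one operand."""
    kind = type(operand).__name__
    if isinstance(operand, (int, float, complex)):
        return f"{kind}: numeric addition"
    if isinstance(operand, (str, list, tuple, bytes, bytearray)):
        return f"{kind}: concatenation"
    # Unfamiliar type: the worst case is a trip to the documentation.
    return f"{kind}: read the docs for {kind}.__add__"


float_case = describe_plus(1.5)
tuple_case = describe_plus((1, 2))
dict_case = describe_plus({"a": 1})
```

The work is in finding out what the operand is; once that is known, 
there is no loop over candidate meanings.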

Adding dict.__add__ doesn't make it any harder to work out what the 
operands spam and eggs are. The process we go through to determine what 
the operands are remains the same:

- if one of the operands is a literal, that gives you a strong hint that 
  the other is the same type;
- the names or context may make it clear ("header + text" probably 
  isn't doing numeric addition);
- read back through the code looking for where the variables are 
  defined; etc.

That last bit isn't always easy. People can write obfuscated, complex 
code using poor or misleading names. But allowing dict.__add__ doesn't 
make it more obfuscated or more complex. Usually the naming and context 
will make it clear. Most code is not terrible.

At worst, there will be a transition period where people have a 
momentary surprise:

"Wait, what, these are dicts??? How can you add dicts???"

but then they will read the docs (or ask StackOverflow) and the second 
time they see it, it shouldn't be a surprise.
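
For anyone who wants to see what that might look like, here is a rough 
sketch using a dict subclass. The name AddableDict is hypothetical, and 
I am assuming update-style merging where the right-hand operand wins on 
duplicate keys; this message does not spell those details out.

```python
class AddableDict(dict):
    """Hypothetical dict subclass whose + returns a merged copy."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        merged = AddableDict(self)  # copy, so neither operand is mutated
        merged.update(other)  # assumption: right operand wins on shared keys
        return merged


defaults = AddableDict(colour="red", size=1)
overrides = AddableDict(size=2)
settings = defaults + overrides
```

Given names like defaults and overrides, a reader is unlikely to mistake 
this for numeric addition.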

^1 That's my assertion, but if anyone has a concrete example of actual 
code which is self-evident today but will become ambiguous if this 
proposal goes ahead, please show me!

-- 
Steven