Your idea adds some extra (constant) time to all string ops, and quite a bit of complexity.
It adds a single NULL check to each op (about the same cost as PyString_Check).
Complexity (recursive joining) is confined to a single function. Meanwhile, the code for str.__add__() becomes simpler. That's basically the whole show.
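To make that concrete, here is a rough pure-Python model of the idea. It is only a sketch, not the C patch: LazyStr and its _value, _left and _right attributes are names invented for illustration, _value being None stands in for the NULL buffer, and the join walks the pending pieces with an explicit stack instead of recursing so that a long chain does not hit Python's recursion limit.

```python
class LazyStr:
    """Toy model of a string with deferred concatenation (hypothetical)."""

    def __init__(self, value=""):
        self._value = value   # None plays the role of the NULL buffer
        self._left = None     # pending left operand, if any
        self._right = None    # pending right operand, if any

    def __add__(self, other):
        # Concatenation only records its operands; no characters are copied.
        if not isinstance(other, LazyStr):
            other = LazyStr(str(other))
        result = LazyStr(None)
        result._left, result._right = self, other
        return result

    def _autojoin(self):
        # All of the joining complexity lives in this one method.
        parts = []
        stack = [self]
        while stack:
            node = stack.pop()
            if node._value is not None:
                parts.append(node._value)
            else:
                # Push right first so the left operand is popped first.
                stack.append(node._right)
                stack.append(node._left)
        self._value = "".join(parts)
        self._left = self._right = None

    def __str__(self):
        # The one extra check every operation pays before touching the data.
        if self._value is None:
            self._autojoin()
        return self._value

    def __len__(self):
        return len(str(self))
```

In this model a loop of s = s + piece builds a chain of cheap nodes, and the first operation that needs the characters performs a single join over all of them.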
There are lots of places where knowledge of string internals is assumed, including third-party code using a few macros, all of which would have to be fixed.
Then save it for Py3.0, or not. The idea is to make things easier for the Python programmer, beginner or pro. With little effort on the C side, there is an opportunity to be the first dynamic language with O(n) behavior for serial string concatenations -- one less thing to teach, one step towards scalability.
But wait, I think it won't fly at all unless you make ob_sval a pointer to separately allocated memory. Otherwise, how could _autojoin() possibly "fix" the string without allocating the memory for it?
BTW, there is a proof-of-concept demo patch with UserString at: www.python.org/sf/976162
Also, there is an alternative approach of having str.__add__() return a string proxy. This would avoid issues with third-party code.
That being said, I didn't miss that you hate the idea, so I'll craft a recipe and drop it :-(