
Steven D'Aprano writes:
I mean, if all you are doing is splitting the source by some separators regardless of order, surely this does the same job and is *vastly* more obvious?
>>> re.split(r'[:;]', 'foo:bar;baz')
['foo', 'bar', 'baz']
"Obvious" yes, but it's also easy to invest that call with semantics (e.g., "just three segments because that's the allowed syntax") that it doesn't possess. You haven't stated how many elements it should be split into, nor whether the separator characters are permitted in components, nor whether this component is the whole input and this regexp defines the whole syntax. The point of the "well-known idiom" is to specify most of that (and it doesn't take much more to specify all of it; specifying "no separators in components" is the most space-consuming part of the expression!). Your other alternatives have the same potential issues.
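To make the contrast concrete, here is a minimal sketch. The "well-known idiom" itself isn't spelled out in this thread, so the pattern STRICT and the helper parse_strict below are hypothetical; they assume the intended syntax is exactly three components separated by ':' and then ';', with no separator characters inside any component.

```python
import re

# re.split accepts the separators in any order and any number of them.
loose = re.split(r'[:;]', 'foo;bar:baz:qux')  # ['foo', 'bar', 'baz', 'qux']

# Hypothetical stricter pattern: exactly three components, ':' then ';',
# and no separator characters allowed inside any component.
STRICT = re.compile(r'([^:;]*):([^:;]*);([^:;]*)')

def parse_strict(source):
    """Return the three components, or None if source doesn't conform."""
    m = STRICT.fullmatch(source)  # fullmatch: the pattern must cover the whole input
    return m.groups() if m else None
```

With this pattern, 'foo:bar;baz' yields ('foo', 'bar', 'baz'), while 'foo;bar:baz' (wrong order) and 'foo:bar;baz;qux' (too many separators) both yield None, which re.split would have accepted silently.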
But that's characteristic of many examples.
Great. Then for *those* structured examples you can happily write your regex and put the separators in the order you expect.
But I'm talking about *unstructured* examples where you don't know the order of the separators, you want to split on whichever one comes first regardless of the order, and you need to know which separator that was.
That's easy enough to do with a (relatively unknown to some ;-) regular expression: re.match("([^;:]*)([;:])(.*)", source). The question is whether the need is frequent enough, and that regexp hard enough to understand or ugly enough, to warrant another method or an incompatible extension to str.partition (and str.rpartition).[1]
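A rough sketch of wrapping that regexp so it behaves like a "partition on whichever separator comes first". The name partition_first is hypothetical, and re.DOTALL is an addition not in the regexp above: without it, "." stops at a newline and the tail would be silently truncated.

```python
import re

# Head (no separators), the first separator found, then everything after it.
# re.DOTALL is an assumption added here so the tail may contain newlines.
FIRST_SEP = re.compile(r"([^;:]*)([;:])(.*)", re.DOTALL)

def partition_first(source):
    """Hypothetical helper: split at whichever of ';' or ':' comes first.

    Returns (head, sep, tail) like str.partition, or (source, '', '')
    when neither separator is present.
    """
    m = FIRST_SEP.match(source)
    if m is None:
        return (source, '', '')
    return m.groups()
```

Returning the matched separator as the middle element is what tells the caller which one was found, e.g. partition_first('foo;bar:baz') gives ('foo', ';', 'bar:baz').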
Examples where the order of separators doesn't matter? In most of the examples I need, swapping order is a parse error.
Okay, then you *mostly* don't need this.
I already knew that. Without real examples, I can't judge whether I'm pro-status quo or pro-serving-the-nonuniversal-but-still-useful-case.
str.partition does *one* three-way split, into (head, sep, tail). If you want to continue to partition the tail, you have to call it again.
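For readers less familiar with it, a short illustration of that one-split-per-call behaviour using only the standard str methods; the sample string is made up.

```python
record = 'foo:bar:baz'

# One call, one split at the first ':'.
head, sep, tail = record.partition(':')    # ('foo', ':', 'bar:baz')

# To go further, partition the tail yourself.
middle, sep2, rest = tail.partition(':')   # ('bar', ':', 'baz')

# rpartition splits once at the *last* occurrence instead.
left, sep3, last = record.rpartition(':')  # ('foo:bar', ':', 'baz')

# When the separator is absent, sep is '' and the whole string is the head.
missing = 'foo'.partition(';')             # ('foo', '', '')
```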
I'm much more favorable to proposals where str.partition and str.rpartition split at *one* point, but the OP seemed to intend them to do more work (but not arbitrary amounts!) per call.
I'm not sure I quite understand you there, but if I do, I would prefer to split the string and then validate the head and tail afterwards, rather than just have the regex fail.
For me, that often depends on how hard I'm willing to work to support users. If the only user is myself, that's very often zero. In the case of the "well-known idiom", the only ways the regexp can fail involve the wrong number of separators, and I'd be willing to impose that burden on users with a "wrong number of separators" message. Another case is where I want an efficient parser for the vast majority of conformant cases and am willing to do redundant work for the error cases.

Footnotes:
[1] Here "incompatible" means that people writing code that must support previous versions of Python can't use it.
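One way to read the "efficient parser for conformant cases, redundant work for error cases" approach, as a sketch only: try a strict regexp first, and only when it fails, rescan the input to produce a specific message. The syntax (':' then ';', three components) and the name parse are hypothetical assumptions carried over for illustration.

```python
import re

# Hypothetical syntax: three components separated by ':' then ';',
# with no separator characters inside components.
STRICT = re.compile(r'([^:;]*):([^:;]*);([^:;]*)')

def parse(source):
    # Fast path: conforming input is handled by a single fullmatch.
    m = STRICT.fullmatch(source)
    if m:
        return m.groups()
    # Slow path, only reached on errors, so the extra scan is acceptable.
    found = re.findall(r'[:;]', source)
    if len(found) != 2:
        raise ValueError(
            f"wrong number of separators: expected 2, found {len(found)}")
    # Exactly two separators but no match means they are in the wrong order.
    raise ValueError(f"separators in wrong order: {''.join(found)!r}")
```

The fast path does no diagnostic work at all; the cost of producing a helpful message is paid only by malformed input.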