
On Sun, Jan 08, 2023 at 05:30:30PM +0900, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
> > On Sat, Jan 07, 2023 at 10:48:48AM -0800, Peter Ludemann wrote:
> > > You can get almost the same result using pattern matching. For example, your "foo:bar;baz".partition(":", ";") can be done by a well-known matching idiom:
> > >     re.match(r'([^:]*):([^;]*);(.*)', 'foo:bar;baz').groups()
> > "Well-known" he says :-)
> It *is* well-known to those who know. Just because you don't like regex doesn't mean it's not well-known.
I like regexes plenty, for what they are good for. But my *liking* them or not is irrelevant as to whether this example is "well-known" or not. I'm not the heaviest regex user in the world, but I've used my share, and I've never seen this particular line noise before. (Hey, I like Forth. Sometimes line noise is great.) I mean, if all you are doing is splitting the source by some separators regardless of order, surely this does the same job and is *vastly* more obvious?

    >>> re.split(r'[:;]', 'foo:bar;baz')
    ['foo', 'bar', 'baz']

If the order matters:

    >>> re.match('(.*):(.*);(.*)', 'foo:bar;baz').groups()
    ('foo', 'bar', 'baz')

Or use non-greedy wildcards if you need them:

    >>> re.match('(.*?):(.*?);(.*)', 'foo:b:ar;ba;z').groups()
    ('foo', 'b:ar', 'ba;z')

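For comparison, here is a small sketch of what the greedy pattern does on that same input. It uses only the standard `re` module, and the variable names are just for illustration:

```python
import re

text = "foo:b:ar;ba;z"

# Greedy: each (.*) grabs as much as it can, so the split lands on the
# *last* ':' and the *last* ';'.
greedy = re.match(r"(.*):(.*);(.*)", text).groups()

# Non-greedy: each (.*?) stops as early as it can, so the split lands on
# the *first* ':' and the first ';' after it.
lazy = re.match(r"(.*?):(.*?);(.*)", text).groups()

print(greedy)  # ('foo:b', 'ar;ba', 'z')
print(lazy)    # ('foo', 'b:ar', 'ba;z')
```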
> > I think that the regex solution is also wrong because it requires you to know *exactly* what order the separators are found in the source string.
> But that's characteristic of many examples.
Great. Then for *those* structured examples you can happily write your regex and put the separators in the order you expect. But I'm talking about *unstructured* examples where you don't know the order of the separators, you want to split on whichever one comes first regardless of the order, and you need to know which separator that was. [...]
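A sketch of the difference, again using only the standard `re` module. A fixed-order pattern returns None when the separators arrive in the other order. `re.split` with a capturing group and `maxsplit=1` instead splits on whichever separator comes first and keeps that separator in the result. The helper name `first_split` is hypothetical:

```python
import re

# A fixed-order pattern: ':' must come before ';'.
FIXED = re.compile(r"(.*?):(.*?);(.*)")

print(FIXED.match("foo:bar;baz") is not None)  # True
print(FIXED.match("foo;bar:baz"))              # None: "wrong" order


def first_split(s, pattern=r"([:;])"):
    """Split s once, on whichever separator appears first.

    Hypothetical helper. Returns (head, sep, tail) like str.partition,
    or (s, '', '') if no separator is found.
    """
    # The capturing group makes re.split keep the matched separator.
    parts = re.split(pattern, s, maxsplit=1)
    if len(parts) == 1:
        return (s, "", "")
    head, sep, tail = parts
    return (head, sep, tail)


print(first_split("foo;bar:baz"))  # ('foo', ';', 'bar:baz')
print(first_split("foo:bar;baz"))  # ('foo', ':', 'bar;baz')
print(first_split("foobarbaz"))    # ('foobarbaz', '', '')
```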
> Examples where the order of separators doesn't matter? In most of the examples I need, swapping order is a parse error.
Okay, then you *mostly* don't need this.
> > and it splits the string all at once instead of one split per call.
> So does the original proposal; that's part of the point of it, I think.
str.partition does *one* three-way split, into (head, sep, tail). If you want to continue to partition the tail, you have to call it again. To me, that fixed "one bite per call" design is fundamental to partition(). If we wanted an arbitrary number of splits we'd use, um, split() :-) Of course we can debate the pros and cons of each, that's what this thread is for.
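For reference, here is roughly what "one bite per call" looks like with today's single-separator `str.partition`. Each call can use a different separator on the tail. The input string is made up for illustration:

```python
# One bite per call: each str.partition() splits off one piece, and the
# next call works on the tail, here with a different separator.
address = "example.com:8080/index.html"  # made-up input for illustration

host, _, rest = address.partition(":")
port, _, path = rest.partition("/")
print(host, port, path)  # example.com 8080 index.html

# For a single, uniform separator, split() does all the bites at once.
print("a,b,c".split(","))  # ['a', 'b', 'c']
```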
> Parsing is hard. Both regex and r?partition are best used as low-level tools for tokenizing, and you're asking for trouble if you try to use them for parsing past a certain point.
Right! I agree! And that is why I want partition to accept multiple separators and split on the first one found. I find myself needing to do that, well, not "all the time" by any means, but often enough that it's an itch I want scratched.
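A minimal pure-Python sketch of one reading of that idea: partition at the earliest occurrence of any of several separators and report which one matched. `partition_any` is a hypothetical function, not an existing str method. The tie-breaking rule (earlier-listed separator wins) is my own assumption:

```python
def partition_any(s, *seps):
    """Partition s at the earliest occurrence of any separator in seps.

    Hypothetical sketch, not an existing str method. Returns
    (head, sep, tail); if no separator occurs, returns (s, '', '') like
    str.partition. Assumption: ties at the same index go to the
    separator listed first.
    """
    if not seps:
        raise TypeError("partition_any() needs at least one separator")
    best_index, best_sep = -1, ""
    for sep in seps:
        if not sep:
            raise ValueError("empty separator")  # same as str.partition
        i = s.find(sep)
        # Replace only on a strictly smaller index, so on a tie the
        # earlier-listed separator is kept.
        if i != -1 and (best_index == -1 or i < best_index):
            best_index, best_sep = i, sep
    if best_index == -1:
        return (s, "", "")
    return (s[:best_index], best_sep, s[best_index + len(best_sep):])


print(partition_any("foo:bar;baz", ":", ";"))  # ('foo', ':', 'bar;baz')
print(partition_any("foo;bar:baz", ":", ";"))  # ('foo', ';', 'bar:baz')
print(partition_any("foobarbaz", ":", ";"))    # ('foobarbaz', '', '')
```

Calling it again on the tail keeps the one-bite-per-call behaviour of the existing partition().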
> My breaking point for regex is somewhere around the authority example,
Heh, I've written much more complicated examples. It was kinda fun, until I came back to it a month later and couldn't understand what the hell it did! :-)
> but I wouldn't push back if my project's style guide said to break that up. I *would* however often prefer regexp to r?partition because it would allow character classes, and in most of the areas I work with (mail, URIs, encodings) being able to detect lexical errors by using character classes is helpful.
I'm not sure I quite understand you there, but if I do, I would prefer to split the string and then validate the head and tail afterwards, rather than just have the regex fail.
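A sketch of the split-then-validate approach, compared with an all-in-one regex. The header-like "Name: value" format and the `[A-Za-z0-9-]+` character class are illustrative assumptions, not the RFC 5322 grammar. `parse_field` is a hypothetical helper:

```python
import re

# Illustrative character class only; this is not the RFC 5322 grammar.
FIELD_NAME = re.compile(r"[A-Za-z0-9-]+")


def parse_field(line):
    """Split 'Name: value' first, then validate each piece separately.

    Hypothetical helper: each failure gets its own error message,
    instead of a single regex that just returns None.
    """
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {line!r}")
    if not FIELD_NAME.fullmatch(name):
        raise ValueError(f"invalid field name {name!r}")
    return name, value.strip()


# The all-in-one regex alternative: it either matches or returns None.
ALL_IN_ONE = re.compile(r"([A-Za-z0-9-]+):(.*)")

print(parse_field("Subject: hello"))         # ('Subject', 'hello')
print(ALL_IN_ONE.fullmatch("Sub ject: hi"))  # None, with no hint why
try:
    parse_field("Sub ject: hi")
except ValueError as e:
    print(e)                                 # invalid field name 'Sub ject'
```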
> And I would prefer "one bite per call" partition to a partition at multiple points. Where I'm being pretty fuzzy, the .split methods are fine.
I think we agree here.

-- 
Steve