[Python-ideas] Re: Multiple arguments to str.partition and bytes.partition

Jan. 8, 2023

On Sun, Jan 08, 2023 at 05:30:30PM +0900, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
> > On Sat, Jan 07, 2023 at 10:48:48AM -0800, Peter Ludemann wrote:
> > > You can get almost the same result using pattern matching. For example, your
> > > "foo:bar;baz".partition(":", ";")
> > > can be done by a well-known matching idiom:
> > > re.match(r'([^:]*):([^;]*);(.*)', 'foo:bar;baz').groups()
> >
> > "Well-known" he says :-)
>
> It *is* well-known to those who know.  Just because you don't like
> regex doesn't mean it's not well-known.

I like regexes plenty, for what they are good for. But my *liking* them 
or not is irrelevant as to whether this example is "well-known" or not.

I'm not the heaviest regex user in the world, but I've used my share, 
and I've never seen this particular line noise before. (Hey, I like 
Forth. Sometimes line noise is great.)

I mean, if all you are doing is splitting the source by some separators 
regardless of order, surely this does the same job and is *vastly* more 
obvious?

    >>> import re
    >>> re.split(r'[:;]', 'foo:bar;baz')
    ['foo', 'bar', 'baz']

If the order matters:

    >>> re.match('(.*):(.*);(.*)', 'foo:bar;baz').groups()
    ('foo', 'bar', 'baz')

Or use non-greedy wildcards if you need them:

    >>> re.match('(.*?):(.*?);(.*)', 'foo:b:ar;ba;z').groups()
    ('foo', 'b:ar', 'ba;z')

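The plain re.split example above throws away the separators, so it cannot tell you which one was found. As a minimal sketch using only the existing re module (the helper name first_split is made up for illustration and is not part of the proposal), wrapping the character class in a capturing group makes re.split return the matched separator as well, and maxsplit=1 limits it to a single partition-style bite. It assumes single-character separators, because it relies on a character class, and a pattern with exactly one capturing group.

```python
import re


def first_split(s, pattern=r"([:;])"):
    """Hypothetical helper: one partition-style split on whichever separator comes first."""
    # The capturing group makes re.split include the matched separator
    # in its result; maxsplit=1 stops after the first match.
    parts = re.split(pattern, s, maxsplit=1)
    if len(parts) == 1:
        # No separator found: mirror str.partition's (s, '', '') result.
        return parts[0], "", ""
    head, sep, tail = parts
    return head, sep, tail


print(first_split("foo;bar:baz"))  # ('foo', ';', 'bar:baz')
print(first_split("foo:bar;baz"))  # ('foo', ':', 'bar;baz')
print(first_split("foobarbaz"))    # ('foobarbaz', '', '')
```

Unlike the match-based idioms, this does not care which separator appears first. Multi-character separators would need an alternation such as r'(->|:)' in place of the character class.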
> > I think that the regex solution is also wrong because it requires you
> > to know *exactly* what order the separators are found in the source
> > string.
>
> But that's characteristic of many examples.

Great. Then for *those* structured examples you can happily write your 
regex and put the separators in the order you expect.

But I'm talking about *unstructured* examples where you don't know the 
order of the separators, you want to split on whichever one comes first 
regardless of the order, and you need to know which separator that was.
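For the unstructured case just described, a rough pure-Python sketch of the requested behaviour might look like this. partition_any is a hypothetical name; today str.partition takes exactly one separator. It splits at the leftmost occurrence of any separator and reports which one matched. The tie-break (when two separators start at the same index, the one listed first wins) is an assumption of this sketch, not something settled in the thread.

```python
def partition_any(s: str, *seps: str) -> tuple[str, str, str]:
    """Split s at the leftmost occurrence of any separator in seps.

    Hypothetical helper: the real str.partition accepts exactly one separator.
    Returns (head, sep, tail), or (s, '', '') if no separator occurs.
    """
    if not seps:
        raise TypeError("partition_any() needs at least one separator")
    if any(not sep for sep in seps):
        # Same rule as str.partition, which rejects an empty separator.
        raise ValueError("empty separator")
    best_index, best_sep = -1, ""
    for sep in seps:
        i = s.find(sep)
        # Keep the leftmost match; on a tie the separator listed first wins,
        # because a later one must be strictly further left to replace it.
        if i != -1 and (best_index == -1 or i < best_index):
            best_index, best_sep = i, sep
    if best_index == -1:
        # Mirror str.partition when nothing matches.
        return s, "", ""
    return s[:best_index], best_sep, s[best_index + len(best_sep):]


print(partition_any("foo;bar:baz", ":", ";"))  # ('foo', ';', 'bar:baz')
print(partition_any("foo:bar;baz", ":", ";"))  # ('foo', ':', 'bar;baz')
print(partition_any("foobarbaz", ":", ";"))    # ('foobarbaz', '', '')
```

Each separator costs one scan of the string, and preferring the longest separator on a tie would be an equally reasonable design.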

[...]

> Examples where the order of separators doesn't matter?  In most of the
> examples I need, swapping order is a parse error.

Okay, then you *mostly* don't need this.

> > and it splits the string all at once instead of one split per call.
>
> So does the original proposal; that's part of the point of it, I
> think.

str.partition does *one* three-way split, into (head, sep, tail). If you 
want to continue to partition the tail, you have to call it again. To 
me, that fixed "one bite per call" design is fundamental to partition(). 
If we wanted an arbitrary number of splits we'd use, um, split() :-)

Of course we can debate the pros and cons of each, that's what this 
thread is for.
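To make the "one bite per call" point concrete, here is a small sketch (the generator name bites is made up for illustration) that calls the existing single-separator str.partition repeatedly on the tail. Each call takes exactly one bite, and looping until no separator is found reproduces what split() does in one go.

```python
def bites(s, sep):
    """Yield the pieces of s, taking one partition() bite per loop."""
    while True:
        head, found, s = s.partition(sep)
        yield head
        if not found:
            # partition() returned (s, '', ''): nothing left to split.
            break


print(list(bites("a,b,c", ",")))  # ['a', 'b', 'c']
print("a,b,c".split(","))         # ['a', 'b', 'c']
```

The loop is more code, but between bites the caller can inspect head and decide what to do with the rest, which a single split() call does not allow.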

> Parsing is hard.  Both regex and r?partition are best used as
> low-level tools for tokenizing, and you're asking for trouble if you
> try to use them for parsing past a certain point.

Right! I agree! And that is why I want partition to accept multiple 
separators and split on the first one found. I find myself needing to do 
that, well, not "all the time" by any means, but often enough that it's 
an itch I want scratched.

> My breaking point for
> regex is somewhere around the authority example,

Heh, I've written much more complicated examples. It was kinda fun, 
until I came back to it a month later and couldn't understand what the 
hell it did! :-)

> but I wouldn't push
> back if my project's style guide said to break that up.  I *would*
> however often prefer regexp to r?partition because it would allow
> character classes, and in most of the areas I work with (mail, URIs,
> encodings) being able to detect lexical errors by using character
> classes is helpful.

I'm not sure I quite understand you there, but if I do, I would prefer 
to split the string and then validate the head and tail afterwards, 
rather than just have the regex fail.
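A small sketch of the two styles being contrasted, using a URI scheme as the example. The scheme rule (a letter followed by letters, digits, '+', '-' or '.') follows RFC 3986; everything after the colon is deliberately left unvalidated, and both function names are hypothetical. The regex version folds validation into the match, so bad input surfaces only as "no match"; the partition version splits first and then checks the head on its own, which allows a more specific error.

```python
import re

# Scheme rule as in RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"
SCHEME_RE = re.compile(SCHEME)
URI_RE = re.compile(rf"({SCHEME}):(.*)", re.DOTALL)


def split_scheme_regex(uri):
    # Character classes reject lexical errors, but any failure is just "no match".
    m = URI_RE.fullmatch(uri)
    if m is None:
        raise ValueError(f"not a URI: {uri!r}")
    return m.group(1), m.group(2)


def split_scheme_partition(uri):
    # Split first, then validate the head separately.
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {uri!r}")
    if not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"invalid scheme {scheme!r}")
    return scheme, rest


print(split_scheme_regex("http://example.com"))      # ('http', '//example.com')
print(split_scheme_partition("http://example.com"))  # ('http', '//example.com')
try:
    split_scheme_partition("1http://example.com")
except ValueError as exc:
    print(exc)  # invalid scheme '1http'
```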

> And I would prefer "one bite per call" partition
> to a partition at multiple points.  Where I'm being pretty fuzzy, the
> .split methods are fine.

I think we agree here.

-- 
Steve