
On Sun, 8 Jan 2023 at 08:32, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Steven D'Aprano writes:
On Sat, Jan 07, 2023 at 10:48:48AM -0800, Peter Ludemann wrote:
You can get almost the same result using pattern matching. For example, your "foo:bar;baz".partition(":", ";") can be done by a well-known matching idiom: re.match(r'([^:]*):([^;]*);(.*)', 'foo:bar;baz').groups()
I think that the regex solution is also wrong because it requires you to know *exactly* what order the separators are found in the source string.
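To make the order dependence concrete, here is a minimal sketch using only the standard `re` module and the pattern quoted above. The helper name `split_colon_semicolon` is made up for illustration and is not part of any proposal in this thread.

```python
import re

# Pattern from the message above: text up to ':', then text up to ';', then the rest.
PATTERN = re.compile(r'([^:]*):([^;]*);(.*)')

def split_colon_semicolon(text):
    """Return the three fields, or None if the separators do not appear as ':' then ';'."""
    match = PATTERN.match(text)
    return match.groups() if match else None

print(split_colon_semicolon('foo:bar;baz'))   # ('foo', 'bar', 'baz')
print(split_colon_semicolon('foo;bar:baz'))   # None
```

Note that the one-liner as originally written calls `.groups()` directly on the result of `re.match`, so the second input would raise AttributeError rather than return None.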
But that's characteristic of many examples. In "structured" mail headers like Content-Type, you want the separators to come in the order ':', '=', ';'. In a URI scheme with an authority component, you want them in the order '@', ':'.
+1 (while also recognising the caveats you mention subsequently)
Except that you don't, in both those examples. In Content-Type, the '=' is optional, and there may be multiple ';'. In authority, the existing ':' is optional, and there's an optional ':' to separate password from username before the '@'.
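As a rough illustration of how optional those authority separators are, the standard library's `urllib.parse.urlsplit` already reports the missing pieces as None. The URLs below are invented examples, and this is only a sketch of the behaviour, not a suggestion for how partition should work.

```python
from urllib.parse import urlsplit

# Every optional authority component present: user, password and port.
full = urlsplit('https://user:secret@example.com:8080/path')
# None of the optional components present: no '@' and no ':' in the authority.
bare = urlsplit('https://example.com/path')

print(full.username, full.password, full.hostname, full.port)
print(bare.username, bare.password, bare.hostname, bare.port)
```

The first line prints `user secret example.com 8080`; the second prints `None None example.com None`.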
Setting aside the usual discussions about permissive parsing and supporting the various implementations found in the wild: in the long term, wouldn't the least ambiguous and most computationally efficient environment want to reduce special cases like that (both in data and in code)?
user, _, domain = "example.com".partition('@')
does the wrong thing!
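Spelled out, using only the existing `str.partition` behaviour (the fallback at the end is just one possible way to handle the miss, assumed here for illustration):

```python
address = "example.com"   # note: no '@' in the input

user, sep, domain = address.partition('@')
# When the separator is missing, partition puts the whole string in the
# first slot and leaves the other two empty.
print((user, sep, domain))   # ('example.com', '', '')

# Checking the returned separator is one way to notice that nothing matched.
if not sep:
    user, domain = None, address

print((user, domain))        # (None, 'example.com')
```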
Yep - it's important to choose partition arguments (I'm mostly resisting the temptation to call them a 'pattern') that are appropriate for the input. Structural pattern matching _seems_ like it could correspond here, in terms of selecting appropriate arguments -- but it is, as I understand it, limited to at most one starred (variable-length) subpattern per sequence pattern (by sensible design).
I would prefer "one bite per call" partition to a partition at multiple points.
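A minimal sketch of what "one bite per call" could look like for a Content-Type value, using only the existing single-separator `str.partition`. The function name `parse_content_type` is hypothetical, and the sketch deliberately ignores quoted strings and comments, which real headers allow.

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, params), one partition call at a time."""
    media_type, _, rest = value.partition(';')
    params = {}
    while rest:
        # Take one parameter off the front; any number of ';' is fine.
        param, _, rest = rest.partition(';')
        # The '=' is optional, so check whether it was actually found.
        name, eq, val = param.partition('=')
        name = name.strip()
        if name:
            params[name] = val.strip() if eq else None
    return media_type.strip(), params

print(parse_content_type('text/plain; charset=utf-8; format=flowed'))
# ('text/plain', {'charset': 'utf-8', 'format': 'flowed'})
```

Each call takes a single bite, and the returned separator says whether that bite found anything, so optional and repeated separators need no special arguments.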
That does seem clearer - and clearer is, generally, probably better. I suppose an analysis (one that I can't easily perform) could determine how many regular-expression call sites could be migrated, compatibly and beneficially, to partition calls with multiple arguments.