On 08/01/2023 23.10, James Addison via Python-ideas wrote:
On Sun, 8 Jan 2023 at 03:44, Steven D'Aprano <steve@pearwood.info> wrote:
Keep it nice and simple: provided with multiple separators, `partition` will split the string on the first separator found in the source string.
In other words, `source.partition(a, b, c, d)` will split on a /or/ b /or/ c /or/ d, whichever comes first on the left.
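A minimal sketch of that partition-by-any-of reading, for illustration only. The helper name `partition_any` is hypothetical (it is not an existing `str` method), and letting the first-listed separator win a tie at the same index is an assumption, not something the proposal specifies.

```python
def partition_any(source, *seps):
    """Split source at the leftmost occurrence of any of seps (hypothetical helper)."""
    best_index, best_sep = -1, ""
    for sep in seps:
        if not sep:
            raise ValueError("empty separator")
        index = source.find(sep)
        # Keep the hit closest to the left; on a tie the earlier argument wins (assumption).
        if index != -1 and (best_index == -1 or index < best_index):
            best_index, best_sep = index, sep
    if best_index == -1:
        # Mirror str.partition when nothing is found: whole string, then two empty strings.
        return source, "", ""
    return source[:best_index], best_sep, source[best_index + len(best_sep):]


print(partition_any("a=b:c", ":", "="))  # ('a', '=', 'b:c')
```

The returned separator tells the caller which of the candidates actually caused the split.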
Thanks - that's a valid and similar proposal -- partition-by-any-of -- although it's not the suggestion that I had in mind.
Nor I.

The obvious differences between split() and partition() are that split() can return a list* of splits, whereas partition() performs only the one; and partition() also returns the 'cause' of the split.

* the list's length can be limited (maxsplit), but I can't recall ever using that. YMMV!

Whereas it might frequently seem unnecessary to return the "sep" in the current (single-separator) implementation of partition(), if multiple separators may be used, advising 'the which' will become fundamental. (and hence the earlier illustration/question: does the sep belong with the string forming the left-side of that partition, or the 'right'?)
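For reference, the existing behaviour being compared looks like this. The sample string is made up for illustration; the calls are the current built-in `str.split` and `str.partition`.

```python
text = "key=value=more"

# split() returns a list and splits on every occurrence by default.
assert text.split("=") == ["key", "value", "more"]

# The optional maxsplit argument limits the number of splits.
assert text.split("=", 1) == ["key", "value=more"]

# partition() splits once and also returns the separator it split on.
assert text.partition("=") == ("key", "=", "value=more")
```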
Roughly speaking, the goal I had in mind was to take an input that contains well-defined delimiters in a known order and to produce a sequence of partitions (and separating delimiters) from that input.
The second observation is that of order (and this is where my interpretation may diverge from the OP's). Why limit the implementation to the same sequence as the separators are expressed in the method-call?

If we ask for a partition on "=", ":", and "a=b:c" thus yields "a", "=", "b", ":", "c", why should it not also work on "a:b=c" (with the obvious variation in the output seps)? ie why should the order in which the separator arguments were expressed necessarily imply the same order-of-appearance in the subject-string?

Relaxing this would enlarge the use-cases to include situations where we don't know, in advance, the sequence of the separators in the original-string. It would also take care of the situation where one/some of the seps do appear in the string-object and one/some don't, eg the example URL* which may or may not include an optional port-number, parameter, or anchor (or more than one thereof) - which can sometimes trip-up our RegEx-favoring colleagues.

NB please recall that the current-implementation doesn't throw an exception if the separator is not-found, but returns ( str, '', '', )

* again/still dislike using such as illustration/justification, because so many existing solutions exist ...
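A rough sketch of the 'no order' idea, under stated assumptions: the helper name `partition_unordered` is hypothetical, each separator is consumed at most once, and separators that never appear are represented by trailing pairs of empty strings (by analogy with what str.partition returns on a miss). None of that is settled by the thread.

```python
def partition_unordered(source, *seps):
    """Partition on each separator at most once, in order of appearance (hypothetical)."""
    if any(not sep for sep in seps):
        raise ValueError("empty separator")
    remaining = list(seps)
    parts = []
    rest = source
    while remaining:
        # Locate every still-unused separator that occurs in the remainder.
        hits = [(rest.find(sep), sep) for sep in remaining if sep in rest]
        if not hits:
            break
        # The leftmost hit wins, whatever its position in the argument list.
        index, sep = min(hits, key=lambda hit: hit[0])
        parts.extend((rest[:index], sep))
        rest = rest[index + len(sep):]
        remaining.remove(sep)
    parts.append(rest)
    # Pad with empty strings for separators that were never found (assumption).
    parts.extend([""] * (2 * len(remaining)))
    return tuple(parts)


print(partition_unordered("a=b:c", "=", ":"))  # ('a', '=', 'b', ':', 'c')
print(partition_unordered("a:b=c", "=", ":"))  # ('a', ':', 'b', '=', 'c')
```

With this padding choice the result always has the same length for a given number of separators, whether or not they all appear.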
(you and dn have also indirectly highlighted a potential problem with the partitioning algorithm: how would "foo?a=b&c" partition when using 'str.partition("?", "#")'? would it return a tuple of length five?)
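One way to explore that question is to chain the existing str.partition over the separators in the order given. This is only a sketch of one possible semantics; the helper name `partition_in_order` is hypothetical, and padding with empty strings when a separator is missing simply falls out of how str.partition already behaves.

```python
def partition_in_order(source, *seps):
    """Apply str.partition once per separator, left to right (hypothetical helper)."""
    parts = []
    rest = source
    for sep in seps:
        # str.partition returns (rest, '', '') when sep is absent, so a miss
        # contributes the remaining text followed by an empty separator.
        head, found, rest = rest.partition(sep)
        parts.extend((head, found))
    parts.append(rest)
    return tuple(parts)


print(partition_in_order("foo?a=b&c", "?", "#"))  # ('foo', '?', 'a=b&c', '', '')
```

Under these assumptions the answer is yes: two separators give a tuple of length five, with the unmatched "#" showing up as empty strings.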
original-string containing n (assorted) separators
function-parameter of m separators
result-tuple of 2m + 1 sub-strings

NB such formula assumes the 'no order' discussion, above

NBB am still wondering if nested-tuples might aid post-processing (see earlier question on this), in which case the 2m+1 would be the result-tuple's flattened character.

--
Regards,
=dn