Multiple arguments to str.partition and bytes.partition
Hi folks,

I'd like to gather some feedback on a feature suggestion to extend str.partition and bytes.partition to support multiple arguments. Please find the feature described below.

# Feature or enhancement

The str.partition[1] method (and similarly the bytes.partition[2] method) is useful when dividing an input into subcomponents while returning a tuple of deterministic length (currently 3) and potentially retaining the delimiters.

The feature described here adds support for multiple positional arguments to those methods. The first positional argument would be applied to the input instance that the method is called from, with no change from existing behaviour. Subsequent positional arguments would be applied to the rightmost component of the previous application of the partition algorithm.

So, for example, in pseudocode:

### Single-argument case (matching existing behaviour)

"foo:bar".partition(":")
# partition 'foo:bar' using ':'
- result = ['foo', ':', 'bar']
# all positional arguments consumed
- return result
### Multiple-argument case (added feature)

"foo:bar;baz".partition(":", ";")
# partition 'foo:bar;baz' using ':'
- result = ['foo', ':', 'bar;baz']
# positional arguments remain; continue
# partition 'bar;baz' using ';'
- result = ['foo', ':', 'bar', ';', 'baz']
# all positional arguments consumed
- return result
# Pitch

Multiple-partition-arguments would provide a shorthand (syntactic sugar) to abbreviate the process of separating inputs with known delimiters into components. For example:
login_string = 'username@hostname:2001'
username, _, hostname, _, port = login_string.partition('@', ':')
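The ordered behaviour described above can be sketched as a plain function, since today's methods accept only one separator. This is a minimal sketch under assumptions: the name `multi_partition` is hypothetical, and reusing the current single-separator `str.partition` for each step means a missing separator produces empty strings rather than an error.

```python
def multi_partition(source: str, *seps: str) -> tuple[str, ...]:
    """Hypothetical sketch of the proposed multi-argument str.partition.

    The first separator partitions `source`; each later separator partitions
    the rightmost component left over by the previous step.
    """
    if not seps:
        raise TypeError("need at least one separator")
    result: list[str] = []
    rest = source
    for sep in seps:
        # Reuse today's single-separator partition for each step; a missing
        # separator yields (rest, '', '') exactly as str.partition does.
        head, found, rest = rest.partition(sep)
        result.extend((head, found))
    result.append(rest)  # n separators -> 2n + 1 items
    return tuple(result)


username, _, hostname, _, port = multi_partition("username@hostname:2001", "@", ":")
```

Each separator contributes two items (the text before it and the separator itself), and one trailing remainder is appended at the end, so the length of the result depends only on how many separators were passed.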
Beneficially for the caller, the number of tuple elements can be determined based on the number of positional arguments. For n arguments, a tuple of length 2n + 1 will be returned.

Thank you for any and all feedback.

James

[1] - https://docs.python.org/3/library/stdtypes.html#str.partition
[2] - https://docs.python.org/3/library/stdtypes.html#bytes.partition
On 08/01/2023 00.56, James Addison via Python-ideas wrote:
# Feature or enhancement The str.partition[1] method -- and similarly the bytes.partition[2] method -- is useful when dividing an input into subcomponents while returning a tuple of deterministic length (currently 3) and potentially retaining the delimiters.
The feature described here adds support for multiple positional arguments to those methods.
I like the idea of being able to supply a list of separators. In that case, the return value would need to indicate which separator was applied at which partition (see below).
The first positional argument would be applied to the input instance that the method is called from -- no change from existing behaviour.
Subsequent positional arguments would be applied to the rightmost component of the previous application of the partition algorithm.
This part seemed an unnecessary limitation: why imply an order? Why not define the first partition as that part of the string up to (but not including) the first occurrence of any of the provided separator-strings?
For example:
login_string = 'username@hostname:2001'
username, _, hostname, _, port = login_string.partition('@', ':')
It would be better to provide an example which is not already handled by two path libraries. I would also like to see how you are thinking the resulting construct would/could be processed, ie how to consume the resulting tuple.

Might it be better to produce a set of tuple-pairs?

result = ( ( 'foo', ':', ), ( 'bar', ';', ), ( 'baz', ), )

or indeed:

result = ( ( 'foo', ), ( ':', 'bar', ), ( ';', 'baz', ), )

NB partition() currently returns a tuple of strings; do you intend to also propose output as a list?

--
Regards,
=dn
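To make the question about consuming the result concrete, here is a sketch that regroups a flat result of length 2n + 1 into the two nested shapes suggested above. The helper names are hypothetical, and the flat tuple is written out by hand rather than produced by any existing method.

```python
def pairs_sep_after(parts: tuple[str, ...]) -> tuple[tuple[str, ...], ...]:
    """((part, sep), ..., (last_part,)): each separator grouped with the text before it."""
    if len(parts) % 2 != 1:
        raise ValueError("expected a flat tuple of odd length (2n + 1)")
    return tuple(zip(parts[0::2], parts[1::2])) + ((parts[-1],),)


def pairs_sep_before(parts: tuple[str, ...]) -> tuple[tuple[str, ...], ...]:
    """((first_part,), (sep, part), ...): each separator grouped with the text after it."""
    if len(parts) % 2 != 1:
        raise ValueError("expected a flat tuple of odd length (2n + 1)")
    return ((parts[0],),) + tuple(zip(parts[1::2], parts[2::2]))


flat = ("foo", ":", "bar", ";", "baz")
left_bound = pairs_sep_after(flat)
right_bound = pairs_sep_before(flat)
```

Both groupings are lossless: `itertools.chain.from_iterable(nested)` flattens either one back to the original sequence, so the choice is mainly about which shape is more convenient to iterate over.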
+1 on the idea of having `partition` and `rpartition` take multiple separators.

Keep it nice and simple: provided with multiple separators, `partition` will split the string on the first separator found in the source string. In other words, `source.partition(a, b, c, d)` will split on a /or/ b /or/ c /or/ d, whichever comes first on the left.

Here is a proof of concept to give the basic idea:

```
def partition(source, *seps):
    if len(seps) == 0:
        raise TypeError('need at least one separator')
    indices = [(i, sep) for sep in seps if (i:=source.find(sep)) != -1]
    if indices:
        pos, sep = min(indices, key=lambda t: t[0])
        return (source[:pos], sep, source[pos + len(sep):])
    else:
        return (source, '', '')
```

That is not the most efficient implementation, but it shows the basic concept. Example:

>>> partition('abc-def+ghi;klm', ';', '-', '+')
('abc', '-', 'def+ghi;klm')
>>> partition('def+ghi;klm', ';', '-', '+')
('def', '+', 'ghi;klm')

However there are some complications that need resolving. What if the separators overlap? E.g. we might have '-' and '--' as two separators. We might want to choose the shortest separator, or the longest. That choice should be a keyword-only argument.

--
Steve
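One way the keyword-only tie-break could look, as a sketch: `partition_any` and its `prefer` keyword (accepting `"longest"` or `"shortest"`) are hypothetical names. The assumption here is that the leftmost match always wins, and `prefer` only decides between separators matching at the same position, such as `'-'` and `'--'` in `'a--b'`.

```python
def partition_any(source: str, *seps: str, prefer: str = "longest") -> tuple[str, str, str]:
    """Partition on whichever separator occurs first (leftmost) in `source`.

    `prefer` (hypothetical, keyword-only because it follows *seps) breaks ties
    when several separators match at the same position.
    """
    if not seps:
        raise TypeError("need at least one separator")
    if any(not s for s in seps):
        raise ValueError("empty separator")  # same rule as str.partition
    if prefer not in ("longest", "shortest"):
        raise ValueError("prefer must be 'longest' or 'shortest'")
    matches = [(i, sep) for sep in seps if (i := source.find(sep)) != -1]
    if not matches:
        return (source, "", "")
    # Sort key: position first, then length (negated when longest is preferred).
    sign = -1 if prefer == "longest" else 1
    pos, sep = min(matches, key=lambda m: (m[0], sign * len(m[1])))
    return (source[:pos], sep, source[pos + len(sep):])
```

With `prefer="shortest"`, `'a--b'` splits on the first `'-'` and leaves `'-b'` as the remainder, which is usually not what a caller listing `'--'` intended; that is one argument for making `"longest"` the default.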
On Sun, 8 Jan 2023 at 03:44, Steven D'Aprano <steve@pearwood.info> wrote:
Keep it nice and simple: provided with multiple separators, `partition` will split the string on the first separator found in the source string.
In other words, `source.partition(a, b, c, d)` will split on a /or/ b /or/ c /or/ d, whichever comes first on the left.
Thanks - that's a valid and similar proposal -- partition-by-any-of -- although it's not the suggestion that I had in mind.

Roughly speaking, the goal I had in mind was to take an input that contains well-defined delimiters in a known order and to produce a sequence of partitions (and separating delimiters) from that input.

(you and dn have also indirectly highlighted a potential problem with the partitioning algorithm: how would "foo?a=b&c" partition when using 'str.partition("?", "#")'? would it return a tuple of length five?)
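Assuming each separator is simply applied in turn with today's `str.partition` (an assumption; the thread leaves this open), the answer would be yes: the absent `'#'` shows up as empty strings, exactly as a miss is reported by `str.partition` today.

```python
source = "foo?a=b&c"

# Step 1: '?' is present, so it splits as usual.
head, sep1, rest = source.partition("?")  # ('foo', '?', 'a=b&c')

# Step 2: '#' is absent, so str.partition reports a miss as ('a=b&c', '', '').
mid, sep2, tail = rest.partition("#")

result = (head, sep1, mid, sep2, tail)
```

The alternative contract, raising when a later separator is missing, would make optional components such as a URL fragment awkward to express.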
On 8 Jan 2023, at 10:10, James Addison via Python-ideas <python-ideas@python.org> wrote:
On Sun, 8 Jan 2023 at 03:44, Steven D'Aprano <steve@pearwood.info> wrote:
Keep it nice and simple: provided with multiple separators, `partition` will split the string on the first separator found in the source string.
In other words, `source.partition(a, b, c, d)` will split on a /or/ b /or/ c /or/ d, whichever comes first on the left.
Thanks - that's a valid and similar proposal -- partition-by-any-of -- although it's not the suggestion that I had in mind.
Roughly speaking, the goal I had in mind was to take an input that contains well-defined delimiters in a known order and to produce a sequence of partitions (and separating delimiters) from that input.
(you and dn have also indirectly highlighted a potential problem with the partitioning algorithm: how would "foo?a=b&c" partition when using 'str.partition("?", "#")'? would it return a tuple of length five?)
Maybe combine the ideas by allowing a tuple where a string is used.

'a=b'.partition(('=', ':')) => ('a', '=', 'b')
'a:b'.partition(('=', ':')) => ('a', ':', 'b')
'a=b:c'.partition('=', (':',';')) => ('a', '=', 'b', ':', 'c')
'a=b;c'.partition('=', (':',';')) => ('a', '=', 'b', ';', 'c')

Barry
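A sketch of this combined idea, under stated assumptions: `partition_seq` and `_partition_one` are hypothetical names, a plain string argument behaves like today's `str.partition`, and a tuple argument splits on whichever of its alternatives occurs first in the remaining text (ties going to the earliest-listed alternative).

```python
from typing import Union

# Hypothetical argument type: a single separator, or a tuple of alternatives.
Sep = Union[str, tuple[str, ...]]


def _partition_one(text: str, sep: Sep) -> tuple[str, str, str]:
    """Partition on one separator, or on the leftmost match from a tuple of them."""
    if isinstance(sep, str):
        return text.partition(sep)  # unchanged single-separator behaviour
    if not sep or any(not s for s in sep):
        raise ValueError("empty separator")
    matches = [(i, s) for s in sep if (i := text.find(s)) != -1]
    if not matches:
        return (text, "", "")
    # min() keeps the first of equal keys, so ties go to the earliest-listed alternative.
    pos, found = min(matches, key=lambda m: m[0])
    return (text[:pos], found, text[pos + len(found):])


def partition_seq(source: str, *seps: Sep) -> tuple[str, ...]:
    """Ordered multi-argument partition where any argument may be a tuple."""
    if not seps:
        raise TypeError("need at least one separator")
    result: list[str] = []
    rest = source
    for sep in seps:
        head, found, rest = _partition_one(rest, sep)
        result.extend((head, found))
    result.append(rest)
    return tuple(result)
```

Because the matched separator is returned in the result, a caller who passed a tuple can still check which alternative was used at each position.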
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/6DNT7S... Code of Conduct: http://python.org/psf/codeofconduct/
On Sun, 8 Jan 2023 at 13:20, Barry Scott <barry@barrys-emacs.org> wrote:
Maybe combine the ideas by allowing a tuple where a string is used.
'a=b;c'.partition('=', (':',';')) => ('a', '=', 'b', ';', 'c')
I like that idea - and it seems to fit neatly with the existing partition contract: the inclusion of the matched delimiter as an item in the result-tuple would reduce match ambiguity (rationale: imagining a use case where a caller would like to provide multiple delimiters during method invocation, and then subsequently check which delimiter was used at a given match position).

Since it's both orthogonal to (does not depend on, nor is it a dependency of) multiple-partition-arguments and composable with (can be combined with) multiple-partition-arguments, perhaps it should be advocated for as a separate proposal? (rationale: that would allow either proposal to advance without delaying the other, bearing in mind a hopefully unlikely chance of merge conflicts if they reach release-readiness implementation status in parallel)
On 08/01/2023 17:06, James Addison wrote:
On Sun, 8 Jan 2023 at 13:20, Barry Scott <barry@barrys-emacs.org> wrote:

Maybe combine the ideas by allowing a tuple where a string is used.
'a=b;c'.partition('=', (':',';')) => ('a', '=', 'b', ';', 'c')

I like that idea - and it seems to fit neatly with the existing partition contract: the inclusion of the matched delimiter as an item in the result-tuple would reduce match ambiguity (rationale: imagining a use case where a caller would like to provide multiple delimiters during method invocation, and then subsequently check which delimiter was used at a given match position).
Since it's both orthogonal to (does not depend on, nor is it a dependency of) multiple-partition-arguments and composable with (can be combined with) multiple-partition-arguments, perhaps it should be advocated for as a separate proposal? (rationale: that would allow either proposal to advance without delaying the other, bearing in mind a hopefully unlikely chance of merge conflicts if they reach release-readiness implementation status in parallel)
I'd suggest going with the tuples and extra args in one proposal. It's possible that it's more likely to be loved with the tuple, as that removes an objection.

Barry
On 08/01/2023 23.10, James Addison via Python-ideas wrote:
On Sun, 8 Jan 2023 at 03:44, Steven D'Aprano <steve@pearwood.info> wrote:
Keep it nice and simple: provided with multiple separators, `partition` will split the string on the first separator found in the source string.
In other words, `source.partition(a, b, c, d)` will split on a /or/ b /or/ c /or/ d, whichever comes first on the left.
Thanks - that's a valid and similar proposal -- partition-by-any-of -- although it's not the suggestion that I had in mind.
Nor I.

The obvious differences between split() and partition() are that split can return a list* of splits, whereas partition only performs the one; and partition returns the 'cause' of the split.

* list-len can be limited, but I can't recall ever using such. YMMV!

Whereas it might frequently seem unnecessary to return the "sep" in the current, single-separator partition() implementation, if multiple separators may be used, advising 'the which' will become fundamental. (and hence the earlier illustration/question: does the sep belong with the string forming the left side of that partition, or the 'right'?)
Roughly speaking, the goal I had in mind was to take an input that contains well-defined delimiters in a known order and to produce a sequence of partitions (and separating delimiters) from that input.
The second observation is that of order (and where my interpretation may diverge from the OP's). Why limit the implementation to the same sequence as the separators are expressed in the method-call? If we ask for partition on "=", ":", thus with "a=b:c": "a", "=", "b", ":", "c", why should it not also work on "a:b=c"? (with obvious variation of the output seps) ie why should the order in which the separator arguments were expressed necessarily imply the same order-of-appearance in the subject-string?

Relaxing this would enlarge the use-cases to include situations where we don't know the sequence of the separators in the original string in advance. It would also take care of the situation where one/some of the seps do appear in the string-object and one/some don't, eg the example URL* which may or may not include an optional port-number, parameter, or anchor (or more than one thereof) - which can sometimes trip up our RegEx-favoring colleagues.

NB please recall that the current implementation doesn't throw an exception if the separator is not found, but returns ( str, '', '', )

* again/still dislike using such as illustration/justification, because so many existing solutions exist ...
(you and dn have also indirectly highlighted a potential problem with the partitioning algorithm: how would "foo?a=b&c" partition when using 'str.partition("?", "#")'? would it return a tuple of length five?)
original-string containing n (assorted) separators
function-parameter of m separators
result-tuple of 2m + 1 sub-strings

NB such formula assumes the 'no order' discussion, above
NBB am still wondering if nested-tuples might aid post-processing (see earlier question on this), in which case the 2m+1 would be the result-tuple's flattened character.

--
Regards,
=dn
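One possible reading of the 'no order' variant, sketched under assumptions the thread does not settle: each listed separator is used at most once, wherever it first appears (leftmost first, ties going to the earliest-listed separator), and separators that never appear are reported as trailing empty strings so that m separators always give 2m + 1 items. The name `partition_unordered` is hypothetical.

```python
def partition_unordered(source: str, *seps: str) -> tuple[str, ...]:
    """Hypothetical 'no order' variant: each separator is used at most once,
    in whatever order the separators actually appear in `source`."""
    if not seps:
        raise TypeError("need at least one separator")
    if any(not s for s in seps):
        raise ValueError("empty separator")
    remaining = list(seps)
    result: list[str] = []
    rest = source
    while remaining:
        # Leftmost occurrence of any separator not yet used.
        matches = [(i, s) for s in remaining if (i := rest.find(s)) != -1]
        if not matches:
            break
        pos, found = min(matches, key=lambda m: m[0])
        result.extend((rest[:pos], found))
        rest = rest[pos + len(found):]
        remaining.remove(found)
    result.append(rest)
    # Pad unused separators, like str.partition's ('s', '', '') on a miss.
    result.extend([""] * (2 * len(remaining)))
    return tuple(result)
```

Note that under this reading a repeated separator (for example the second ':' in 'a:b:c' with only ':' listed) stays inside the remainder; treating repeats as further splits would make the result length depend on the input, as it does for split().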
On Sun, 8 Jan 2023 at 20:20, dn <PythonList@danceswithmice.info> wrote:
(and hence earlier illustration/question: does the sep belong with the string forming the left-side of that partition, or the 'right'?)
There's no connection implied between each separator and the partitions that surround it in the results. In the username/host case, the '@' in 'user@host' isn't intrinsically linked to either the username or hostname component. (another way to think of it is like a meal-break during a work-day; the meal-break doesn't belong to either the part of the day preceding or the part of the day after the break)
Why limit the implementation to the same sequence as the separators are expressed in the method-call?
ie why should the order in which the separator arguments were expressed necessarily imply the same order-of-appearance in the subject-string?
There are two reasons for this, one consumer-side and one implementation-side:

1. It discourages consumers from attempting to partition strings with ambiguously-ordered delimiters
2. It allows the arguments to be scanned (iterated) exactly once while the input is scanned (also iterated) exactly once
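The implementation-side point can be illustrated with an index-based variant of the ordered behaviour (a sketch; `multi_partition_indexed` is a hypothetical name). Instead of slicing off the remainder after each step, it passes a start offset to `str.find`, so each separator argument is visited once and the search position only ever moves forward through the input.

```python
def multi_partition_indexed(source: str, *seps: str) -> tuple[str, ...]:
    """Ordered multi-separator partition whose search position never moves back."""
    if not seps:
        raise TypeError("need at least one separator")
    result: list[str] = []
    start = 0  # search position; never moves backwards
    for sep in seps:
        if not sep:
            raise ValueError("empty separator")
        pos = source.find(sep, start)
        if pos == -1:
            # Mirror str.partition's miss: the unmatched text stays whole and
            # every later component is empty.
            result.extend((source[start:], ""))
            start = len(source)
        else:
            result.extend((source[start:pos], sep))
            start = pos + len(sep)
    result.append(source[start:])
    return tuple(result)
```

An unordered variant cannot work this way in general: it has to search for every still-unused separator at each step, which is the trade-off behind this reason.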
On 10/01/2023 01.19, James Addison wrote:
On Sun, 8 Jan 2023 at 20:20, dn <PythonList@danceswithmice.info> wrote:
Herewith a repetition of an earlier repetition of a call for Python examples and use-cases. (to better justify the base-idea, which I support)
(and hence earlier illustration/question: does the sep belong with the string forming the left-side of that partition, or the 'right'?)
There's no connection implied between each separator and the partitions that surround it in the results.
Yet, it is important to justify the proposed "idea" with a full-consideration - not just how an enhanced partition() might work, but how its inputs and outputs might be used, constructed, deconstructed, etc. Consider the power of Python's indexing of collections, extended into slicing. Then, the further expansion into itertools' islice(), pairwise(), and groupby(). (etc) Why? Why not?
In the username/host case, the '@' in 'user@host' isn't intrinsically linked to either the username or hostname component.
Again: the example is weak - not because it fails to make the point, but because existing tools satisfy that need. To progress to a PEP, more and better examples will help promote the case!
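For reference, two ways of handling the login-string example with what exists today: chaining the current single-separator `str.partition`, and `urllib.parse.urlsplit` from the standard library (one such existing tool; which libraries dn had in mind is not stated in the thread). The `//` prefix is needed so that `urlsplit` treats the text as a network location.

```python
from urllib.parse import urlsplit

login_string = "username@hostname:2001"

# Today: two chained single-separator partitions.
username, _, host_port = login_string.partition("@")
hostname, _, port = host_port.partition(":")

# Existing library route: urlsplit parses the netloc and converts the port to int.
parts = urlsplit("//" + login_string)
```

`urlsplit` also lower-cases `hostname` and validates the port, which a plain partition does not; that difference is part of why this example is weak support for the proposal.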
(another way to think of it is like a meal-break during a work-day; the meal-break doesn't belong to either the part of the day preceding or the part of the day after the break)
Acceding to my own request for 'Python', please consider:

if test1:

«A compound statement consists of one or more ‘clauses.’ A clause consists of a header and a ‘suite.’ The clause headers of a particular compound statement are all at the same indentation level. Each clause header begins with a uniquely identifying keyword and ends with a colon.»

Thus, the colon relates to the preceding code.

NB I was going to mention multiple statements on a physical line, eg:

print(x); print(y); print(z)

However, there's been a flurry of argument about whether the semi-colon separator begins, ends, or has nothing to do with the expression on either side. Thus, use-cases with different interests...

To continue down that 'rabbit hole':

if x < y < z: print(x); print(y); print(z)

«Also note that the semicolon binds tighter than the colon in this context, so that in the following example, either all or none of the print() calls are executed:»

Am not wanting to know the answer, or even to provoke a debate, simply to illustrate that there are multiple ways of looking at things (and thus use-cases) - what binds to what, or binds what to what? - and is the answer different when looking at language syntax compared with a programmer's view more concerned with semantics?

Again: am only provoking the OP's thinking towards progressing the "idea"!

Web.Ref: https://docs.python.org/3/reference/compound_stmts.html
Why limit the implementation to the same sequence as the separators are expressed in the method-call?
ie why should the order in which the separator arguments were expressed necessarily imply the same order-of-appearance in the subject-string?
There are two reasons for this, one consumer-side and one implementation-side:
1. It discourages consumers from attempting to partition strings with ambiguously-ordered delimiters
(python or pseudo-python) examples? Consider also: "we're all adults here", "with great power comes...", etc.

What is an "ambiguous order"? What is (logically) wrong with the concept of allowing the coder to break on any and all: colons, question-marks, and/or hash-characters (pound-signs) - regardless of their order of appearance, or repetition, in the subject-string? (I am asking this question!)
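On breaking on any and all separators regardless of order or repetition: the standard library's `re.split` already does this when the pattern contains a capturing group, because the matched separators are then included in the returned list. Unlike the proposal, though, the result is a list whose length depends on the input rather than on the number of separators passed.

```python
import re

# A capturing group makes re.split keep each matched separator in the output.
SEPARATORS = r"([:?#])"

in_order = re.split(SEPARATORS, "scheme:path?query#frag")
any_order = re.split(SEPARATORS, "a#b:c:d")  # order and repetition don't matter
```

For arbitrary literal separators (which may contain regex metacharacters), the pattern can be built with `"(" + "|".join(map(re.escape, seps)) + ")"`.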
2. It allows the arguments to be scanned (iterated) exactly once while the input is scanned (also iterated) exactly once
True, but why make that a limit? Might loosening such increase the facility - and the number of use-cases? Consider str.translate( table ) which must iterate in combinatoric-fashion. Consider also float(), split(), strip(), (etc) and the definition of "ASCII whitespace".

(doubt any need to say this, but to convince the 'Python-Gods' (and the Python-Community) that this idea has merit, the more use-cases which support the proposal, the more ears that will listen...)

--
Regards,
=dn
participants (4)

- Barry Scott
- dn
- James Addison
- Steven D'Aprano