On 24 Mar 2020, at 2:42, Steven D'Aprano wrote:
On Sun, Mar 22, 2020 at 10:25:28PM -0000, Dennis Sweeney wrote:
Changes:
- More complete Python implementation to match what the type checking in the C implementation would be
- Clarified that returning ``self`` is an optimization
- Added links to past discussions on Python-Ideas and Python-Dev
- Specified ability to accept a tuple of strings
I am concerned about that tuple of strings feature. [...] Aside from those questions about the reference implementation, I am concerned about the feature itself. No other string method that returns a modified copy of the string takes a tuple of alternatives.
startswith and endswith do take a tuple of (pre/suff)ixes, but they don't return a modified copy; they just return a True or False flag;
replace does return a modified copy, and only takes a single substring at a time;
find/index/partition/split etc don't accept multiple substrings to search for.
That makes startswith/endswith the unusual ones, and we should be conservative before emulating them.
Actually I would like for other string methods to gain the ability to search for/chop off multiple substrings too.
A `find()` that supported multiple search strings (and returned the leftmost position where a search string can be found) would be a great help in implementing some kind of tokenizer:
```python
def tokenize(source, delimiter):
    lastpos = 0
    while True:
        pos = source.find(delimiter, lastpos)
        if pos == -1:
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        else:
            token = source[lastpos:pos].strip()
            if token:
                yield token
            yield source[pos]
            lastpos = pos + 1

print(list(tokenize(" [ 1, 2, 3] ", ("[", ",", "]"))))
```
This would output `['[', '1', ',', '2', ',', '3', ']']` if `str.find()` supported multiple substrings.
Of course to be really usable `find()` would have to return **which** substring was found, which would make the API more complicated (and somewhat incompatible with the existing `find()`).
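To illustrate what returning **which** substring was found could look like, here is a rough sketch as a plain helper function rather than a change to `str.find()`. The name `find_any()` and its `(position, substring)` return value are made up for this example and are not part of any proposal. It also assumes that all search strings are non-empty and that, when two of them match at the same position, the one listed first wins.

```python
def find_any(source, substrings, start=0):
    """Return (position, substring) for the leftmost match, or (-1, None).

    Hypothetical helper: on a tie, the substring listed first wins.
    """
    best_pos, best_sub = -1, None
    for sub in substrings:
        pos = source.find(sub, start)
        if pos != -1 and (best_pos == -1 or pos < best_pos):
            best_pos, best_sub = pos, sub
    return best_pos, best_sub


def tokenize(source, delimiters):
    # Assumes every delimiter is non-empty; an empty one would never advance.
    lastpos = 0
    while True:
        pos, found = find_any(source, delimiters, lastpos)
        if pos == -1:
            token = source[lastpos:].strip()
            if token:
                yield token
            break
        token = source[lastpos:pos].strip()
        if token:
            yield token
        # Knowing which delimiter matched allows multi-character delimiters.
        yield found
        lastpos = pos + len(found)
```

With this, `list(tokenize("a := b", (":=",)))` gives `['a', ':=', 'b']`, which a position-only `find()` could not do without a second lookup to work out how far to skip.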
But for `cutprefix()` (or whatever it's going to be called), I'm +1 on supporting multiple prefixes. For ambiguous cases, IMHO the most straightforward option would be to chop off the first prefix found.
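A minimal sketch of that behaviour, written as a free function instead of a `str` method. It reads "first prefix found" as "first matching prefix in tuple order", which is my interpretation and not something the PEP specifies; the signature is likewise only illustrative.

```python
def cutprefix(s, prefixes):
    """Remove the first matching prefix from s (sketch, not the PEP's code)."""
    if isinstance(prefixes, str):
        prefixes = (prefixes,)
    for prefix in prefixes:
        # Prefixes are tried in the order given; the first match wins.
        if s.startswith(prefix):
            return s[len(prefix):]
    return s
```

Under this rule the order of the tuple decides the ambiguous cases: `cutprefix("foobar", ("foo", "foobar"))` returns `"bar"`, while `cutprefix("foobar", ("foobar", "foo"))` returns `""`.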