
Hello, Some time ago on this list was mentionned swapping sub-strings inside a string. A short while later I had a use case case for that feature; this led me to explore this topic in a general case defined as: * Sub-strings to be swapped must not be single characters. * The string can hold any char or sequence of chars: there is then no safe choice usable as temporary place holder. So that the common scheme s1->temp / s2->s1 / temp->s2 using string.replace() cannot be used. * But the issue of overlapping substrings is not taken into account. I asked for algorithms on the tutor mailing list (see below). In addition to the 'naive', but tricky, step-by-step process, there was: ~ A proposal with regex, 'cheating' with the sub() method (I had some similar method, but Kent's is much smarter than mine). ~ A very clever proposal that splits the string into a list and joins back. My conclusions about this feature are as follows: -0- It seems rarely needed. -1- It is of general use, as opposed to domain-specific. -2- It is much less trivial/obvious than it seems. -3- It maps conceptually to a single, whole, operation. What do you think? Is it worth having it Python? Below exemples of implementation. Note that behind the scene the regex method must perform something analog to the the naive one. Timing produced the following results: ~ For any reason, the regex version's time is very unstable on smaller strings / number of replacements / number of trials (up to ~10000). Time varies from 60% to 120% of the naive version time. ~ On bigger trials (~ 100000) regex and naive versions are about as (in)efficient. ~ The list version is about 5 times faster in all cases. def swapNaive(text, s1, s2): new_text = "" l1, l2 = len(s1), len(s2) while (text.count(s1) > 0) or (text.count(s2) > 0): i1, i2 = text.find(s1), text.find(s2) if i1>=0 and (i2<0 or i1 < i2): new_text += text[:i1] + s2 text = text[i1+l1:] else: new_text += text[:i2] + s1 text = text[i2+l2:] new_text += text return new_text ### proposed by Albert T. Hofkamp ### on the tutor mailing list def swapList(text, s1, s2): pieces = text.split(s1) pieces = [p.replace(s2, s1) for p in pieces] return s2.join(pieces) ### proposed by Kent Johnson ### on the tutor mailing list import re def swapRegex(text, s1, s2): def replace(m): return s2 if m.group()==s1 else s1 matcher = re.compile('%s|%s' % (re.escape(s1), re.escape(s2))) return matcher.sub(replace, text) Denis ------ la vita e estrany

On Mon, Apr 06, 2009, spir wrote:
This seems like an odd requirement. In any case, posting as a recipe seems the way to go. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz

On Mon, Apr 6, 2009 at 6:09 AM, Aahz <aahz@pythoncraft.com> wrote:
I think the meaning of this was sub-strings must not be limited to single characters. There's a more general operation here that I've used on occasion: multiReplace(text, {"old value": "new value", "old2" : "new2", ...}) which replaces all the old values with the new values. In a sense this is a generalized form of translate. Swap is a special case of this. In the case of overlapping strings, the earliest match wins. If two strings share a common prefix, then there needs to be a tiebreaking rule for that: shortest or longest wins. --- Bruce

On Mon, Apr 06, 2009, spir wrote:
This seems like an odd requirement. In any case, posting as a recipe seems the way to go. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz

On Mon, Apr 6, 2009 at 6:09 AM, Aahz <aahz@pythoncraft.com> wrote:
I think the meaning of this was sub-strings must not be limited to single characters. There's a more general operation here that I've used on occasion: multiReplace(text, {"old value": "new value", "old2" : "new2", ...}) which replaces all the old values with the new values. In a sense this is a generalized form of translate. Swap is a special case of this. In the case of overlapping strings, the earliest match wins. If two strings share a common prefix, then there needs to be a tiebreaking rule for that: shortest or longest wins. --- Bruce
participants (4)
-
Aahz
-
Bruce Leban
-
Nick Coghlan
-
spir