[ python-Bugs-1613130 ] str.split creates new string even if pattern not found

SourceForge.net noreply at sourceforge.net
Tue Dec 12 13:52:47 CET 2006


Bugs item #1613130, was opened at 2006-12-11 14:03
Message generated for change (Settings changed) made by pitrou
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Performance
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Antoine Pitrou (pitrou)
Assigned to: Nobody/Anonymous (nobody)
Summary: str.split creates new string even if pattern not found

Initial Comment:
Hello,

Several string methods avoid allocating a new string when the result of the operation is trivially the same as one of the parameters (e.g. replacing a substring that does not occur). However, split() lacks this optimization; it always constructs a new string even if no splitting occurs:

$ python
Python 2.5 (r25:51908, Oct  6 2006, 15:22:41) 
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "abcde" * 2
>>> id(s)
3084139400L
>>> id(str(s))
3084139400L
>>> id("" + s)
3084139400L
>>> id(s.strip())
3084139400L
>>> id(s.replace("g", "h"))
3084139400L
>>> [id(x) for x in s.partition("h")]
[3084139400L, 3084271768L, 3084271768L]
>>> [id(x) for x in s.split("h")]
[3084139360L]
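
The same check can be written as a small script instead of comparing id() values by eye. This is only a sketch: it assumes Python 3 syntax (the session above is Python 2.5), the helper name same_object_report is made up for illustration, and whether a given method returns the original object is a CPython implementation detail that may differ between versions, so no expected output is shown.

```python
def same_object_report(s):
    """Map each operation to whether its result is the very object `s`.

    Assumes `s` contains no "g" or "h" and has no leading or trailing
    whitespace, so that every operation below is a no-op on its value.
    """
    return {
        "str(s)": str(s) is s,
        '"" + s': ("" + s) is s,
        "s.strip()": s.strip() is s,
        's.replace("g", "h")': s.replace("g", "h") is s,
        's.partition("h")[0]': s.partition("h")[0] is s,
        's.split("h")[0]': s.split("h")[0] is s,
    }


s = "abcde" * 2
for operation, reused in same_object_report(s).items():
    # True means no new string object was allocated for the result.
    print(f"{operation:22} reuses s: {reused}")
```

Using `is` rather than comparing id() values avoids false matches when a temporary object is freed and its address is reused.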


----------------------------------------------------------------------

Comment By: Antoine Pitrou (pitrou)
Date: 2006-12-12 12:35

Message:
Logged In: YES 
user_id=133955
Originator: YES

OK, I wrote a patch which partially adds the optimization (the patch is at
home, so I can't post it right now). I have a few questions, though:
 - there is a USE_FAST flag which can bring some speedups when a
multicharacter separator is used; however, it is not enabled by default. Is
there a reason for this?
 - where is stringbench.py maintained, and by whom, so that I can propose
additional tests for it (namely, tests for unmatched split())?
 - the split() implementation is duplicated between str and unicode (the
unicode version having fewer optimizations); would it be useful to
"stringlib'ify" split()?
 - rsplit() does much the same thing as split(); has anyone tried to
factor out the common parts? Do you see any caveats in doing so?
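
As a rough idea of what a test for unmatched split() could measure, here is a minimal timing sketch built on the standard timeit module. It is not in stringbench.py's format, which is not shown in this thread; the helper name time_unmatched_split and the input sizes are assumptions chosen for illustration, and the syntax is Python 3.

```python
import timeit


def time_unmatched_split(text, sep, number=10000):
    """Return the seconds taken by `number` calls of text.split(sep).

    The separator must not occur in `text`, so every call exercises the
    "nothing to split" path discussed in this report.
    """
    if sep in text:
        raise ValueError("separator must not occur in text")
    return timeit.timeit(lambda: text.split(sep), number=number)


text = "abcde" * 1000
for sep in ("h", "hij"):  # single-character and multicharacter separators
    seconds = time_unmatched_split(text, sep)
    print(f"split({sep!r}), unmatched: {seconds:.6f}s for 10000 calls")
```

Timing both a single-character and a multicharacter separator keeps the two search paths visible side by side, which is where a flag such as USE_FAST would be expected to matter.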


----------------------------------------------------------------------


