[ python-Bugs-1613130 ] str.split creates new string even if pattern not found

SourceForge.net noreply at sourceforge.net
Thu Apr 12 11:19:44 CEST 2007


Bugs item #1613130, was opened at 2006-12-11 14:03
Message generated for change (Comment added) made by pitrou
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Performance
Group: Python 2.5
Status: Open
Resolution: None
Priority: 1
Private: No
Submitted By: Antoine Pitrou (pitrou)
Assigned to: Fredrik Lundh (effbot)
Summary: str.split creates new string even if pattern not found

Initial Comment:
Hello,

Several string methods avoid allocating a new string when the result of the operation is trivially the same as one of the parameters (e.g. replacing a non-existent substring). However, split() does not apply this optimization: it always constructs a new string, even if no splitting occurs:

$ python
Python 2.5 (r25:51908, Oct  6 2006, 15:22:41) 
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "abcde" * 2
>>> id(s)
3084139400L
>>> id(str(s))
3084139400L
>>> id("" + s)
3084139400L
>>> id(s.strip())
3084139400L
>>> id(s.replace("g", "h"))
3084139400L
>>> [id(x) for x in s.partition("h")]
[3084139400L, 3084271768L, 3084271768L]
>>> [id(x) for x in s.split("h")]
[3084139360L]
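
The same check can be scripted rather than typed into the interpreter. The following is a minimal sketch, not part of the original report: the helper name identity_report is hypothetical, and it compares objects with "is" instead of printing id() values. Whether a given operation returns the original object is a CPython implementation detail that varies between versions, so no particular result is assumed here.

```python
def identity_report(s):
    """Map each operation to True if its result is the very same object as s.

    identity_report is a hypothetical helper name used for illustration only.
    """
    results = {
        "str(s)": str(s),
        '"" + s': "" + s,
        "s.strip()": s.strip(),
        's.replace("g", "h")': s.replace("g", "h"),
        's.partition("h")[0]': s.partition("h")[0],
        's.split("h")[0]': s.split("h")[0],
    }
    # "is" tests object identity, which is what comparing id() values does.
    return {name: value is s for name, value in results.items()}


if __name__ == "__main__":
    # Build the string at runtime so it is not a compile-time constant.
    s = "".join(["abcde"] * 2)
    for name, same in identity_report(s).items():
        print(f"{name:24} same object: {same}")
```

An entry reported as False means the operation allocated a new string even though the result equals the input.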


----------------------------------------------------------------------

>Comment By: Antoine Pitrou (pitrou)
Date: 2007-04-12 11:19

Message:
Logged In: YES 
user_id=133955
Originator: YES

Hi,

> Dropping the priority.  This pay-off is near zero and likely not worth the
> cost of making the code more complex than it already is.

No problem!
The more interesting question actually was whether it made any sense to
factor out the split() implementation in "stringlib" so as to share the
implementation between str and unicode.

Also, as for the USE_FAST question you asked on python-dev, I may have an
answer: if you try to enable USE_FAST you'll see that some operations are
indeed faster on large strings (say 100s or 1000s of characters), but they
become slower on small strings because of the larger overhead of the search
algorithm. Thus USE_FAST could negatively impact Python programs which
process a lot of small strings.
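
One way to look at the small-string versus large-string trade-off is to time an unmatched split() at several string lengths. This is a rough sketch under stated assumptions: it uses only the standard timeit module, the function name time_unmatched_split and its default lengths and separators are made up for illustration, and it cannot toggle USE_FAST itself (that is a compile-time flag), so it would have to be run against two different interpreter builds to compare them.

```python
import timeit


def time_unmatched_split(lengths=(10, 1000), seps=("h", "hh"), number=10000):
    """Return average seconds per call of s.split(sep) when sep never matches.

    Hypothetical helper: keys of the result are (string length, separator).
    """
    timings = {}
    for n in lengths:
        s = "a" * n  # contains no "h", so split() never finds the separator
        for sep in seps:
            total = timeit.timeit(lambda: s.split(sep), number=number)
            timings[(n, sep)] = total / number
    return timings


if __name__ == "__main__":
    for (n, sep), seconds in sorted(time_unmatched_split().items()):
        print(f"len={n:5d} sep={sep!r:5} {seconds * 1e9:10.1f} ns per call")
```

Absolute numbers depend on the machine and the build, so only the relative change between short and long strings is meaningful.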


----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-11 19:09

Message:
Logged In: YES 
user_id=80475
Originator: NO

Dropping the priority.  This pay-off is near zero and likely not worth the
cost of making the code more complex than it already is.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-12-12 17:21

Message:
Logged In: YES 
user_id=849994
Originator: NO

Sounds like this is best assigned to Fredrik.

----------------------------------------------------------------------

Comment By: Antoine Pitrou (pitrou)
Date: 2006-12-12 12:35

Message:
Logged In: YES 
user_id=133955
Originator: YES

Ok, I did a patch which partially adds the optimization (the patch is at
home, I can't post it right now). I have a few questions though:
 - there is a USE_FAST flag which can bring some speedups when a
multicharacter separator is used; however, it is not enabled by default. Is
there a reason for this?
 - where is stringbench.py maintained, and by whom, so that I can propose
additional tests for it (namely, tests for unmatched split())?
 - the split() implementation is duplicated between str and unicode (the
unicode version having fewer optimizations). Would it be useful to
"stringlib'ify" split()?
 - rsplit() does much the same thing as split(). Has anyone tried to
factor out the common parts? Do you see any caveats in doing so?
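
To make the last question concrete, here is a pure-Python sketch of how split() and rsplit() could share one routine, including the unmatched fast path this report asks for. It is an illustration only, not the C implementation and not the patch mentioned above: the name split_generic and its signature are hypothetical, and it handles only an explicit, non-empty separator.

```python
def split_generic(s, sep, maxsplit=-1, reverse=False):
    """Split s on sep from the left, or from the right if reverse is True.

    Hypothetical shared helper; whitespace splitting (sep=None) is not covered.
    """
    if not sep:
        raise ValueError("empty separator")
    if maxsplit < 0:
        maxsplit = len(s)  # there can never be more than len(s) splits

    parts = []
    lo, hi = 0, len(s)  # bounds of the part of s not yet consumed
    while maxsplit > 0:
        idx = s.rfind(sep, lo, hi) if reverse else s.find(sep, lo, hi)
        if idx < 0:
            break
        if reverse:
            parts.append(s[idx + len(sep):hi])
            hi = idx
        else:
            parts.append(s[lo:idx])
            lo = idx + len(sep)
        maxsplit -= 1

    if not parts:
        return [s]  # shared fast path: no match, reuse the original object
    parts.append(s[lo:hi])
    if reverse:
        parts.reverse()
    return parts


if __name__ == "__main__":
    print(split_generic("a,b,c", ","))
    print(split_generic("a,b,c", ",", maxsplit=1, reverse=True))
```

The only direction-specific code is the choice between find() and rfind() and which bound moves, which suggests the two methods have a lot of common structure; the cost is an extra branch per iteration.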


----------------------------------------------------------------------
