[ python-Bugs-1613130 ] str.split creates new string even if pattern not found
SourceForge.net
noreply at sourceforge.net
Thu Apr 12 11:19:44 CEST 2007
Bugs item #1613130, was opened at 2006-12-11 14:03
Message generated for change (Comment added) made by pitrou
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470
Category: Performance
Group: Python 2.5
Status: Open
Resolution: None
Priority: 1
Private: No
Submitted By: Antoine Pitrou (pitrou)
Assigned to: Fredrik Lundh (effbot)
Summary: str.split creates new string even if pattern not found
Initial Comment:
Hello,
Several string methods avoid allocating a new string when the result of the operation is trivially the same as one of the parameters (e.g. replacing a non-existent substring). However, split() does not have this optimization: it always constructs a new string, even if no splitting occurs:
$ python
Python 2.5 (r25:51908, Oct 6 2006, 15:22:41)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "abcde" * 2
>>> id(s)
3084139400L
>>> id(str(s))
3084139400L
>>> id("" + s)
3084139400L
>>> id(s.strip())
3084139400L
>>> id(s.replace("g", "h"))
3084139400L
>>> [id(x) for x in s.partition("h")]
[3084139400L, 3084271768L, 3084271768L]
>>> [id(x) for x in s.split("h")]
[3084139360L]
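
The session above can also be written as a small script that compares object identity directly rather than reading id() values by eye. This is only a sketch: the helper name identity_report is made up for illustration, and whether any of these operations returns the original object is a CPython implementation detail that may differ between versions and interpreters, so no particular output is claimed.

```python
def identity_report(s):
    """Map each operation to True if its result is the very same object as s.

    Hypothetical helper, not part of Python or of the patch discussed here.
    """
    candidates = {
        "str(s)": str(s),
        '"" + s': "" + s,
        "s.strip()": s.strip(),
        's.replace("g", "h")': s.replace("g", "h"),
        's.partition("h")[0]': s.partition("h")[0],
        's.split("h")[0]': s.split("h")[0],
    }
    # "is" tests object identity, which is what the id() comparison checks.
    return {name: (value is s) for name, value in candidates.items()}


if __name__ == "__main__":
    s = "abcde" * 2
    for name, same in identity_report(s).items():
        print(f"{name:24} same object: {same}")
```

On the Python 2.5 build shown in the report, the last entry would be the odd one out, since split() returned a list whose only element had a different id.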
----------------------------------------------------------------------
>Comment By: Antoine Pitrou (pitrou)
Date: 2007-04-12 11:19
Message:
Logged In: YES
user_id=133955
Originator: YES
Hi,
> Dropping the priority. This pay-off is near zero and likely not worth the
> cost of making the code more complex than it already is.
No problem!
The more interesting question actually was whether it made any sense to
factor out the split() implementation in "stringlib" so as to share the
implementation between str and unicode.
Also, as for the USE_FAST question you asked on python-dev, I may have an
answer: if you try to enable USE_FAST you'll see that some operations are
indeed faster on large strings (say 100s or 1000s of characters), but they
become slower on small strings because of the larger overhead of the search
algorithm. Thus USE_FAST could negatively impact Python programs which
process a lot of small strings.
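
One way to check the small-versus-large trade-off described above is to time split() with a multicharacter separator on both kinds of input. The sketch below is an assumption-laden illustration: it cannot toggle USE_FAST itself, so it would have to be run once under an interpreter built with the flag and once without, and the helper names, string sizes and repeat count are arbitrary choices made for this example.

```python
import timeit


def time_split(text, sep, number=20000):
    """Return the seconds taken by `number` calls of text.split(sep)."""
    return timeit.timeit(lambda: text.split(sep), number=number)


def compare(sep="--", number=20000):
    """Time split() on one small and one large string (sizes are arbitrary)."""
    small = "ab" + sep + "cd"
    large = ("x" * 500 + sep) * 4 + "x" * 500
    return {
        "small": time_split(small, sep, number),
        "large": time_split(large, sep, number),
    }


if __name__ == "__main__":
    # Run this under each build and compare the two sets of timings.
    for label, seconds in compare().items():
        print(f"{label}: {seconds:.4f} s")
```

Absolute timings depend on the machine and the interpreter; only the comparison between the two builds for the same input size is meaningful.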
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-11 19:09
Message:
Logged In: YES
user_id=80475
Originator: NO
Dropping the priority. This pay-off is near zero and likely not worth the
cost of making the code more complex than it already is.
----------------------------------------------------------------------
Comment By: Georg Brandl (gbrandl)
Date: 2006-12-12 17:21
Message:
Logged In: YES
user_id=849994
Originator: NO
Sounds like this is best assigned to Fredrik.
----------------------------------------------------------------------
Comment By: Antoine Pitrou (pitrou)
Date: 2006-12-12 12:35
Message:
Logged In: YES
user_id=133955
Originator: YES
OK, I wrote a patch which partially adds the optimization (the patch is at
home, I can't post it right now). I have a few questions though:
- there is a USE_FAST flag which can bring some speedups when a
multicharacter separator is used; however, it is not enabled by default. Is
there a reason for this?
- where is stringbench.py maintained, and by whom, so that I can propose
additional tests for it (namely, tests for unmatched split())?
- the split() implementation is duplicated between str and unicode (the
unicode versions having fewer optimizations). Would it be useful to
"stringlib'ify" split()?
- rsplit() does much the same thing as split(). Has anyone tried to
factor out the common parts? Do you see any caveats in doing so?
----------------------------------------------------------------------