[Python-Dev] Proof of the pudding: str.partition()

Raymond Hettinger raymond.hettinger at verizon.net
Thu Sep 1 19:01:15 CEST 2005


[Steve Holden]
> The collective brainpower that's been exercised on this one
enhancement
> already must be phenomenal, but the proposal still isn't perfect.

Sure it is :-)  

It was never intended to replace all string parsing functions, existing
or contemplated.  We still have str.index() so people can do low-level
index tracking, optimization, or whatnot.  Likewise, str.split() and
regexps remain better choices for some apps.

The discussion has centered around the performance cost of returning
three strings when two or fewer are actually used.  

>From that discussion, we can immediately eliminate the center string
case as it is essentially cost-free (it is either an empty string or a
reference to, not a copy of the separator argument).

Another case that is no cause for concern is when one of the substrings
is often, but not always empty.  Consider comments stripping for
example:

    # XXX a real parser would need to skip over # in string literals
    for line in open('demo.py'):
        line, sep, comment = line.partition('#')
        print line

On most lines, the comment string is empty, so no time is lost copying a
long substring that won't be used.  On the lines with a comment, I like
having it because it makes the code easier to debug/maintain (making it
trivial to print, log, or store the comment string).

Similar logic applies to other cases where the presence of a substring
is an all or nothing proposition, such as cgi scripts extracting the
command string when present:

    line, cmdfound, command = line.rpartition('?').

If not found, you've wasted nothing (the command string is empty).  If
found, you've gotten what you were going to slice-out anyway.

Also, there are plenty of use cases that only involve short strings
(parsing urls, file paths, splitting name/value pairs, etc).  The cost
of ignoring a component for these short inputs is small.

That leaves the case where the strings are long and parsing is repeated
with the same separator.  The answer here is to simply NOT use
partition().  Don't write:

    while s:
        line, _, s = s.partition(sep)
        . . .

Instead, you almost always do better with

    for line in s.split(sep):
        . . .

or with re.finditer() if memory consumption is an issue.

Remember, adding partition() doesn't take away anything else you have
now (even if str.find() disappears, you still have str.index()).  Also,
its inclusion does not preclude more specialized methods like
str.before(sep) or str.after(sep) if someone is able to prove their
worth.

What str.partition() does do well is simplify code by encapsulating
several variations of a common multi-step, low-level programming
pattern.  It should be accepted on that basis rather than being rejected
because it doesn't also replace re.finditer() or str.split().

Because there are so many places were partition() is a clear
improvement, I'm not bothered when someone concocts a case where it
isn't the tool of choice.  Accept it for what it is, not what it is not.



Raymond



More information about the Python-Dev mailing list