Re: [Python-Dev] Proof of the pudding: str.partition()
Raymond's original definition for partition() did NOT support any of the following:

   (*) Regular Expressions
   (*) Ways to generate just 1 or 2 of the 3 values if some are not going to be used
   (*) Clever use of indices to avoid copying strings
   (*) Behind-the-scenes tricks to allow repeated re-partitioning to be faster than O(n^2)

The absence of these "features" is a GOOD thing. It makes the behavior of partition() so simple and straightforward that it is easily documented and can be instantly understood by a competent programmer.

I *like* keeping it simple. In fact, if anything, I'd give UP the one fancy feature he chose to include:

   (*) An rpartition() function that searches from the right

...except that I understand why he included it and am convinced by the arguments (use cases can be demonstrated and people would expect it to be there and complain if it weren't).

Simplicity and elegance are two of the reasons that this is such an excellent proposal; let's not lose them. We have existing tools (like split() and the re module) to handle the tricky problems.

-- Michael Chermside
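
For readers who have not seen the proposal itself, the behaviour being praised can be sketched with the methods as they later shipped (str.partition and str.rpartition, available since Python 2.5). This is a small Python 3 illustration, not Raymond's patch; the sample strings are made up for the example.

```python
# Illustration only: the built-in str.partition / str.rpartition in Python 3.
url = "http://www.python.org"

# Separator found: (text before, the separator, text after).
found_case = url.partition("://")
assert found_case == ("http", "://", "www.python.org")

# Separator missing: the whole string comes back first, then two empty strings.
missing_case = url.partition("?")
assert missing_case == ("http://www.python.org", "", "")

# rpartition searches from the right, so the last separator is used.
right_case = "pkg.sub.module".rpartition(".")
assert right_case == ("pkg.sub", ".", "module")

# When rpartition finds nothing, the original string is returned last.
right_missing = "module".rpartition(".")
assert right_missing == ("", "", "module")

# The three pieces always join back into the original string.
head, sep, tail = found_case
assert head + sep + tail == url
```

No regular expressions, no index tricks: one call, always three strings.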
Michael Chermside wrote:
> (*) An rpartition() function that searches from the right
> ...except that I understand why he included it and am convinced by the arguments (use cases can be demonstrated and people would expect it to be there and complain if it weren't).
I would think that perhaps an optional second argument to the method that
controls whether it searches from the start (default) or end of the string
might be nicer than having two separate methods, even though that would lose
parallelism with the current .find/.index.
While I'm at it, why not propose for py3k that
.rfind/.rindex/.rjust/.rsplit disappear, and .find/.index/.just/.split grow an
optional "fromright" (or equivalent) keyword argument?
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
On 8/31/05, Charles Cazabon wrote:
> I would think that perhaps an optional second argument to the method that controls whether it searches from the start (default) or end of the string might be nicer than having two separate methods, even though that would lose parallelism with the current .find/.index.
> While I'm at it, why not propose for py3k that .rfind/.rindex/.rjust/.rsplit disappear, and .find/.index/.just/.split grow an optional "fromright" (or equivalent) keyword argument?
This violates one of my design principles: don't add boolean options to an API that control the semantics in such a way that the option value is (nearly) always a constant. Instead, provide two different method names.

The motivation for this rule comes partly from performance: parameters are relatively expensive, and you shouldn't make the method test dynamically for a parameter value that is constant for the call site; and partly from readability: don't bother the reader with having to remember the full general functionality and how it is affected by the various flags; also, a Boolean positional argument is a really poor clue about its meaning, and it's easy to misremember the sense reversed.

PS. This is a special case of a stronger design principle: don't let the *type* of the return value depend on the *value* of the arguments.

PS2. As with all design principles, there are exceptions. But they are, um, exceptional. index/rindex is not such an exception.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
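
One way to picture the rule is to write the same search twice, once behind a boolean flag and once as two names. The helpers `find_flag`, `find_left` and `find_right` below are invented for this sketch (Python 3) and are not part of any proposal in the thread.

```python
def find_flag(s, sub, fromright=False):
    """Flag style (hypothetical): every call branches on a value that is
    almost always a literal at the call site."""
    if fromright:
        return s.rfind(sub)
    return s.find(sub)


def find_left(s, sub):
    """Two-name style (hypothetical): search from the start."""
    return s.find(sub)


def find_right(s, sub):
    """Two-name style (hypothetical): search from the end."""
    return s.rfind(sub)


path = "usr/local/lib"

# Positional flag: the reader has to remember what True means here.
flag_result = find_flag(path, "/", True)

# Two names: the direction is visible in the call itself.
left_result = find_left(path, "/")
right_result = find_right(path, "/")

assert flag_result == right_result == 9
assert left_result == 3
```

Both styles compute the same thing; the difference lies in what the call site tells the reader and in the branch the flag version takes on every call.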
Guido van Rossum wrote:
> On 8/31/05, Charles Cazabon wrote:
> > While I'm at it, why not propose for py3k that .rfind/.rindex/.rjust/.rsplit disappear, and .find/.index/.just/.split grow an optional "fromright" (or equivalent) keyword argument?
>
> This violates one of my design principles:

Ah, excellent response. Are your design principles written down anywhere? I didn't see anything on your essays page about them, but I'd like to learn at the feet of the BDFL.

> don't add boolean options to an API that control the semantics in such a way that the option value is (nearly) always a constant. Instead, provide two different method names.

Hmmm. I really dislike the additional names, but ...

> The motivation for this rule comes partly from performance: parameters are relatively expensive, and you shouldn't make the method test dynamically for a parameter value that is constant for the call site;

I can see this.

> and partly from readability: don't bother the reader with having to remember the full general functionality and how it is affected by the various flags;

This I don't think is so bad. It's analogous to providing the "reverse" parameter to sorted et al, and I don't think that's particularly hard to remember. It would also be rarely used; I use find/index tens of times more often than I use rfind/rindex, and I presume it would be the same for a hypothetical .part/.rpart.

> also, a Boolean positional argument is a really poor clue about its meaning, and it's easy to misremember the sense reversed.

I totally agree. I therefore borrowed the time machine and modified my proposal to suggest it should be a keyword argument, not a positional one :).

> PS. This is a special case of a stronger design principle: don't let the *type* of the return value depend on the *value* of the arguments.

Hmmm. In all of these cases, the type of the return value is constant. Only
the value would change based on the value of the arguments. ... ?
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
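
The "stronger principle" is easier to see from a counter-example. In the Python 3 sketch below, `lookup_flag` is a hypothetical function whose return *type* changes with a flag, while `lookup` and `lookup_with_index` each return one type. For a find/rfind-style pair the type is the same either way, which is the point Charles raises; all names here are assumptions made for illustration.

```python
def lookup_flag(items, target, with_index=False):
    """Anti-pattern (hypothetical): returns a bool, or a (bool, int) tuple,
    depending on the value of a flag."""
    found = target in items
    if with_index:
        return found, (items.index(target) if found else -1)
    return found


def lookup(items, target):
    """Always returns a bool."""
    return target in items


def lookup_with_index(items, target):
    """Always returns a (bool, int) tuple; the index is -1 when not found."""
    found = target in items
    return found, (items.index(target) if found else -1)


colors = ["red", "green", "blue"]

# Same function, two different result types: callers must track the flag.
plain = lookup_flag(colors, "green")
paired = lookup_flag(colors, "green", with_index=True)
assert isinstance(plain, bool)
assert isinstance(paired, tuple)

# Separate names: each call site knows its result type without reading flags.
assert lookup(colors, "green") is True
assert lookup_with_index(colors, "mauve") == (False, -1)
```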
Charles Cazabon wrote:
>> also, a Boolean positional argument is a really poor clue about its meaning, and it's easy to misremember the sense reversed.
> I totally agree. I therefore borrowed the time machine and modified my proposal to suggest it should be a keyword argument, not a positional one :).
The best alternative to rpartition I've encountered so far is Reinhold's proposal of a 'separator index' that selects which occurrence of the separator in the string should be used to perform the partitioning. However, even it doesn't measure up, as you will see if you read on. . .

The idea is that, rather than "partition(sep)" and "rpartition(sep)", we have a single method "partition(sep, [at_sep=1])".

The behaviour could be written up like this:
"""
Partition splits the string into three pieces (`before`, `sep`, `after`) - the part of the string before the separator, the separator itself and the part of the string after the separator. If the relevant portion of the string doesn't exist, then the corresponding element of the tuple returned is the empty string.

The `at_sep` argument determines which occurrence of the separator is used to perform the partitioning. The default value of 1 means the partitioning occurs at the 1st occurrence of the separator. If the `at_sep` argument is negative, occurrences of the separator are counted from the end of the string instead of the start. An `at_sep` value of 0 will result in the original string being returned as the part 'after' the separator.
"""

A concrete implementation is below. Comparing it to Raymond's examples that use rpartition, I find that the only benefit in these examples is that the use of the optional second argument is far harder to miss than the single additional letter in the method name, particularly if partition and rpartition are used close together. Interestingly, out of 31 examples in Raymond's patch, only 7 used rpartition.

The implementation, however, is significantly less obvious than that for the simple version, and likely slower due to the extra conditional, the extra list created, and the need to use join.

It also breaks symmetry with index/rindex and split/rsplit.

Additionally, if splitting on anything other than the first or last occurrence of the separator was going to be a significant use case for str.partition, wouldn't the idea have already come up in the context of str.find and str.index?

I actually thought the 'at_sep' argument was a decent idea when I started writing this message, but I have found my arguments in favour of it to be wholly unconvincing, and the arguments against it perfectly sound ;)

Cheers,
Nick.

def partition(s, sep, at_sep=1):
    """ Returns a three element tuple, (head, found, tail) where:

        head + found + tail == s
        found == '' or found is sep
        bool(found) == (sep in s)   # found indicates if the string was found

    >>> s = 'http://www.python.org'
    >>> partition(s, '://')
    ('http', '://', 'www.python.org')
    >>> partition(s, '?')
    ('http://www.python.org', '', '')
    >>> partition(s, 'http://')
    ('', 'http://', 'www.python.org')
    >>> partition(s, 'org')
    ('http://www.python.', 'org', '')

    """
    if not isinstance(sep, basestring) or not sep:
        raise ValueError('partition argument must be a non-empty string')
    if at_sep == 0:
        result = ('', '', s)
    else:
        if at_sep > 0:
            # Count occurrences from the left
            parts = s.split(sep, at_sep)
            if len(parts) <= at_sep:
                result = (s, '', '')
            else:
                result = (sep.join(parts[:at_sep]), sep, parts[at_sep])
        else:
            # Count occurrences from the right; rsplit needs a positive count
            count = -at_sep
            parts = s.rsplit(sep, count)
            if len(parts) <= count:
                result = ('', '', s)
            else:
                result = (parts[0], sep, sep.join(parts[1:]))
    assert len(result) == 3
    assert ''.join(result) == s
    assert result[1] == '' or result[1] is sep
    return result

import doctest
print doctest.testmod()

==================================
**** Standard lib comparisons ****
==================================

=====CGIHTTPServer.py=====

  def run_cgi(self):
      """Execute a CGI script."""
      dir, rest = self.cgi_info
!     rest, _, query = rest.rpartition('?')
!     script, _, rest = rest.partition('/')
      scriptname = dir + '/' + script
      scriptfile = self.translate_path(scriptname)
      if not os.path.exists(scriptfile):

  def run_cgi(self):
      """Execute a CGI script."""
      dir, rest = self.cgi_info
!     rest, _, query = rest.partition('?', at_sep=-1)
!     script, _, rest = rest.partition('/')
      scriptname = dir + '/' + script
      scriptfile = self.translate_path(scriptname)
      if not os.path.exists(scriptfile):

=====cookielib.py=====

          else:
              path_specified = False
              path = request_path(request)
!             head, sep, _ = path.rpartition('/')
!             if sep:
                  if version == 0:
                      # Netscape spec parts company from reality here
!                     path = head
                  else:
!                     path = head + sep
              if len(path) == 0: path = "/"

          else:
              path_specified = False
              path = request_path(request)
!             head, sep, _ = path.partition('/', at_sep=-1)
!             if sep:
                  if version == 0:
                      # Netscape spec parts company from reality here
!                     path = head
                  else:
!                     path = head + sep
              if len(path) == 0: path = "/"

=====httplib.py=====

      def _set_hostport(self, host, port):
          if port is None:
!             host, _, port = host.rpartition(':')
!             if ']' not in port:         # ipv6 addresses have [...]
                  try:
!                     port = int(port)
                  except ValueError:
!                     raise InvalidURL("nonnumeric port: '%s'" % port)
              else:
                  port = self.default_port
              if host and host[0] == '[' and host[-1] == ']':

      def _set_hostport(self, host, port):
          if port is None:
!             host, _, port = host.partition(':', at_sep=-1)
!             if ']' not in port:         # ipv6 addresses have [...]
                  try:
!                     port = int(port)
                  except ValueError:
!                     raise InvalidURL("nonnumeric port: '%s'" % port)
              else:
                  port = self.default_port
              if host and host[0] == '[' and host[-1] == ']':

=====modulefinder.py=====

              assert caller is parent
              self.msgout(4, "determine_parent ->", parent)
              return parent
!         pname, found, _ = pname.rpartition('.')
!         if found:
              parent = self.modules[pname]
              assert parent.__name__ == pname
              self.msgout(4, "determine_parent ->", parent)

              assert caller is parent
              self.msgout(4, "determine_parent ->", parent)
              return parent
!         pname, found, _ = pname.partition('.', at_sep=-1)
!         if found:
              parent = self.modules[pname]
              assert parent.__name__ == pname
              self.msgout(4, "determine_parent ->", parent)

=====pdb.py=====

          filename = None
          lineno = None
          cond = None
!         arg, found, cond = arg.partition(',')
!         if found and arg:
              # parse stuff after comma: "condition"
!             arg = arg.rstrip()
!             cond = cond.lstrip()
          # parse stuff before comma: [filename:]lineno | function
          funcname = None
!         filename, found, arg = arg.rpartition(':')
!         if found:
!             filename = filename.rstrip()
              f = self.lookupmodule(filename)
              if not f:
                  print '*** ', repr(filename),

          filename = None
          lineno = None
          cond = None
!         arg, found, cond = arg.partition(',')
!         if found and arg:
              # parse stuff after comma: "condition"
!             arg = arg.rstrip()
!             cond = cond.lstrip()
          # parse stuff before comma: [filename:]lineno | function
          funcname = None
!         filename, found, arg = arg.partition(':', at_sep=-1)
!         if found:
!             filename = filename.rstrip()
              f = self.lookupmodule(filename)
              if not f:
                  print '*** ', repr(filename),

*****

              return
          if ':' in arg:
              # Make sure it works for "clear C:\foo\bar.py:12"
!             filename, _, arg = arg.rpartition(':')
              try:
                  lineno = int(arg)
              except:

              return
          if ':' in arg:
              # Make sure it works for "clear C:\foo\bar.py:12"
!             filename, _, arg = arg.partition(':', at_sep=-1)
              try:
                  lineno = int(arg)
              except:

=====smtplib.py=====

          """
          if not port and (host.find(':') == host.rfind(':')):
!             host, found, port = host.rpartition(':')
!             if found:
                  try: port = int(port)
                  except ValueError:
                      raise socket.error, "nonnumeric port"

          """
          if not port and (host.find(':') == host.rfind(':')):
!             host, found, port = host.partition(':', at_sep=-1)
!             if found:
                  try: port = int(port)
                  except ValueError:
                      raise socket.error, "nonnumeric port"

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.blogspot.com
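
For comparison, the same `at_sep` semantics can also be sketched without building a list or calling join, by stepping through the string with find/rfind. This is a different technique from the split/join implementation in the message above, written for Python 3; the name `partition_at` is hypothetical, and the checks assume occurrences are counted without overlap, as split() counts them.

```python
def partition_at(s, sep, at_sep=1):
    """Hypothetical sketch: partition s at the at_sep-th occurrence of sep.

    Positive at_sep counts from the left, negative from the right,
    and 0 returns the whole string as the 'after' part.
    """
    if not isinstance(sep, str) or not sep:
        raise ValueError("separator must be a non-empty string")
    if at_sep == 0:
        return ("", "", s)
    if at_sep > 0:
        start = 0
        for _ in range(at_sep):
            pos = s.find(sep, start)
            if pos == -1:
                return (s, "", "")      # not enough occurrences from the left
            start = pos + len(sep)      # skip past this occurrence
    else:
        end = len(s)
        for _ in range(-at_sep):
            pos = s.rfind(sep, 0, end)
            if pos == -1:
                return ("", "", s)      # not enough occurrences from the right
            end = pos                   # next match must end before this one
    return (s[:pos], sep, s[pos + len(sep):])


url = "http://www.python.org"
assert partition_at(url, "://") == url.partition("://")
assert partition_at(url, "?") == url.partition("?")
assert partition_at("localhost:8080", ":", at_sep=-1) == "localhost:8080".rpartition(":")
assert partition_at("a.b.c.d", ".", 2) == ("a.b", ".", "c.d")
assert partition_at("a.b.c.d", ".", -2) == ("a.b", ".", "c.d")
```

With at_sep of 1 and -1 this agrees with the built-in partition and rpartition on the inputs shown, which is consistent with the observation that the standard-library examples need nothing beyond those two cases.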
> > While I'm at it, why not propose for py3k that .rfind/.rindex/.rjust/.rsplit disappear, and .find/.index/.just/.split grow an optional "fromright" (or equivalent) keyword argument?
> This violates one of my design principles: don't add boolean options to an API that control the semantics in such a way that the option value is (nearly) always a constant. Instead, provide two different method names.
> The motivation for this rule comes partly from performance: parameters are relatively expensive, and you shouldn't make the method test dynamically for a parameter value that is constant for the call site; and partly from readability: don't bother the reader with having to remember the full general functionality and how it is affected by the various flags; also, a Boolean positional argument is a really poor clue about its meaning, and it's easy to misremember the sense reversed.
> PS. This is a special case of a stronger design principle: don't let the *type* of the return value depend on the *value* of the arguments.
> PS2. As with all design principles, there are exceptions. But they are, um, exceptional. index/rindex is not such an exception.
FWIW, after this is over, I'll put together a draft list of these principles. The one listed above has served us well. An early draft of itertools.ifilter() had an invert flag. The toolset improved when that was split to a separate function, ifilterfalse().

Other thoughts:

Tim's rule on algorithm selection: We read Knuth so you don't have to.

Raymond's rule on language proposals: Assertions that construct X is better than an existing construct Y should be backed up by a variety of side-by-side comparisons using real-world code samples.

I'm sure there are plenty more of these in the archives.

Raymond
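
The ifilter()/ifilterfalse() pair mentioned above lives on in Python 3 as the built-in filter() and itertools.filterfalse(). A short sketch of the two-function design follows; `is_even` and the sample range are made up for the example.

```python
from itertools import filterfalse


def is_even(n):
    """Example predicate (hypothetical) used by both calls below."""
    return n % 2 == 0


numbers = range(10)

# Keep the items for which the predicate is true.
evens = list(filter(is_even, numbers))

# Keep the items for which the predicate is false: a separate function,
# rather than an "invert" flag on filter().
odds = list(filterfalse(is_even, numbers))

assert evens == [0, 2, 4, 6, 8]
assert odds == [1, 3, 5, 7, 9]
```

Each call site names the behaviour it wants, so there is no flag value to look up or misremember.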
"Raymond Hettinger"
> FWIW, after this is over, I'll put together a draft list of these principles. The one listed above has served us well. An early draft of itertools.ifilter() had an invert flag. The toolset improved when that was split to a separate function, ifilterfalse().
> Other thoughts:
> Tim's rule on algorithm selection: We read Knuth so you don't have to.
> Raymond's rule on language proposals: Assertions that construct X is better than an existing construct Y should be backed up by a variety of side-by-side comparisons using real-world code samples.
> I'm sure there are plenty more of these in the archives.
This would make a good informational PEP to point people to when they ask 'Why ...' and the answer goes back to one of these principles.

Terry J. Reedy
On Wed, Aug 31, 2005, Raymond Hettinger wrote:
> FWIW, after this is over, I'll put together a draft list of these principles. The one listed above has served us well. An early draft of itertools.ifilter() had an invert flag. The toolset improved when that was split to a separate function, ifilterfalse().
> Other thoughts:
> Tim's rule on algorithm selection: We read Knuth so you don't have to.
> Raymond's rule on language proposals: Assertions that construct X is better than an existing construct Y should be backed up by a variety of side-by-side comparisons using real-world code samples.
> I'm sure there are plenty more of these in the archives.
Nice! Also a pointer to the Zen of Python.
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

The way to build large Python applications is to componentize and loosely-couple the hell out of everything.
participants (7)
- Aahz
- Charles Cazabon
- Guido van Rossum
- Michael Chermside
- Nick Coghlan
- Raymond Hettinger
- Terry Reedy