
str.find (and bytes.find) is worse than the alternatives in every way. It should be explicitly deprecated in favour of str.__contains__ and str.index.

str.find, when used to check for a substring, is inferior to the in operator. "if sub in s:" is shorter, easier to read, and more efficient than "if s.find(sub) != -1:", and it is not prone to the error "if s.find(sub):", which I have occasionally seen.

str.index is better for finding indices in that it supports an idiomatic exception-based API rather than a return-code API. Every usage of str.find should look like "index = s.find(sub); if index == -1: (exception code)", which is an antipattern in Python. This problem is compounded by the fact that the returned value is actually a valid index; consider s = 'bar': s[s.find('x')] is, somewhat surprisingly, 'r'.

Additionally, the existence of str.find violates the there's-one-way-to-do-it principle.

Mike
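The two pitfalls named in this message can be reproduced directly. This is a minimal sketch; the string values are arbitrary illustrations, not taken from any real code.

```python
s = "bar"
sub = "x"  # not present in s

# str.find signals "not found" with -1, which is also a valid index.
missing = s.find(sub)
last_char = s[missing]  # silently picks the last character of s

# The truthiness test goes wrong in both directions:
# -1 (not found) is truthy, and 0 (found at the start) is falsy.
buggy_absent = bool(s.find(sub))
buggy_at_start = bool(s.find("b"))

# The membership test says what is meant.
correct_absent = sub in s
correct_at_start = "b" in s
```

Note that the truthiness bug also misreports a match at position 0, so it fails even when the substring is present.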

However, in many cases absence of the string is not an error -- you just need to do something else. So in cases where *if* it's found you need the position, and *if* it isn't found you need to do something else, you'd have to use a try/except block to catch the non-error that is absence.

All in all I don't see enough reason to start deprecating find. But perhaps popular lint-like programs could flag likely abuses of find?

--Guido

On Fri, Jul 15, 2011 at 6:57 AM, Mike Graham <mikegraham@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Fri, Jul 15, 2011 at 10:57 AM, Guido van Rossum <guido@python.org> wrote:
It isn't necessarily an error if the substring is not in the string (though it sometimes is), but it is an exceptional case. Python uses exceptions pretty liberally most places -- it isn't necessarily an error if an iterator is exhausted or if float("4.2 bad user input") is called or if BdbQuit was raised. In these cases, an exception can be perfectly expected to indicate that what happened is different from the information used in a return value.

Making a Python user write a try/except block when she wants to handle both the cases "substring is in s" and "substring isn't in s" seems perfectly fine to me and, really, preferable to the if statement required to handle these two cases.

The base two cases really are about the same:

    try:
        i = s.index(sub)
    except IndexError:
        do_something()

vs.

    i = s.find(sub)
    if i == -1:
        do_something()

But what if I forgot to handle the special case?

    i = s.index(sub)  # An exception is raised right here and I can fix my code

vs.

    i = s.find(sub)  # No exception is raised

In this second case, I get the value of -1. Later I can use it as an index, use it in a slice, or perform arithmetic on it. This can introduce seemingly-unrelated values later on, making this especially hard to track down. If the failure return code was at least None it would behave more sanely, but at present the failure return code is a perfectly valid value for almost any use.

If a programmer is still averse to using try/except, we can still write

    if sub in s:
        i = s.index(sub)
    else:
        do_something()

Now, we can dredge up some examples where -1 is the actual value someone wants to use. These cases are so rare and so subtle as to make their use so clever I don't really see their existence as an advantage.

Additionally, it is unfortunate that we currently have two methods to do the same thing (which isn't even a super-common task) with different APIs. Nothing about the names "find" and "index" really makes clear which is which. This violates the "There should be one-- and preferably only one --obvious way to do it." principle and makes the Python user need to memorize an unnecessary, arbitrary distinction.

I would also point out that it was not a contrived case that I mentioned where a beginner introduces a bug by trying "if s.find(sub):" instead of "if sub in s:"; I have really seen people try this several times. Obviously we cannot make many decisions based on new Python programmers' mistakes, but it is worth recognizing them.

I hope this additional discussion might be able to sway your opinion here. The only advantage to using str.find is that you do not have to use try/except blocks, but in fact you don't have to with str.index either. On the other hand, there are numerous disadvantages--practical, pedagogical, stylistic, and design--to having and using str.find.

Mike
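The "forgot to handle the special case" scenario can be sketched as follows. Both helper functions are hypothetical and exist only for illustration; the first one omits the -1 check on purpose.

```python
def text_after(s, marker):
    """Return the text following marker (hypothetical helper, buggy on purpose)."""
    i = s.find(marker)            # -1 when marker is absent, never checked
    return s[i + len(marker):]


def text_after_strict(s, marker):
    """Same idea with str.index: a missing marker raises ValueError here."""
    i = s.index(marker)
    return s[i + len(marker):]


found = text_after("key=value", "=")
# With a one-character marker, i + len(marker) is -1 + 1 == 0,
# so the slice is s[0:] and the whole string comes back unnoticed.
silent = text_after("keyvalue", "=")

try:
    text_after_strict("keyvalue", "=")
    raised = False
except ValueError:
    raised = True
```

The find-based version returns a plausible-looking string for the missing-marker case, which is how the unrelated values mentioned above can arise.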

On Fri, 15 Jul 2011 12:12:33 -0400 Mike Graham <mikegraham@gmail.com> wrote:
While this would be a very good argument to make if we were currently designing the str API, I don't think the benefits of suppressing str.find outweigh the burden of converting old code to use a different idiom.

We could choose to write something about it in the documentation, though.

Regards

Antoine.

On Fri, Jul 15, 2011 at 12:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
That could be just as suitable. To be honest, I can't really see too much of a difference between a stronger note about the problems I perceive with str.find in the documentation and a note that uses the word "deprecated". There certainly wouldn't be occasion in the foreseeable future for you actually to remove str.find. Mike

It seems to me that the solution is that you should never use find and use index instead. You could even modify pylint/etc. to flag any use of find as an error.

That doesn't mean it should be deprecated so everyone else has to follow you. Deprecated now => removed in the future. While many can agree that warnings in the documentation about common mistakes in using APIs are a good thing, that doesn't translate into consensus that we should remove the API that might be misused. The documentation already says

    "The find() method should be used only if you need to know the position of sub. To check if sub is a substring or not, use the in operator."

There is *no* general solution to the problem of people misusing APIs. Even with index someone can still write

    if s.index(sub):

which probably doesn't do what they were thinking. Should we remove that too?

Note that one reason people might prefer find over index is that exceptions constrain how you write the code:

    try:
        i = s.index(sub)
        do lots of stuff with s and i
    except ValueError:
        result = 'not found'

In this case the try wraps way too much code so it could catch a ValueError in the middle of 'stuff'. Here's correct code:

    try:
        i = s.index(sub)
        do_more = True
    except ValueError:
        result = 'not found'
        do_more = False

    if do_more:
        do lots of stuff with s and i

Do you think that's better than:

    i = s.find(sub)
    if i < 0:
        result = 'not found'
    else:
        do lots of stuff with s and i

--- Bruce
Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

On Fri, Jul 15, 2011 at 9:51 AM, Mike Graham <mikegraham@gmail.com> wrote:

On Fri, Jul 15, 2011 at 1:06 PM, Bruce Leban <bruce@leapyear.org> wrote:
The case of "if s.index(sub):" as a misspelling of "if sub in s:" is not nearly as error-prone as the case of "if s.find(sub):". For one, "s.find(sub)" *sounds* like it might be true if you can find sub in s and not true if sub cannot be found. With str.index, the fact that the return value is going to be an index is made clear. For two, str.find can fail silently in more cases than str.index can. For three, I've never seen a learner write "if s.index(sub)" though I have seen several mistakenly write the "if s.find(sub)" version.

Are you familiar with the "else" clause of a try/except suite? It already handles writing the "right" version nicely.

    try:
        i = s.index(sub)
    except IndexError:
        result = 'not found'
    else:
        do lots of stuff with s and i
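A runnable version of the try/except/else layout might look like the sketch below. The function name and return strings are hypothetical. The handler catches ValueError, since that is the exception str.index raises in practice when the substring is absent.

```python
def locate(s, sub):
    """Hypothetical helper showing the try/except/else layout."""
    try:
        i = s.index(sub)          # only this call is guarded
    except ValueError:            # raised by str.index when sub is absent
        return "not found"
    else:
        # Runs only if the try body raised nothing; a ValueError raised
        # in here would not be swallowed by the handler above.
        return "found at %d" % i


hit = locate("spam and eggs", "eggs")
miss = locate("spam", "x")
```

Keeping the follow-up work in the else clause narrows the guarded region to the single call that is expected to fail.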
--- Bruce
Mike

On 2011-07-15, at 19:06 , Bruce Leban wrote:
Note that you could replace `do_more` by a use of `i`:

    i = None
    try:
        i = s.index(sub)
    except ValueError:
        result = 'not found'

    if i is not None:
        do stuff with s and i

There's always the `in` alternative:

    if sub not in s:
        result = 'not found'
    else:
        do stuff with s and s.index(sub)

It's a more fluent interface, and uses less error-prone APIs, though it does search twice through `s`.

(Another option would be to build this feature via higher-order functions, but that's not really Python's forte.)

On Fri, Jul 15, 2011 at 9:12 AM, Mike Graham <mikegraham@gmail.com> wrote:
This smells a bit like uncalled-for religion. Remember that readline() returns an empty string at the end of the file instead of raising an exception, and IMO that makes it the better API. -- --Guido van Rossum (python.org/~guido)

On Fri, Jul 15, 2011 at 12:38 PM, Guido van Rossum <guido@python.org> wrote:
Thanks for this prompt, considerate reply (and the other). I really didn't mean to present an "always use exceptions" dogma here, so much as to claim "Having the programmer use a try/except block isn't a problem," and to say that the fact an exception was raised doesn't necessarily indicate that something went wrong. StopIteration is a strong example that exceptions can often be very expected and normal. Mike

On Fri, Jul 15, 2011 at 12:38 PM, Guido van Rossum <guido@python.org> wrote:
The last time I used file.readline was to pass to tokenize, which is complicated immensely by the logic necessary to support both the iterator protocol and the readline protocol. That is, it has to check for both StopIteration _and_ the empty string, which makes it harder to use and forces it to contain more redundant information.

I'm not swayed by the suggestion that file.readline is beautiful at all: it predates the iterator protocol and should have been subsumed by it. Instead it's another inconsistency in Python that tools have to deal with or ignore completely.

Strings don't even bother to be compatible with .find: if you pass in an index that was returned by .find() you might just get the wrong result. It's pretty much only ever useful in the if check, which is in essence identical to the try-except, except it adds the ability to forget to check and produce incorrect behaviour -- this is not a feature.

Devin

Am 15.07.2011 19:13, schrieb Devin Jeanpierre:
Sorry, I don't see the immense complication in

    try:
        line = readline()
    except StopIteration:
        line = ''

Of course, if the tokenize API was designed from scratch nowadays, it would probably only accept iterables, and you'd have to make one yourself from a file object using

    iter(fd.readline, '')

cheers,
Georg
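The two-argument form of iter() mentioned here can be tried without a real file. In this small sketch an io.StringIO object stands in for the file object, which is an assumption made only for illustration.

```python
import io

fd = io.StringIO("first\nsecond\nthird\n")  # stand-in for a real file object

# iter(callable, sentinel) calls fd.readline() repeatedly and stops,
# via StopIteration, as soon as the call returns the sentinel ''.
lines = list(iter(fd.readline, ''))
```

This turns the "empty string at end of file" convention of readline into the ordinary iterator protocol.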

On 15Jul2011 12:12, Mike Graham <mikegraham@gmail.com> wrote:
| On Fri, Jul 15, 2011 at 10:57 AM, Guido van Rossum <guido@python.org> wrote:
| > However, in many cases absence of the string is not an error -- you
| > just need to do something else. [...]
|
| It isn't necessarily an error if the substring is not in the string
| (though it sometimes is), but it is an exceptional case.

No it isn't, IMO. It's simply the _other_ case.

| Python uses
| exceptions pretty liberally most places -- it isn't necessarily an
| error if an iterator is exhausted or if float("4.2 bad user input") is
| called or if BdbQuit was raised. In these cases, an exception can be
| perfectly expected to indicate that what happened is different from
| the information used in a return value.

In all the cases you cite the exception indicates failure of the operation: .next() has nothing to "next" to, float is being handed garbage etc.

str.find does not have a failure mode, it has string found and string not found.

| Making a Python user write a try/except block when she wants to handle
| both the cases "substring is in s" and "substring isn't in s" seems
| perfectly fine to me and, really, preferable to the if statement
| required to handle these two cases.

You don't find try/except wordy and opaque? I find "if" more idiomatic most of the time.

Not to mention vague: it can often be quite hard to be sure the raised exception came from just the operation you imagine it came from. With str.find there's little scope for vagueness I agree (unless you aren't really using a str, but a duck-type). But plenty of:

    try:
        x = foofunc(y)
    except IndexError as e:
        ...

is subject to uncaught IndexError arbitrarily deep in foofunc's call stack.

| The base two cases really are about the same:
[... try ... except ...]
| vs.
|
|     i = s.find(sub)
|     if i == -1:
|         do_something()
|
| But what if I forgot to handle the special case?
[...]
| In this second case, I get the value of -1. Later I can use it as an
| index, use it in a slice, or perform arithmetic on it. This can
| introduce seemingly-unrelated values later on, making this especially
| hard to track down.

I agree it may be a pity that str.find doesn't return None on string not found, which would generally raise an exception on an attempt to use it as a number.

Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/

Any company large enough to have a research lab is large enough not to listen to it. - Alan Kay

On Fri, Jul 15, 2011 at 6:50 PM, Cameron Simpson <cs@zip.com.au> wrote:
There are many cases that mean exactly the same thing (0 means the substring was found starting at 0, 1 means the substring was found starting at 1, 2 means the substring was found starting at 2,...), so the other case (substring not contained) we can label special or exceptional. We can recognize such cases by their requiring separate logic.
I think you are way off here. "Error" is in the eye of the beholder. I could say that an iterator has two cases: one where it gives me a value, and the other, that it's exhausted. Nothing exceptional there. (Indeed, the choice of StopIteration not to have "Error" in the name was made for the precise reason this isn't regarded as an error.) Similarly, when I pass user input to float, there are just two normal cases, no failure mode: user entered a number, user didn't enter a number. The distinction between an error and another type of special case is subtle at best.
Not really. If we don't like exception handling, we're using the wrong language.
This is the strength and the flaw with exceptions period. It is a much broader question than the one we are facing here. If you do not like exceptions period or Python's use of relatively few exception types for many occasions, I really don't think we can start the discussion at the level of str.find. If I did manage to have an IndexError propagate through to my SomeDuckType.index method when it shouldn't have the meaning I ascribe it, then this is a bug in my implementation of SomeDuckType. This bug would be very unfortunate because when a user tries to use my code right--catching the IndexError--they will completely squash the offending exception and the source of the bug will be unclear. Unfortunately, str.find is highly prone to such bugs as I've discussed since -1 is a valid index for the string.
Mike

This is an absolute flight of fancy, and I'm sure it's already been rejected in the past (a quick search says Aug. 2009, http://mail.python.org/pipermail/python-ideas/2009-August/thread.html#5576 ), but what about some kind of try/except expression?

    i = s.index(substr) except ValueError is None

I guess my main problem with that is the color of the bikeshed: it's hard to get a good idiomatic way of spelling the except expression. You could use a colon, as proposed in that thread, but it seems to me a colon indicates a new line follows.

I would be for making this as simple as possible. No "as", no non-implicit "else", and no nesting of exception types. If you want something fancy, use a statement.

Georg Brandl wrote:
That's because the *value* passed as argument to .index() isn't found. Which error class to use often depends on your view point, so in some cases it may seem natural to you, in others you have a different POV, and it feels wrong :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 16 2011)

On Sat, Jul 16, 2011 at 7:11 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Yeah, but trying to catch IndexError instead of ValueError is a pretty easy (and understandable) mistake to make, given the correspondence in names and logical train of thought "index() tells me the index where a substring can be found, IndexError is used to denote that a given index doesn't exist, so if the requested index doesn't exist, then this function will throw IndexError". Easily detected by testing, but still unintuitive and annoying. If IndexError had instead been called IndexNotFound, then there would never have been the slightest question as to which exception should have been thrown. Too late to change it now, though. We just all have to learn that, from the str.index point of view, failing to find a substring means there's something *wrong* with one (or both) of the passed in strings rather than the less opinionated "the index you have requested doesn't actually exist". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
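The distinction discussed here is easy to check interactively. In the sketch below, exception_name is a hypothetical helper used only to report which exception class a call raises.

```python
s = "bar"


def exception_name(func):
    """Return the name of the exception class func raises, or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


from_index_method = exception_name(lambda: s.index("x"))  # substring search fails
from_subscript = exception_name(lambda: s[10])            # position out of range

# IndexError and ValueError are unrelated classes, so an
# "except IndexError" around s.index(...) does not catch the failure.
caught_by_wrong_handler = True
try:
    try:
        s.index("x")
    except IndexError:
        pass
except ValueError:
    caught_by_wrong_handler = False
```

As the message says, a test that exercises the not-found path exposes the wrong handler immediately, because the ValueError propagates.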

On 7/15/2011 6:50 PM, Cameron Simpson wrote:
Not finding == failure to find == failure to match an implied re.

The question is, how should failure to find be indicated? Out of band with an exception? In band with a special value? (I am here ignoring the design bug that the 'special value' returned by str.find is not really special in Python but is a legal and common index and hence bug-bait.)

Python usually chooses one method or the other. The index/find pair is very exceptional in providing both. Why? What is so important or exceptional about this particular function? 'Find the index of a substring in a string' is not the most common operation, at least not for most people.

To me, the special-pleading arguments given for find would apply to a hundred other functions. For instance, should we add an alternate constructor for int, say int.nix, that would return None instead of raising ValueError for 'string not consisting of 1 or more base x digits'? ['Nix' here means 'nix, you cannot do that, you get nothing in return'.] As with index/find, the input string either does or does not match a particular re. If we have two ways to indicate 'not match' for one function, why not all others?

<Answer 1> Why not? Because we do not *need* the near duplication: mis-formed inputs to int are handled with try--except because that is the way it is done.

The duplication of index/find is a historical aberration with no particular justification other than history. Leave it at that. If str.find did not exist, a proposal to add it would be less welcome than int.nix.

<Answer 2> Indeed, choice of failure indicator is good, so let's do it right and have it everywhere. Define nix generically as

    def nix(self, *args, **kwds):
        try:
            return self(*args, **kwds)
        except:
            return None

Make this a builtin and add it as a class or instance method, as appropriate, to appropriate built-in classes. (I am thinking, for instance, that class method for int and instance method for type(lambda:0) should work. See test below.) By exposing it, users could use it too, either directly or wrapped with classmethod(). Or call the above _nix and define decorators.

Then str.find would eventually be deprecated in favor of str.index.nix.

A preliminary test:

    class C():
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self.f(*args)
        def nix(self, *args, **kwds):
            try:
                return self(*args, **kwds)
            except:
                return None

    myint = C(int)
    print(myint('1'), myint.nix('1'), myint.nix(''), myint.nix('a'))
    try:
        myint('')
    except:
        print('E caught')

    1 1 None None
    E caught
-- Terry Jan Reedy

On 7/16/2011 2:46 PM, Terry Reedy wrote:
Indeed, negative values such as -1 are standard error/failure return codes for functions that normally return nonnegative ints and that are written in statically typed languages without catchable exceptions. In C, for instance, EOF is defined as an implementation-defined negative int and I am sure -1 is used by some.

-- Terry Jan Reedy

On 16Jul2011 22:21, Masklinn <masklinn@masklinn.net> wrote:
| On 2011-07-16, at 21:52 , Terry Reedy wrote:
| > On 7/16/2011 2:46 PM, Terry Reedy wrote:
| >> On 7/15/2011 6:50 PM, Cameron Simpson wrote:
| >>> str.find does not have a failure mode, it has string found and string
| >>> not found.
| >>
| >> Not finding == failure to find == failure to match an implied re.
| > Indeed, negative values such as -1 are standard error/failure
| > return codes for functions that normally return nonnegative ints and
| > that are written in statically typed languages without catchable exceptions
|
| … or type systems worth using. And that's for those which are
| 0-indexed of course, especially for finding sub-sequences.
|
| And interestingly, the function corresponding to `str.find` in libc
| returns `NULL` in case of failure, not −1 (its return value is a
| pointer to the first occurrence of the needle in the haystack).

Though if you're thinking NULL is equivalent to None (which it often is conceptually), let's remember that in C NULL is just a pointer value; it is a sentinel, but not a different type. So NULL here is in some ways akin to -1 in Python's find return. You still need to test for it; not all platforms will (for example) segfault if NULL is dereferenced.

Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/

There's no need to worry about death, it will not happen in your lifetime. - Raymond Smullyan

Mike Graham wrote:
> str.find (and bytes.find) is worse than the alternatives in every way.
I disagree.
> Additionally, the existence of str.find violates the there's-one-way-to-do-it principle.
The principle is "One *Obvious* Way To Do It" not *Only* One Way. Sometimes which one is 'obvious' is only clear from reading the documentation or learning the language in some other way -- but that doesn't mean it is always the _better_ way.

I, for one, have zero interest in losing good functionality because somebody else is misusing a feature. Exceptions are not /always/ the best way to do something.

~Ethan~

Mike Graham wrote:
I disagree.
Just because some people (allegedly) misuse str.find is not a reason to remove it. People misuse all sorts of things.

I don't believe that it is valid to compare str.find to str.__contains__ since they do different things for different purposes. Using str.find instead of "in" is not misuse if you actually need an index. Better to do a single walk of the source string:

    p = s.find(sub)
    if p >= 0:
        # do something
    else:
        ...

than wastefully do two:

    if sub in s:
        p = s.index(sub)
        # do something
    else:
        ...

Whatever efficiency you might gain in the "substring not found" case, you lose in the "found" case. You should only use "sub in s" when you don't care about *where* the substring is, only whether or not it is there.

Strings are not dicts, and searching is not necessarily fast. If I'm searching the string twice, I'm doing it wrong.

Since str.__contains__ is not a valid replacement for str.find, the only question is, should str.find be deprecated in favour of str.index? I say no. str.find is just too useful and neat, compared to catching an exception, to throw out. And it can be considerably faster.

For long strings, the time taken for an unsuccessful search may be dominated by the time to traverse the string, and consequently the two alternatives are pretty close to the same speed:
Catching the exception is only 6% slower than testing for -1. Not much difference, and we probably shouldn't care one way or the other. However, for short strings, the time taken may be dominated by the cost of catching the exception, and so str.find may be significantly faster:
s.index here is nearly three times slower than s.find. (And of course, if the substring is present, index and find should be pretty much identical in speed.)
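The timing figures referred to above are not reproduced in this text. A comparison of this kind could be measured with a sketch like the following, where the function names, string sizes and repeat count are all assumptions chosen for illustration. Results will vary by machine and Python version, so no expected numbers are given.

```python
import timeit


def with_find(s, sub):
    """Unsuccessful or successful search reported via the -1 return code."""
    return s.find(sub) >= 0


def with_index(s, sub):
    """Same check via str.index, paying for the exception when sub is absent."""
    try:
        s.index(sub)
    except ValueError:
        return False
    return True


short, long_ = "abc", "abc" * 100000
for label, s in (("short", short), ("long", long_)):
    for func in (with_find, with_index):
        # Time 1000 unsuccessful searches for a character that never occurs.
        t = timeit.timeit(lambda: func(s, "x"), number=1000)
        print("%-5s %-10s %.6f s" % (label, func.__name__, t))
```

On the short string the fixed cost of raising and catching the exception is the part being measured; on the long string the traversal is.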
> str.index is better for finding indices in that it supports an idiomatic exception-based API rather than a return-code API.
Being idiomatic is not better merely because it is idiomatic. Rather, what's better becomes idiomatic, rather than the other way around, because people re-use code examples that work well. I expect that in practice str.find is used rather more frequently than str.index, which suggests that at least when it comes to string searching, find is the idiomatic API.
"Every" usage? I don't think so. Another common and valid usage is this pattern:

    index = s.find(sub)
    if index >= 0:
        # do something

Written with exception handling it becomes significantly longer, trickier and less obvious for beginners:

    try:
        index = s.index(sub)
    except ValueError:
        pass
    else:
        # do something

Note especially that this takes the least interesting case, the "do nothing if not found", and promotes it ahead of the interesting case "do something if found". Now that's an anti-pattern! (Albeit a mild one.)

And of course the try...except example is subject to its own conceptual failures. Both of these are subtly, or not-so-subtly, wrong:

    try:
        index = s.index(sub)
        # do something
    except ValueError:
        pass

    try:
        index = s.index(sub)
    except ValueError:
        pass
    # do something
> which is an antipattern in Python.
Why do you think it is an anti-pattern? I don't consider it an anti-pattern. I often wish that lists also had a find method that returned a sentinel instead of raising an exception. (Although I'd probably use None, as the re module does, rather than -1.)
Yes, that's a good argument against the use of -1 for "not found". None would have been better.
> Additionally, the existence of str.find violates the there's-one-way-to-do-it principle.
You may be confusing Python with some other language, because there is no such principle in Python. Perhaps you are mistaking it for the Zen,

    There should be one-- and preferably only one --obvious way to do it.

which is a statement requiring the existence of an obvious way, not a prohibition against there being multiple non-obvious ways.

In any case, it's far from clear to me that str.index is that obvious way. But then again, I'm not Dutch *wink*

-- Steven

On Fri, Jul 15, 2011 at 11:57 PM, Mike Graham <mikegraham@gmail.com> wrote:
As others have noted, the typical usage pattern of:

    idx = s.find(sub)
    if idx >= 0:
        # do something that needs idx

is nice and clean and significantly faster than searching the string twice.

Universally discouraging find() in favour of index() is questionable, as the above is significantly cleaner than the index based alternative.

So, I have a different suggestion (that actually paves the way for the eventual deprecation of find()):

Update str.index() to accept an optional sentinel value. If the sentinel argument is not supplied, then a missing substring raises ValueError as it does now. If it is supplied, then a missing substring returns the sentinel value instead of throwing an exception.

The above idiom could then be expressed cleanly as:

    idx = s.index(sub, missing=None)
    if idx is not None:
        # do something that needs idx

However, this seemingly simple suggestion is complicated by the fact that string methods do not currently accept keyword arguments and index() already accepts two optional positional arguments (for substring searching).

Perhaps the more general solution of try/except/value expressions is worth reconsidering.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
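The proposed sentinel behaviour does not exist on str.index, but it can be approximated today with a plain function. The sketch below is only an illustration of the idea: the name index_or and the private _RAISE marker are hypothetical, not part of any Python API.

```python
_RAISE = object()  # private marker meaning "no sentinel supplied"


def index_or(s, sub, start=0, end=None, missing=_RAISE):
    """Hypothetical stand-in for the proposed s.index(sub, missing=...)."""
    if end is None:
        end = len(s)
    try:
        return s.index(sub, start, end)
    except ValueError:
        if missing is _RAISE:
            raise          # no sentinel given: behave like plain str.index
        return missing


idx = index_or("hello", "l")
absent = index_or("hello", "z", missing=None)
```

Using a dedicated marker object as the default, rather than None, keeps None available as a sentinel the caller can choose, which is what the example in the message relies on.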

On Sun, Jul 17, 2011 at 1:35 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Shouldn't need a full PEP, but will likely need at least some discussion on python-dev and some pre- and post-patch microbenchmarks to assess the impact on the speed of string operations (since passing and parsing keyword arguments for C functions *is* slower than only using positional arguments). Compared to making strings Unicode by default, though, it's a pretty minor change :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jul 15, 2011, at 6:57 AM, Mike Graham wrote:
Unless an API is flat out broken, deprecation is almost always a bad idea. This API has been around for a very long time, so deprecating it will break lots of people's code for almost zero benefit: http://www.google.com/codesearch#search/&q=%5C.find%5C(%20lang:%5Epython$&type=cs Raymond

Raymond Hettinger wrote:
How ironic that the first hit seems to display the problem Mike is concerned with:

    position = min(position, len(self.contents))
    if hasattr(newChild, 'parent') and newChild.parent != None:
        # We're 'inserting' an element that's already one
        # of this object's children.
        if newChild.parent == self:
            index = self.find(newChild)
            if index and index < position:
                # Furthermore we're moving it further down the
                # list of this object's children. That means that
                # when we extract this element, our target index
                # will jump down one.
                position = position - 1

I haven't read all the surrounding code to know if this will ever fail, but the whole 'index = ... .find(...); if index and ...' certainly doesn't lend confidence. After all, if you *know* newChild is in self, why not use .index()?

~Ethan~

On Sat, Jul 16, 2011 at 11:46 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Note that this particular code is using BeautifulSoup.PageElement.find, not str.find. There are, however, in the first few pages similar snippets which use the result of str.find without a check for the special case. Mike

Ethan Furman wrote:
Perhaps because the name "find" tells you exactly what the method does, while the name "index" is ambiguous. Does it mean "what is at this index?" or "what index is this at?".

I've occasionally seen people mistakenly write mylist.index(i) instead of mylist[i]. Including an experienced Python coder who did know better. In that case, it was just a thinko (like a typo, only in the brain *wink*), but in my opinion, the name "index" is not a good name.

Since find and index are equally efficient when the substring is present, there's no speed advantage to choosing one over the other if you know that the substring is present. In my opinion str.find beats str.index for readability so comprehensively that there is no contest -- I would *always* use find if available.

In my wishlist for Python 4000 I have:

* list.index renamed to list.find
* str.find and list.find return None if the argument is not found

-- Steven

On Sat, Jul 16, 2011 at 11:26 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I agree that breaking people's code is a bad thing and have not suggested removing str.find. This removal would require a transition like that from Python 2.x to Python 3.x, a move that is not planned and that I personally do not ever expect.

I appreciate your linking this search, which does indeed show that str.find is in wide use. However, looking at the first five pages of results, this use seems largely unfortunate: literally the majority of the times str.find is used, I would have used "substring in s" or "s.startswith(substring)". I also see code like "pos = s.find(" ("); if pos + len(" (...)") > self._maxWidth:" which makes me very uncomfortable and which I would have to read in detail to figure out confidently what's happening if I were maintaining the code.

Thanks for the reply,
Mike

On Jul 16, 2011, at 10:39 AM, Mike Graham wrote:
> I agree that breaking people's code is a bad thing and have not suggested removing str.find.
Deprecation is a step towards removal and it always causes a certain amount of pain. ISTM, the time for any move like this would have been the jump from Python 2 to Python 3 where significant breakage was expected and where transition tools were developed.
I am largely unsympathetic to arguments that are roughly equivalent to "I don't like the way other people write programs".

Something akin to the str.find() API has been present in many, many languages for a very long time. For the most part, people seem to be able to use it reasonably well. I find that beginning Python students never seem to have a problem with it.

Also keep in mind that startswith() and endswith() were relatively recent additions to Python, so it is no surprise that lots of code uses find() instead of startswith().

Raymond

On Sat, Jul 16, 2011 at 3:07 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Deprecation is a step towards removal and it always causes a certain amount of pain.
ISTM, the time for any move like this would have been the jump from Python 2 to Python 3 where significant breakage was expected and where transition tools were developed.
Because this would break backwards compatibility unnecessarily, I think anyone involved can agree that actual removal could not take place until a special jump akin to the Python 2->Python 3 jump. (It would surprise me if such a jump ever actually took place.)
Reviewing five pages of results, over 2/3 of the uses of str.find could be replaced by str.__contains__. Using sub in s instead of s.find(sub) != -1 is *already* the advice in the official Python documentation, so I do not believe I am making especially personal judgments about the style of the code.
I'm glad your experience with learners here has been more consistent than mine.
Mike
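For illustration, here is a minimal sketch of the replacements discussed above. The sample string and variable names are made up for this example; only `in`, `str.startswith` and `str.find` come from the discussion.

```python
s = "spam and eggs"  # illustrative sample string

# Membership: both spellings agree, but `in` states the intent directly.
same_membership = ("eggs" in s) == (s.find("eggs") != -1)

# Prefix test: startswith avoids comparing a returned index against 0.
same_prefix = s.startswith("spam") == (s.find("spam") == 0)

# The buggy spelling: find returns 0 (falsy) for a match at the start
# and -1 (truthy) for no match, so the truth value is the wrong way round.
found_at_start = bool(s.find("spam"))   # False, although "spam" is present
missing = bool(s.find("bacon"))         # True, although "bacon" is absent
```

The last two lines show why "if s.find(sub):" tends to do the opposite of what its author meant.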

On Jul 16, 2011, at 12:26 PM, Mike Graham wrote:
Reviewing five pages of results, over 2/3 of the uses of str.find could be replaced by str.__contains__.
I think the first hit was from BeautifulSoup which predates the introduction of __contains__ and still runs on both old and new versions of Python. It may be hard to believe, but in the old days (not really so long ago), we didn't have __contains__ or startswith() and yet Python became popular anyway ;-)
I know about that advice. I believe I'm the one who added it ;-) (as well as many other tips in the code modernization PEP). Raymond

I just remembered one other thought on the subject. Usually, when Python introduces a method such as .index() that raises an exception for the not-found case, there are immediate requests for variants that don't raise exceptions:

    dict.pop(key, default)
    dict.get(key, default)
    next(iterable, default)
    getattr(obj, attr, default)
    re.match() --> None or matchobject

People seem to hate wrapping try/except around simple calls. Expect those people to be agitated if you take away str.find(). Raymond

On Sat, Jul 16, 2011 at 4:37 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Then it seems you're guiltier than I of saying what way other people should write their programs. =) On Sat, Jul 16, 2011 at 4:52 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Did you catch Nick's suggestion to enhance str.index to have the same basic API as dict.get/getattr/etc.? I think this might be useful overall by providing a way to get a useful default, which is very often None (as a value in a slice). Mike

On Jul 16, 2011, at 2:04 PM, Mike Graham wrote:
Did you catch Nick's suggestion to enhance str.index to have the same basic API as dict.get/getattr/etc.?
Sorry, but I thought that idea was way off base. Things like dict.get and getattr are about returning values, so it is possible to provide a meaningful default. In the case of string.find, there is no meaningful default position in the string. So, a default would simply be a way to turn the -1 value into some other object which you would still need to test. We don't need to make the API worse by expanding it just for the sake of change. The str.find() method isn't broken or useless. If you want people to change the way they write code, it would be better to do it through education (blog posts, pylint, etc) rather than by breaking a venerable API. Raymond

On Sat, Jul 16, 2011 at 6:05 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Slicing was brought up as a use-case where you can use a default without checking. mystring[:mystring.index('#', None)], for example, could conceivably be used somewhere to strip comments from (some sort of) code, for example. It does have other benefits too. It makes the return value on failure explicit, which would help remind people to check, or be more immediately aware when reading code. And it does have nice parallels to those other methods. Devin
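The slicing use case can be sketched with today's API as follows. This is only an illustration: `strip_comment` is a hypothetical helper, and it uses try/except because str.index does not currently accept a default argument.

```python
def strip_comment(line):
    """Return `line` up to (not including) the first '#', if any."""
    try:
        end = line.index("#")
    except ValueError:
        end = None  # slicing with None means "to the end of the string"
    return line[:end]


with_comment = strip_comment("x = 1  # set x")
no_comment = strip_comment("x = 1")

# Using the -1 from str.find directly silently drops the last character.
truncated = "x = 1"[:"x = 1".find("#")]
```

Here None works as a slice bound without any further check, while the -1 returned by find produces a plausible-looking but wrong result.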

On Jul 16, 2011, at 3:15 PM, Devin Jeanpierre wrote:
Slicing was brought up as a use-case where you can use a default without checking.
This thread has lost contact with reality. It started with the reasonable observation that many uses of str.find could be replaced with a simple test using the in-operator. Now, the thread is venturing into the typical python-ideas world of making-up random use cases and APIs. Something like str.find() is in many languages and it is definitely not the norm for them to have found a need to both be able to return -1 or to supply a default value. As a Python teacher, speaker, and consultant, I have the opportunity to see and review the code repositories for many companies. I'm pretty sure that I've never seen a utils module with the likes of:

    def myfind(fullstring, substring, default=None):
        i = fullstring.find(substring)
        if i == -1 and default is not None:
            return default
        return i

When I start seeing people routinely using a helper function like this, I'll start to believe that a str.find default value isn't silly. Mike's initial post was well grounded in observations about code that could be improved by using "in" or str.index() instead of str.find(). Though I disagreed with the recommendation to deprecate, it would be even worse to exacerbate the usability issues by making the method signature even more complex (with a new optional argument and a new signature variant for the return value). That isn't progress. It's aspiring cruft that makes the language harder to learn and remember. Raymond

On 16Jul2011 15:52, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
| On Jul 16, 2011, at 3:15 PM, Devin Jeanpierre wrote:
| > Slicing was brought up as a use-case where you can use a default
| > without checking.
|
| This thread has lost contact with reality. It started with the
| reasonable observation that many uses of str.find could
| be replaced with a simple test using the in-operator.
To be fair here, Mike's OP also mentioned that -1 is easy to misuse if not checked because it is still numeric.
Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
Of course, I realize that the rain in the UK is much wetter than the rain we get here... - Eric Gunnerson <gunnerso@halcyon.com>

Raymond Hettinger wrote:
Nick's proposal was to enhance str.index(), not string.find(); Having str.index() accept a value to return on failure instead of raising an exception means it could do both jobs, and would also make it much less likely to wrongly use the failure return value of -1 from str.find() which is, unfortunately, a legitimate index value. ~Ethan~

On Sun, Jul 17, 2011 at 11:40 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Indeed, the problem as I see it is that our general idiom for functions and methods that raise 'Not Found' exceptions is to accept an optional parameter that specifies a value to return in the Not Found case. For historical reasons, we currently break that idiom for index() methods: instead of supplying an extra parameter to str.index, one instead switches to a completely different method (.find()) with no control over the sentinel value returned (it's always -1). For other sequences (e.g. list), there's no find equivalent, so you *have* to write the exception handling out explicitly. My proposal is to update the signature of index() (for all sequences, including the ABC) to follow the standard 'Not Found' idiom by accepting a 'missing' parameter that is returned for those cases where ValueError would otherwise be raised. Code that uses str.find would continue to work, but the recommended alternative would be obj.index(x, missing=None) (or appropriate default value). I would advise against any actual deprecation of str.find (cf. the deliberate lack of optparse deprecation). It's unfortunate that backwards compatibility means we can't use the more descriptive name, but that's life. However, I already have too much on my plate to push this forward for Python 3.3. I'm able to offer advice if someone would like to try their hand at writing a PEP, though. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jul 17, 2011, at 12:15 AM, Nick Coghlan wrote:
There's a difference between methods that return looked-up values (where a default might make sense) versus a method that returns an index (where it usually makes no sense at all).
If someone takes this out of python-ideas land and into a serious PEP, they should be prepared to answer a number of tough questions:
* Is this actually necessary? Is there something you currently can't code? If not, then it adds API complexity without adding any new capabilities. There is a high threshold for expanding the string API -- this would affect everyone learning python, every book written, every lint tool, every class seeking to be string-like, etc. So, it would need to be a substantive improvement to be accepted.
* Take a look at what other languages do. Practically every general purpose language has an API for doing substring searches. Since we're not blazing new territory here, there needs to be a good precedent for this change (no shooting from the hip when the problem has already been well solved many times over).
* Use Google's code search to identify examples of real world code that would be better with the new API. If the only use case is creating a new slicing one-liner, that likely is too rare and arcane to warrant a change.
* Consider the effects of adding a second-way-to-do-it. Will it add to the learning curve, cause debates about the best way in a given situation, add more PEP 8 entries and pylint checks? Is it worth introducing version incompatibilities (i.e. runs on 3.3 but not earlier), etc.
* What should the default value be? Is there any non-numerical result that ever makes sense; otherwise, you're just making an alias for the -1 currently returned by str.find(). If the default is some value that evaluates to False, will that create a common error where an if-test fails to disambiguate the default value from a substring found at position zero? If the new API is ambiguous or confusing in *any* way, then it will be a step backwards and make Python worse rather than better.
* See if you can find examples where people have already found the need to write a helper function such as:

    def index_default(s, sub, default):
        try:
            return s.index(sub)
        except ValueError:
            return default

  If you find code like that in the wild, it may be an indication that people want this. If you don't, it may indicate otherwise.
* Good API design requires some thinking about function/method signatures. Would making this a keyword-only argument solve the positional arguments problem? Since str.index() already takes arguments for the "start" and "end" index, is the full signature readable without keywords:

    mystr.index(possible_substr, 0, -1, default_value)

  Also look at the signature for the return value. Currently, it always returns a number, but if it can return a number or anything else, then all client code must be prepared to handle the alternatives with clean looking code that is self-evidently correct.
* Perhaps talk to some people who write python code for a living to determine if they've ever needed this or whether it would end up as cruft. (In my case, the answer is that I've not needed or wanted this in over a decade of heavy Python use.)
Hopefully, this short and incomplete list will provide a good basis for thinking about whether the proposal is a good idea. Defending a PEP is no fun at all, so put in all your deep thinking up front. Cheers, Raymond
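As a rough sketch of the signature question raised in the list above, the helper below extends the `index_default` function from the message with the existing "start" and "end" arguments and a keyword-only `default`. The exact signature is an assumption made for illustration; it is not an existing or proposed str method.

```python
def index_default(s, sub, start=0, end=None, *, default=None):
    """Return s.index(sub, start, end), or `default` if `sub` is absent.

    `default` is keyword-only (an assumption of this sketch), so it cannot
    be confused with the positional start/end arguments.
    """
    if end is None:
        end = len(s)
    try:
        return s.index(sub, start, end)
    except ValueError:
        return default


found = index_default("a#b", "#")                      # index of the match
absent = index_default("ab", "#")                      # falls back to None
custom = index_default("ab", "#", default=len("ab"))   # caller-chosen value
in_list = index_default([23, 42, 100], 42)             # also works for lists
```

Because the helper only relies on an `index(value, start, end)` method raising ValueError, it works for lists as well as strings.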

On 2011-07-17, at 10:09 , Raymond Hettinger wrote:
SML even returns the length of the string. See a listing at http://en.wikipedia.org/wiki/Comparison_of_programming_languages_(string_fun... The most common behavior on the page does seem to be returning a numerical sentinel. On the other hand, I'm not sure how many of these languages return a sentinel value which is also a valid index.

On Jul 17, 2011, at 2:13 AM, Masklinn wrote:
There is a fundamental difference between content (values in the list) and the list position. There are meaningful search defaults for the former but not the latter. It's important to grasp this distinction before going further.
My reading is that not a single one of these entries has a signature with a user specifiable default value of arbitrary type. IOW, there is nothing resembling the proposal on the table. AFAICT, all of them are equivalent to either our current str.find() or str.index(). Raymond

Masklinn wrote:
No. In context, Raymond is not talking about values as arbitrary objects. He is talking specifically about values of a collection. E.g. given: mylist = [23, 42, 100] the values Raymond is talking about are 23, 42 and 100, *not* 0, 1, 2 (the indexes of the list) or 3 (the length of the list) or 165 (the sum of the list) or any other arbitrary value.
I can't really see that they do "everything and their reverse". There are two basic strategies: return an out-of-bound value, and raise an exception, both of which Python already does. Out-of-bound values are usually one smaller than the lowest valid index (0 or -1) or one higher than the highest valid index (length of the string, or one greater than the length of the string). A couple of languages return False, which is inappropriate for Python on account of False equaling 0. Some return a dedicated "Not Found" special value, but Python doesn't go in for a proliferation of special constants. A couple of languages, including Ruby, return the equivalent of Python's None. Notably missing is anything like the ability for the caller to specify what index to return if the sub-string is missing. Ask yourself, can you imagine needing mydict.get(key, 1) or mydict.get(key, set())? I expect that you can easily think of reasons why this would be useful. The usefulness of being able to set the return value of failed lookups like dicts is obvious. I wish lists also had a similar get method, and I bet that everybody reading this, even if they disagree that it should be built-in, can see the value of it as a utility function. But can you think of a realistic scenario where you might want to call mystring.find(substr, missing=1)? Why would you want "substring not present" and "substring found at index 1" to both return the same thing? How about mystring.find(substr, missing=set())? If you can't imagine a realistic scenario where you would want such a feature, then you probably don't need this proposed feature.
The first table on the page says that four languages accept negative indexes: Python, Ruby, Perl and Lua. Perl and Python return -1 on not found; Ruby and Lua return nil. -- Steven
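To make the comparison of sentinel values concrete, here is a small sketch of how -1, False and None behave when used as an index in Python without being checked first. The sample string is the 'bar' example from the start of the thread; the variable names are illustrative.

```python
s = "bar"

# -1 is a valid index, so an unchecked find() result picks the last character.
last_char = s[s.find("x")]

# False equals 0, so a False "not found" value would look like a match at 0.
false_is_zero = (False == 0) and (s[False] == "b")

# None is rejected as an index, so a forgotten check fails at the point of use.
try:
    s[None]
except TypeError:
    none_rejected = True
else:
    none_rejected = False
```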

On Sun, Jul 17, 2011 at 8:56 PM, Steven D'Aprano <steve@pearwood.info> wrote:
In a language that accepts negative indices, -1 is not out of bounds, and hence is an objectively bad value to return. If str.find() returned None for missing substrings, we wouldn't be having this discussion. However, backwards compatibility requirements mean that option is not available to us. Is this a language wart where the correct answer is to tell everyone new to the language that complains about the current behaviour to "suck it up and learn to live with it"? Maybe. But claiming that str.find() is a *good* API in a language that accepts negative indices would be flat out wrong. Good *enough*, maybe, but not good in an absolute sense. As I see it, there are a few options.
1. Do nothing. Quite a reasonable option. str.find() is flawed and str.index() can be annoying to use, but fixing this may not be worth the hassle. However, this is not the same as claiming the str.find() behaviour is a good API - it's just acknowledging the wart and deciding not to address it.
2. Add an alternate behaviour to str.index() that allows the exception to optionally be converted into a sentinel value. Inspired by getattr, getitem, dict.get, next, etc. Made messy by the need for the sentinel to go after the existing arguments and the fact that the number of sane sentinel values is extremely limited (aside from None, the only remotely useful possibilities I can think of are 0 and len(seq), and even those only hypothetically). This solution requires changing a builtin method API, as well as adding keyword argument support to CPython string methods. Also raises the question of whether or not the Sequence ABC (and sequences in the stdlib) should be adjusted accordingly.
3. Search for a more general solution that simplifies the following common try/except pattern into a single expression (or simple statement):

    try:
        val = f()
    except RelevantException:
        val = default
    # further operations using val

   This has been attempted before, but never successfully as it isn't *that* common (so ugly solutions are never going to be accepted as better than the status quo) and it's extremely difficult to come up with a syntax that neatly captures the 3 necessary elements and can still be handled easily by the parser.
Since I have no reason to believe 3 will get any further than it has in the past, making an explicit decision between 1 and 2 is the reason I'd like to see a PEP. PEPs aren't just about getting new features into the language - the Rejected ones are also about documenting the reasons we have chosen *not* to do certain things (and the Deferred ones point out that some things are just hard to do in a way that provides a net benefit to the language). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
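Without new syntax, the try/except pattern in option 3 can be approximated with an ordinary function. The sketch below is only one possible shape: `call_or_default` is a hypothetical name, and wrapping the call in a lambda is exactly the sort of clumsiness that has kept solutions like this from being adopted.

```python
def call_or_default(func, exc_type, default):
    """Call func() and return its result, or `default` if exc_type is raised."""
    try:
        return func()
    except exc_type:
        return default


# The three elements: the expression, the exception to catch, the fallback.
val = call_or_default(lambda: "abc".index("z"), ValueError, None)
pos = call_or_default(lambda: "abc".index("b"), ValueError, None)
num = call_or_default(lambda: float("4.2 bad user input"), ValueError, 0.0)
```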

Raymond Hettinger wrote:
We are not talking about a default value to return -- the default will still be the behavior of raising a ValueError if the substring is not found. Consider the proposed signature:

    _sentinel = object()

    class str():
        def index(self, substring, start, end, missing=_sentinel):
            # looks for string
            ....
            # string not found -- now what?
            if missing is _sentinel:
                raise ValueError('...')
            else:
                return missing

The addition is that *if* the caller specifies an object for missing, return that value, *otherwise* raise ValueError just like we do now.
Hmmm -- okay, perhaps we are... let me say, then, that I agree having a default return is not the way to go; this would break everything that expects .index() to exception out if the substring is not found -- in other words, everything that uses .index(). My take on the idea is to have the new 'missing' argument be optional, and if not specified then current behavior is unchanged, but if specified then that value is returned instead.
Trying to be string-like at the moment is such a PITA I really don't see this tiny extra bit as a serious burden. Consider this nice simple code:

    class MyStr(str):
        def find(self, substr, start=None, end=None):
            # whatever extra I want to do before passing off to str
            # now pass off to str
            return str.find(self, substr, start, end)

Too bad it doesn't work:

    --> test = MyStr('this is a test')
    --> test.find('is')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in find
    TypeError: slice indices must be integers or None or have an __index__ method

(Yes, this was fixed in 2.6, and as soon as I'm willing to drop support for earlier versions I can remove the following boilerplate:

    start = start or 0
    end = end or len(self)

and yes, I wouldn't be able to use the new .index(..., missing=_whatever) for a while, but that doesn't mean we should stop improving the language.)
Why not? 'Well solved' does not mean there is no room for improvement. And going through the whole PEP process does not feel like 'shooting from the hip'.
You mean like 'runs on 2.6+ but not earlier'?
* What should the default value be?
There should be no default value, in my opinion.
The most effective argument by far, IMO, both for not having a default value, and for being very careful about what the caller chooses to use for the missing argument. I think a bomb would be appropriate here:

    class Bomb():
        'singleton object: blows up on any usage'
        def __bool__(self):
            raise OopsError('yell at the programmer!')
        etc

then in usage it's a check for object identity, anything else reminds somebody they forgot to do something.
Many good points -- thank you for taking the time. ~Ethan~
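The two ideas in this message, an optional 'missing' argument and a "bomb" object, can be sketched as a standalone function. Everything here is illustrative: `index_or` and `NOT_FOUND` are hypothetical names, the function wraps the existing index() rather than changing str, and TypeError stands in for the undefined OopsError.

```python
_MISSING = object()  # private marker meaning "caller gave no `missing` value"


def index_or(s, sub, start=0, end=None, missing=_MISSING):
    """Like s.index(sub, start, end), but return `missing` if it was given."""
    if end is None:
        end = len(s)
    try:
        return s.index(sub, start, end)
    except ValueError:
        if missing is _MISSING:
            raise  # unchanged behaviour when no `missing` value is supplied
        return missing


class Bomb:
    """Object that raises on truth-testing or use as an index."""

    def __bool__(self):
        raise TypeError("not-found marker used as a truth value")

    def __index__(self):
        raise TypeError("not-found marker used as an index")


NOT_FOUND = Bomb()

result = index_or("abc", "z", missing=NOT_FOUND)
if result is NOT_FOUND:  # the only safe operation is an identity check
    outcome = "absent"
else:
    outcome = "found at %d" % result
```

With this shape, code that forgets the identity check and writes "if result:" or "s[result]" gets an immediate exception instead of a silently wrong value.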

Raymond Hettinger wrote:
Indeed, because for simple things, you don't want to clutter up your code with lots of nested try-excepts. It also has to do with performance: exceptions should only be used for exceptional situations. Not finding a sub-string in some line read from a log file is not an exceptional situation. In fact, it's most likely the common case. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 18 2011)

Mike Graham wrote:
I think that's a disingenuous thing to say. You started this thread with an explicit call to deprecate str.find -- see your chosen subject line, and your first paragraph in this thread states: "It should be explicitly deprecated ..." What is the point of deprecating something if you don't intend to eventually remove it?
A lot of very old code predates startswith and endswith. They only appeared in 2.0. Surprisingly, as late as 2.2, we were limited to testing for a single character: [steve@sylar src]$ python2.2 Python 2.2.3 (#1, Aug 12 2010, 01:08:27) [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
Some of the code on that first page supports Python 2.1.
Whereas "pos = s.index(" ("); if pos + len(" (...)") > self._maxWidth:" is the height of readability, yes? People can write bad code no matter what tools you give them. -- Steven

On Sat, 2011-07-16 at 08:26 -0700, Raymond Hettinger wrote:
Although a quick look over the found snippets shows that apparently many uses of find are indeed "ugly" code that could be improved by use of index or __contains__. Since a DeprecationWarning seems rather intrusive, what would be a less intrusive way to encourage code improvement in such places? -- Ronny

However, in many cases absence of the string is not an error -- you just need to do something else. So in cases where *if* it's found you need the position, and *if* it isn't found you need to do something else, you'd have to use a try/except block to catch the non-error that is absence. All in all I don't see enough reason to start deprecating find. But perhaps popular lint-like programs could flag likely abuses of find? --Guido On Fri, Jul 15, 2011 at 6:57 AM, Mike Graham <mikegraham@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Fri, Jul 15, 2011 at 10:57 AM, Guido van Rossum <guido@python.org> wrote:
It isn't necessarily an error if the substring is not in the string (though it sometimes is), but it is an exceptional case. Python uses exceptions pretty liberally most places -- it isn't necessarily an error if an iterator is exhausted or if float("4.2 bad user input") is called or if BdbQuit was raised. In these cases, an exception can be perfectly expected to indicate that what happened is different from the information used in a return value. Making a Python user write a try/except block when she wants to handle both the cases "substring is in s" and "substring isn't in s" seems perfectly fine to me and, really, preferable to the if statement required to handle these two cases. The base two cases really are about the same:

    try:
        i = s.index(sub)
    except ValueError:
        do_something()

vs.

    i = s.find(sub)
    if i == -1:
        do_something()

But what if I forgot to handle the special case?

    i = s.index(sub)  # An exception is raised right here and I can fix my code

vs.

    i = s.find(sub)  # No exception is raised

In this second case, I get the value of -1. Later I can use it as an index, use it in a slice, or perform arithmetic on it. This can introduce seemingly-unrelated values later on, making this especially hard to track down. If the failure return code was at least None it would behave more sanely, but at present the failure return code is a perfectly valid value for almost any use. If a programmer is still averse to using try/except, we can still write

    if sub in s:
        i = s.index(sub)
    else:
        do_something()

Now, we can dredge up some examples where -1 is the actual value someone wants to use. These cases are so rare and so subtle as to make their use so clever I don't really see their existence as an advantage. Additionally, it is unfortunate that we currently have two methods to do the same thing (which isn't even a super-common task) with different APIs. Nothing about the names "find" and "index" really makes clear which is which. This violates the "There should be one-- and preferably only one --obvious way to do it." principle and makes the Python user need to memorize an unnecessary, arbitrary distinction. I would also point out that it was not a contrived case that I mentioned where a beginner introduces a bug by trying "if s.find(sub):" instead of "if sub in s:"; I have really seen people try this several times. Obviously we cannot make many decisions based on new Python programmers' mistakes, but it is worth recognizing them. I hope this additional discussion might be able to sway your opinion here. The only advantage to using str.find is that you do not have to use try/except blocks, but in fact you don't have to with str.index either. On the other hand, there are numerous disadvantages--practical, pedagogical, stylistic, and design--to having and using str.find. Mike

On Fri, 15 Jul 2011 12:12:33 -0400 Mike Graham <mikegraham@gmail.com> wrote:
While this would be a very good argument to make if we were currently designing the str API, I don't think the benefits of suppressing str.find outweigh the burden of converting old code to use a different idiom. We could choose to write something about it in the documentation, though. Regards Antoine.

On Fri, Jul 15, 2011 at 12:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
That could be just as suitable. To be honest, I can't really see too much of a difference between a stronger note about the problems I perceive with str.find in the documentation and a note that uses the word "deprecated". There certainly wouldn't be occasion in the foreseeable future for you actually to remove str.find. Mike

It seems to me that the solution is that you should never use find and use index instead. You could even modify pylint/etc. to flag any use of find as an error. That doesn't mean it should be deprecated so everyone else has to follow you. Deprecated now => removed in the future. While many can agree that warnings in the documentation about common mistakes in using APIs are a good thing, that doesn't translate into consensus that we should remove the API that might be misused. The documentation already says "The find() method should be used only if you need to know the position of sub. To check if sub is a substring or not, use the in operator." There is *no* general solution to the problem of people misusing APIs. Even with index someone can still write

    if s.index(sub):

which probably doesn't do what they were thinking. Should we remove that too? Note that one reason people might prefer find over index is that exceptions constrain how you write the code:

    try:
        i = s.index(sub)
        do lots of stuff with s and i
    except ValueError:
        result = 'not found'

In this case the try wraps way too much code so it could catch a ValueError in the middle of 'stuff'. Here's correct code:

    try:
        i = s.index(sub)
        do_more = True
    except ValueError:
        result = 'not found'
        do_more = False
    if do_more:
        do lots of stuff with s and i

Do you think that's better than:

    i = s.find(sub)
    if i < 0:
        result = 'not found'
    else:
        do lots of stuff with s and i

--- Bruce
Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com
On Fri, Jul 15, 2011 at 9:51 AM, Mike Graham <mikegraham@gmail.com> wrote:

On Fri, Jul 15, 2011 at 1:06 PM, Bruce Leban <bruce@leapyear.org> wrote:
The case of "if s.index(sub):" as a misspelling of "if sub in s:" is not nearly as error-prone as the case of "if s.find(sub):". For one, "s.find(sub)" *sounds* like it might be true if you can find sub in s and not true if sub cannot be found. With str.index, the fact the return value is going to be an index is made clear. For two, str.find can fail silently in more cases than the str.index case can fail silently. For three, I've never seen a learner write "if s.index(sub)" though I have seen several mistakenly write the "if s.find(sub)" version.
Are you familiar with the "else" clause of a try/except suite? It already handles writing the "right" version nicely.

    try:
        i = s.index(sub)
    except ValueError:
        result = 'not found'
    else:
        do lots of stuff with s and i
--- Bruce
Mike
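A runnable version of the try/except/else shape, as a minimal sketch: `tail_from` is a hypothetical function, and the slice stands in for "lots of stuff with s and i".

```python
def tail_from(s, sub):
    """Return the part of `s` starting at `sub`, or 'not found'."""
    try:
        i = s.index(sub)
    except ValueError:
        result = "not found"
    else:
        # Runs only when index() succeeded. A ValueError raised in this
        # branch would propagate instead of being mistaken for "not found".
        result = s[i:]
    return result


hit = tail_from("spam and eggs", "and")
miss = tail_from("spam", "x")
```

The try clause covers only the call that is expected to raise, which addresses the concern about wrapping too much code.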

On 2011-07-15, at 19:06 , Bruce Leban wrote:
Note that you could replace `do_more` by a use of `i`:

    i = None
    try:
        i = s.index(sub)
    except ValueError:
        result = 'not found'
    if i is not None:
        do stuff with s and i

There's always the `in` alternative:

    if sub not in s:
        result = 'not found'
    else:
        do stuff with s and s.index(sub)

it's a more fluent interface, and uses less error-prone APIs, though it does search twice through `s`. (another option would be to build this feature via higher-order functions, but that's not really one of Python's fortes)

On Fri, Jul 15, 2011 at 9:12 AM, Mike Graham <mikegraham@gmail.com> wrote:
This smells a bit like uncalled-for religion. Remember that readline() returns an empty string at the end of the file instead of raising an exception, and IMO that makes it the better API. -- --Guido van Rossum (python.org/~guido)

On Fri, Jul 15, 2011 at 12:38 PM, Guido van Rossum <guido@python.org> wrote:
Thanks for this prompt, considerate reply (and the other). I really didn't mean to present an "always use exceptions" dogma here, so much as to claim "Having the programmer use a try/except block isn't a problem," and to say that the fact an exception was raised doesn't necessarily indicate that something went wrong. StopIteration is a strong example that exceptions can often be very expected and normal. Mike

On Fri, Jul 15, 2011 at 12:38 PM, Guido van Rossum <guido@python.org> wrote:
The last time I used file.readline was to pass to tokenize, which is complicated immensely by the logic necessary to support both the iterator protocol and the readline protocol. i.e. it has to check for both StopIteration _and_ the empty string, which makes it harder to use, forces it to contain more redundant information. I'm not swayed by the suggestion that file.readline is beautiful at all: it predates the iterator protocol and should have been subsumed by it. Instead it's another inconsistency in Python that tools have to deal with or ignore completely. Strings don't even bother to be compatible with .find, if you pass in an index that was returned by .find() you might just get the wrong result. It's pretty much only ever useful in the if check, which is in essence identical to the try-except, except it adds the ability to forget to check and produce incorrect behaviour -- this is not a feature. Devin

Am 15.07.2011 19:13, schrieb Devin Jeanpierre:
Sorry, I don't see the immense complication in

    try:
        line = readline()
    except StopIteration:
        line = ''

Of course, if the tokenize API was designed from scratch nowadays, it would probably only accept iterables, and you'd have to make one yourself from a file object using

    iter(fd.readline, '')

cheers, Georg
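Both directions of that conversion can be sketched briefly. `readline_from` is a hypothetical adapter name, and `io.StringIO` stands in for a real file object.

```python
import io


def readline_from(iterable):
    """Adapt any iterable of lines to the readline protocol ('' at the end)."""
    it = iter(iterable)

    def readline():
        try:
            return next(it)
        except StopIteration:
            return ""

    return readline


# Iterator protocol -> readline protocol.
readline = readline_from(["first\n", "second\n"])
got = [readline(), readline(), readline()]

# readline protocol -> iterator protocol, via two-argument iter():
# iteration stops when fd.readline() returns the sentinel ''.
fd = io.StringIO("first\nsecond\n")
lines = list(iter(fd.readline, ""))
```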

On 15Jul2011 12:12, Mike Graham <mikegraham@gmail.com> wrote:
| On Fri, Jul 15, 2011 at 10:57 AM, Guido van Rossum <guido@python.org> wrote:
| > However, in many cases absence of the string is not an error -- you
| > just need to do something else. [...]
|
| It isn't necessarily an error if the substring is not in the string
| (though it sometimes is), but it is an exceptional case.
No it isn't, IMO. It's simply the _other_ case.
| Python uses
| exceptions pretty liberally most places -- it isn't necessarily an
| error if an iterator is exhausted or if float("4.2 bad user input") is
| called or if BdbQuit was raised. In these cases, an exception can be
| perfectly expected to indicate that what happened is different from
| the information used in a return value.
In all the cases you cite the exception indicates failure of the operation: .next() has nothing to "next" to, float is being handed garbage etc. str.find does not have a failure mode, it has string found and string not found.
| Making a Python user write a try/except block when she wants to handle
| both the cases "substring is in s" and "substring isn't in s" seems
| perfectly fine to me and, really, preferable to the if statement
| required to handle these two cases.
You don't find try/except wordy and opaque? I find "if" more idiomatic most of the time. Not to mention vague: it can often be quite hard to be sure the raised exception came from just the operation you imagine it came from. With str.find there's little scope for vagueness I agree (unless you aren't really using a str, but a duck-type). But plenty of:

    try:
        x = foofunc(y)
    except IndexError as e:
        ...

is subject to uncaught IndexError arbitrarily deep in foofunc's call stack.
| The base two cases really are about the same: [... try ... except ...]
| vs.
|
| i = s.find(sub)
| if i == -1:
|     do_something()
|
| But what if I forgot to handle the special case? [...]
| In this second case, I get the value of -1. Later I can use it as an
| index, use it in a slice, or perform arithmetic on it. This can
| introduce seemingly-unrelated values later on, making this especially
| hard to track down.
I agree it may be a pity that str.find doesn't return None on string not found, which would generally raise an exception on an attempt to use it as a number.
Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
Any company large enough to have a research lab is large enough not to listen to it. - Alan Kay

On Fri, Jul 15, 2011 at 6:50 PM, Cameron Simpson <cs@zip.com.au> wrote:
There are many cases that mean exactly the same thing (0 means the substring was found starting at 0, 1 means the substring was found starting at 1, 2 means the substring was found starting at 2,...), so the other case (substring not contained) we can label special or exceptional. We can recognize such cases by their requiring separate logic.
I think you are way off here. "Error" is in the eye of the beholder. I could say that an iterator has two cases: one where it gives me a value, and the other, that it's exhausted. Nothing exceptional there. (Indeed, the choice of StopIteration not to have "Error" in the name was made for the precise reason this isn't regarded as an error.) Similarly, when I pass user input to float, there are just two normal cases, no failure mode: user entered a number, user didn't enter a number. The distinction between an error and another type of special case is subtle at best.
Not really. If we don't like exception handling, we're using the wrong language.
This is the strength and the flaw with exceptions period. It is a much broader question than the one we are facing here. If you do not like exceptions period or Python's use of relatively few exception types for many occasions, I really don't think we can start the discussion at the level of str.find. If I did manage to have an IndexError propagate through to my SomeDuckType.index method when it shouldn't have the meaning I ascribe it, then this is a bug in my implementation of SomeDuckType. This bug would be very unfortunate because when a user tries to use my code right--catching the IndexError--they will completely squash the offending exception and the source of the bug will be unclear. Unfortunately, str.find is highly prone to such bugs as I've discussed since -1 is a valid index for the string.
Mike
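To make the "-1 is a valid index" concern concrete, here is a minimal sketch. The sample string `config_line` is made up for illustration; only the built-in `str.find` and `str.index` behaviour is relied on.

```python
# Illustration only: 'config_line' is a made-up sample string.
config_line = "timeout=30"

# Forgetting the -1 check with find(): no exception, just a wrong answer.
pos = config_line.find("#")      # '#' is absent, so pos is -1
last_char = config_line[pos]     # silently picks the last character
shifted = pos + 1                # arithmetic on -1 yields a plausible-looking 0

# The same oversight with index() fails at the point of the search.
try:
    config_line.index("#")
    index_raised = False
except ValueError:
    index_raised = True

print(pos, repr(last_char), shifted, index_raised)
```

With `find`, nothing flags the missing check, and the wrong values only surface wherever `last_char` or `shifted` is used later.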

This is an absolute flight of fancy, and I'm sure it's already been rejected in the past (a quick search says Aug. 2009, http://mail.python.org/pipermail/python-ideas/2009-August/thread.html#5576 ), but what about some kind of try/except expression? i = s.index(substr) except ValueError is None I guess my main problem with that is the color of the bikeshed: it's hard to get a good idiomatic way of spelling the except expression. You could use a colon, as proposed in that thread, but it seems to me a colon indicates a new line follows. I would be for making this as simple as possible. No "as", no non-implicit "else", and no nesting of exception types. If you want something fancy, use a statement.

Georg Brandl wrote:
That's because the *value* passed as argument to .index() isn't found. Which error class to use often depends on your view point, so in some cases it may seem natural to you, in others you have a different POV, and it feels wrong :-) -- Marc-Andre Lemburg

On Sat, Jul 16, 2011 at 7:11 PM, M.-A. Lemburg <mal@egenix.com> wrote:
Yeah, but trying to catch IndexError instead of ValueError is a pretty easy (and understandable) mistake to make, given the correspondence in names and logical train of thought "index() tells me the index where a substring can be found, IndexError is used to denote that a given index doesn't exist, so if the requested index doesn't exist, then this function will throw IndexError". Easily detected by testing, but still unintuitive and annoying. If IndexError had instead been called IndexNotFound, then there would never have been the slightest question as to which exception should have been thrown. Too late to change it now, though. We just all have to learn that, from the str.index point of view, failing to find a substring means there's something *wrong* with one (or both) of the passed in strings rather than the less opinionated "the index you have requested doesn't actually exist". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
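The distinction described above can be checked directly. The following sketch uses only built-in behaviour; the variable names are illustrative.

```python
s = "spam"

# str.index signals "substring not found" with ValueError ...
try:
    s.index("z")
    search_error = None
except ValueError as exc:
    search_error = type(exc).__name__

# ... while IndexError is what subscripting out of range raises.
try:
    s[10]
    subscript_error = None
except IndexError as exc:
    subscript_error = type(exc).__name__

# Catching the wrong class lets the real exception propagate.
try:
    try:
        s.index("z")
    except IndexError:
        pass  # never reached for a missing substring
    escaped = False
except ValueError:
    escaped = True

print(search_error, subscript_error, escaped)
```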

On 7/15/2011 6:50 PM, Cameron Simpson wrote:
Not finding == failure to find == failure to match an implied re. The question is, how should failure to find be indicated? Out of band with an exception? In band with a special value? (I am here ignoring the design bug that the 'special value' returned by str.find is not really special in Python but is a legal and common index and hence bug-bait.)

Python usually chooses one method or the other. The index/find pair is very exceptional in providing both. Why? What is so important or exceptional about this particular function? 'Find the index of a substring in a string' is not the most common operation, at least not for most people. To me, the special-pleading arguments given for find would apply to a hundred other functions.

For instance, should we add an alternate constructor for int, say int.nix, that would return None instead of raising ValueError for 'string not consisting of 1 or more base x digits'? [ 'Nix' here means 'nix, you cannot do that, you get nothing in return'.] As with index/find, the input string either does or does not match a particular re. If we have two ways to indicate 'not match' for one function, why not all others?

<Answer 1> Why not? Because we do not *need* the near duplication because mis-formed inputs to int are handled with try--except because that is the way it is done. The duplication of index/find is a historical aberration with no particular justification other than history. Leave it at that. If str.find did not exist, a proposal to add it would be less welcome than int.nix.

<Answer 2> Indeed, choice of failure indicator is good, so let's do it right and have it everywhere. Define nix generically as

    def nix(self, *args, **kwds):
        try:
            return self(*args, **kwds)
        except:
            return None

Make this a builtin and add it as a class or instance method, as appropriate, to appropriate built-in classes. (I am thinking, for instance, that class method for int and instance method for type(lambda:0) should work. See test below.) By exposing it, users could use it too, either directly or wrapped with classmethod(). Or call the above _nix and define decorators. Then str.find would eventually be deprecated in favor of str.index.nix.

A preliminary test:

    class C():
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self.f(*args)
        def nix(self, *args, **kwds):
            try:
                return self(*args, **kwds)
            except:
                return None

    myint = C(int)
    print(myint('1'), myint.nix('1'), myint.nix(''), myint.nix('a'))
    try:
        myint('')
    except:
        print('E caught')
1 1 None None E caught
-- Terry Jan Reedy
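The "nix" idea above can also be sketched as a plain wrapper function rather than a method. This is not the proposal itself: the name `nix` is taken from the message, but the `exceptions` parameter and the choice to catch a named tuple of exception classes instead of using a bare `except` are assumptions made for this illustration.

```python
def nix(func, exceptions=(Exception,)):
    """Return a wrapper that yields None where func would raise.

    Hypothetical helper for illustration; 'exceptions' is an assumed
    parameter that narrows which failures are turned into None.
    """
    def wrapper(*args, **kwds):
        try:
            return func(*args, **kwds)
        except exceptions:
            return None
    return wrapper

int_nix = nix(int, (ValueError, TypeError))
index_nix = nix(str.index, (ValueError,))

print(int_nix("1"), int_nix(""), int_nix("a"))
print(index_nix("spam", "a"), index_nix("spam", "z"))
```

Restricting the caught exceptions keeps unrelated failures (for example a `KeyboardInterrupt`, or a bug inside the wrapped callable) from being silently converted to `None`.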

On 7/16/2011 2:46 PM, Terry Reedy wrote:
Indeed, negative values such as -1 are standard error/failure return codes for functions that normally return nonnegative ints and that are written in statically typed languages without catchable exceptions. In C, for instance, EOF is defined as an implementation-defined negative int and I am sure -1 is used by some. -- Terry Jan Reedy

On 16Jul2011 22:21, Masklinn <masklinn@masklinn.net> wrote:
| On 2011-07-16, at 21:52 , Terry Reedy wrote:
| > On 7/16/2011 2:46 PM, Terry Reedy wrote:
| >> On 7/15/2011 6:50 PM, Cameron Simpson wrote:
| >>> str.find does not have a failure mode, it has string found and string
| >>> not found.
| >>
| >> Not finding == failure to find == failure to match an implied re.
| > Indeed, negative values such as -1 are standard error/failure
| > return codes for functions that normally return nonnegative ints and
| > that are written in statically typed languages without catchable exceptions
|
| … or type systems worth using. And that's for those which are
| 0-indexed of course, especially for finding sub-sequences.
|
| And interestingly, the function corresponding to `str.find` in libc
| returns `NULL` in case of failure, not -1 (its return value is a
| pointer to the first occurrence of the needle in the haystack).

Though if you're thinking NULL is equivalent to None (which it often is conceptually), let's remember that in C NULL is just a pointer value; it is a sentinel, but not a different type. So NULL here is in some ways akin to -1 in Python's find return. You still need to test for it; not all platforms will (for example) segfault if NULL is dereferenced.

Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
There's no need to worry about death, it will not happen in your lifetime. - Raymond Smullyan

Mike Graham wrote:
str.find (and bytes.find) is worse than the alternatives in every way.
I disagree.
Additionally, the existence of str.find violates the there's-one-way-to-do-it principle.
The principle is "One *Obvious* Way To Do It" not *Only* One Way. Sometimes which one is 'obvious' is only clear from reading the documentation or learning the language in some other way -- but that doesn't mean it is always the _better_ way. I, for one, have zero interest in losing good functionality because somebody else is misusing a feature. Exceptions are not /always/ the best way to do something. ~Ethan~

Mike Graham wrote:
I disagree.
Just because some people (allegedly) misuse str.find is not a reason to remove it. People misuse all sorts of things. I don't believe that it is valid to compare str.find to str.__contains__ since they do different things for different purposes. Using str.find instead of "in" is not misuse if you actually need an index. Better to do a single walk of the source string: p = s.find(sub) if p >= 0: # do something else: ... than wastefully do two: if sub in s: p = s.index(sub) # do something else: ... Whatever efficiency you might gain in the "substring not found" case, you lose in the "found case". You should only use "sub in s" when you don't care about *where* the substring is, only whether or not it is there. Strings are not dicts, and searching is not necessarily fast. If I'm searching the string twice, I'm doing it wrong. Since str.__contains__ is not a valid replacement for str.find, the only question is, should str.find be deprecated in favour of str.index? I say no. str.find is just too useful and neat, compared to catching an exception, to throw out. And it can be considerably faster. For long strings, the time taken for an unsuccessful search may be dominated by the time to traverse the string, and consequently the two alternatives are pretty close to the same speed:
Catching the exception is only 6% slower than testing for -1. Not much difference, and we probably shouldn't care one way or the other. However, for short strings, the time taken may be dominated by the cost of catching the exception, and so str.find may be significantly faster:
s.index here is nearly three times slower than s.find. (And of course, if the substring is present, index and find should be pretty much identical in speed.)
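The measurements referred to above are not reproduced in this archive. A rough way to repeat the comparison is sketched below with the standard `timeit` module; the helper names, sample strings, and repetition count are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import timeit

def via_find(s, sub):
    # Return-code style: test for -1.
    i = s.find(sub)
    return None if i == -1 else i

def via_index(s, sub):
    # Exception style: catch ValueError.
    try:
        return s.index(sub)
    except ValueError:
        return None

samples = {"short": "abc", "long": "a" * 100_000}
timings = {}
for label, s in samples.items():
    for func in (via_find, via_index):
        # Unsuccessful search: "z" never occurs in either sample.
        elapsed = timeit.timeit(lambda: func(s, "z"), number=1000)
        timings[(label, func.__name__)] = elapsed
        print(f"{label:5s} {func.__name__:9s} {elapsed:.6f}s")
```

For the short string the cost of raising and catching the exception should dominate, while for the long string the scan itself should dominate, which is the effect the message describes.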
str.index is better for finding indices in that it supports an idiomatic exception-based API rather than a return-code API.
Being idiomatic is not better merely because it is idiomatic. Rather, what's better becomes idiomatic, rather than the other way around, because people re-use code examples that work well. I expect that in practice str.find is used rather more frequently than str.index, which suggests that at least when it comes to string searching, find is the idiomatic API.
"Every" usage? I don't think so. Another common and valid usage is this pattern: index = s.find(sub) if index >= 0: # do something Written with exception handling it becomes significantly longer, trickier and less obvious for beginners: try: index = s.index(sub) except ValueError: pass else: # do something Note especially that this takes the least interesting case, the "do nothing if not found", and promotes it ahead of the interesting case "do something if found". Now that's an anti-pattern! (Albeit a mild one.) And of course the try...except example is subject to its own conceptual failures. Both of these are subtly, or not-so-subtly, wrong: try: index = s.index(sub) # do something except ValueError: pass try: index = s.index(sub) except ValueError: pass # do something
which is an antipattern in Python.
Why do you think it is an anti-pattern? I don't consider it an anti-pattern. I often wish that lists also had a find method that returned a sentinel instead of raising an exception. (Although I'd probably use None, as the re module does, rather than -1.)
Yes, that's a good argument against the use of -1 for "not found". None would have been better.
Additionally, the existence of str.find violates the there's-one-way-to-do-it principle.
You may be confusing Python with some other language, because there is no such principle in Python. Perhaps you are mistaking it for the Zen, There should be one-- and preferably only one --obvious way to do it. which is a statement requiring the existence of an obvious way, not a prohibition against there being multiple non-obvious ways. In any case, it's far from clear to me that str.index is that obvious way. But then again, I'm not Dutch *wink* -- Steven
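The three try/except placements discussed in this message behave quite differently in practice. The sketch below makes that visible; `do_something` is a hypothetical stand-in for the follow-up work and is deliberately written to raise a `ValueError` of its own.

```python
def do_something(index):
    # Hypothetical follow-up work that happens to raise ValueError itself.
    raise ValueError("bug inside do_something")

outcome = []

# Variant 1: follow-up work inside the try block. Its own ValueError is
# mistaken for "substring not found" and silently swallowed.
try:
    index = "spam and eggs".index("eggs")
    do_something(index)
except ValueError:
    outcome.append("variant 1: real bug swallowed")

# Variant 2: follow-up work after the whole statement. It runs even when
# the search failed, so 'index' may never have been assigned.
def variant_two(s, sub):
    try:
        index = s.index(sub)
    except ValueError:
        pass
    return index  # reached in both cases

try:
    variant_two("spam", "z")
except NameError:  # UnboundLocalError is a subclass of NameError
    outcome.append("variant 2: index was never bound")

# try/except/else: only the search is guarded, and the follow-up work
# runs only when the search succeeded.
def with_else(s, sub):
    try:
        index = s.index(sub)
    except ValueError:
        return None
    else:
        return s[index:]  # placeholder for "do something"

outcome.append(with_else("spam and eggs", "eggs"))
outcome.append(with_else("spam", "z"))
print(outcome)
```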

On Fri, Jul 15, 2011 at 11:57 PM, Mike Graham <mikegraham@gmail.com> wrote:
As others have noted, the typical usage pattern of: idx = s.find(sub) if idx >= 0: # do something that needs idx is nice and clean and significantly faster than searching the string twice. Universally discouraging find() in favour of index() is questionable, as the above is significantly cleaner than the index based alternative. So, I have a different suggestion (that actually paves the way for the eventual deprecation of find()): Update str.index() to accept an optional sentinel value. If the sentinel argument is not supplied, then a missing substring raises ValueError as it does now. If it is supplied, then a missing substring returns the sentinel value instead of throwing an exception. The above idiom could then be expressed cleanly as: idx = s.index(sub, missing=None) if idx is not None: # do something that needs idx However, this seemingly simple suggestion is complicated by the fact that string methods do not currently accept keyword arguments and index() already accepts two optional positional arguments (for substring searching). Perhaps the more general solution of try/except/value expressions is worth reconsidering. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jul 17, 2011 at 1:35 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Shouldn't need a full PEP, but will likely need at least some discussion on python-dev and some pre- and post-patch microbenchmarks to assess the impact on the speed of string operations (since passing and parsing keyword arguments for C functions *is* slower than only using positional arguments). Compared to making strings Unicode by default, though, it's a pretty minor change :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jul 15, 2011, at 6:57 AM, Mike Graham wrote:
Unless an API is flat out broken, deprecation is almost always a bad idea. This API has been around for a very long time, so deprecating it will break lots of people's code for almost zero benefit: http://www.google.com/codesearch#search/&q=%5C.find%5C(%20lang:%5Epython$&type=cs Raymond

Raymond Hettinger wrote:
How ironic that the first hit seems to display the problem Mike is concerned with:

    position = min(position, len(self.contents))
    if hasattr(newChild, 'parent') and newChild.parent != None:
        # We're 'inserting' an element that's already one
        # of this object's children.
        if newChild.parent == self:
            index = self.find(newChild)
            if index and index < position:
                # Furthermore we're moving it further down the
                # list of this object's children. That means that
                # when we extract this element, our target index
                # will jump down one.
                position = position - 1

I haven't read all the surrounding code to know if this will ever fail, but the whole 'index = ... .find(...); if index and ...' certainly doesn't lend confidence. After all, if you *know* newChild is in self, why not use .index()? ~Ethan~

On Sat, Jul 16, 2011 at 11:46 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Note that this particular code is using BeautifulSoup.PageElement.find, not str.find. There are, however, in the first few pages similar snippets which use the result of str.find without a check for the special case. Mike
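For str.find specifically, relying on the truthiness of the result goes wrong in both directions. A small sketch, with illustrative helper names that are not from the thread:

```python
def buggy_contains(s, sub):
    # Mirrors the 'if s.find(sub):' mistake: relies on truthiness.
    return bool(s.find(sub))

def correct_contains(s, sub):
    return sub in s

cases = [("spam", "sp"), ("spam", "am"), ("spam", "xy")]
report = [
    (sub, s.find(sub), buggy_contains(s, sub), correct_contains(s, sub))
    for s, sub in cases
]
for row in report:
    print(row)
```

A match at position 0 is reported as absent, and a missing substring (-1) is reported as present; only matches at positions 1 and later happen to give the right answer.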

Ethan Furman wrote:
Perhaps because the name "find" tells you exactly what the method does, while the name "index" is ambiguous. Does it mean "what is at this index?" or "what index is this at?". I've occasionally seen people mistakenly write mylist.index(i) instead of mylist[i]. Including an experienced Python coder who did know better. In that case, it was just a thinko (like a typo, only in the brain *wink*), but in my opinion, the name "index" is not a good name. Since find and index are equally efficient when the substring is present, there's no speed advantage to choosing one over the other if you know that the substring is present. In my opinion str.find beats str.index for readability so comprehensively that there is no contest -- I would *always* use find if available. In my wishlist for Python 4000 I have: * list.index renamed to list.find * str.find and list.find return None if the argument is not found -- Steven

On Sat, Jul 16, 2011 at 11:26 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
I agree that breaking people's code is a bad thing and have not suggested removing str.find. This removal would require a transition like that from Python 2.x to Python 3.x, a move that is not planned and I personally do not ever expect. I appreciate your linking this search, which does indeed show that str.find is in wide use. However, looking at the first five pages of results, this use seems largely unfortunate: literally the majority of the times str.find is used, I would have used "substring in s" or "s.startswith(substring)". I also see code like "pos = s.find(" ("); if pos + len(" (...)") > self._maxWidth:" which makes me very uncomfortable and which I would have to read in detail to understand confidently if I were maintaining the code. Thanks for the reply, Mike
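The replacements mentioned here can be compared side by side. In this sketch the sample strings are invented, and the last part only imitates the shape of the quoted `pos + len(...)` snippet, not the original code it came from.

```python
text = "Widget (deluxe edition)"

# Membership test: find-based spelling vs. the 'in' operator.
has_paren_old = text.find(" (") != -1
has_paren_new = " (" in text

# Prefix test: find-based spelling vs. startswith().
starts_old = text.find("Widget") == 0
starts_new = text.startswith("Widget")

# Unchecked arithmetic on the result: when " (" is absent, -1 flows
# into the sum and produces an ordinary-looking number.
plain = "Widget"
pos = plain.find(" (")
width = pos + len(" (...)")

print(has_paren_old, has_paren_new, starts_old, starts_new, pos, width)
```

Whether the resulting `width` is harmless depends entirely on the surrounding code, which is why such snippets need a careful read.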

On Jul 16, 2011, at 10:39 AM, Mike Graham wrote:
I agree that breaking people's code is a bad thing and have not suggested removing str.find.
Deprecation is a step towards removal and it always causes a certain amount of pain. ISTM, the time for any move like this would have been the jump from Python 2 to Python 3 where significant breakage was expected and where transition tools were developed.
I am largely unsympathetic to arguments that are roughly equivalent to "I don't like the way other people write programs". Something akin to the str.find() API has been present in many, many languages for a very long time. For the most part, people seem to be able to use it reasonably well. I find that beginning Python students never seem to have a problem with it. Also keep in mind that startswith() and endswith() were relatively recent additions to Python, so it is no surprise that lots of code uses find() instead of startswith(). Raymond

On Sat, Jul 16, 2011 at 3:07 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Deprecation is a step towards removal and it always causes a certain amount of pain.
ISTM, the time for any move like this would have been the jump from Python 2 to Python 3 where significant breakage was expected and where transition tools were developed.
Because this would break backwards compatibility unnecessarily, I think anyone involved can agree that actual removal could not take place until a special jump akin to the Python 2->Python 3 jump. (It would surprise me if such a jump ever actually took place.)
Reviewing five pages of results, over 2/3 of the uses of str.find could be replaced by str.__contains__. Using sub in s instead of s.find(sub) != -1 is *already* the advice in the official Python documentation, so I do not believe I am making especially personal judgments about the style of the code.
I'm glad your experience with learners here has been more consistent than mine.
Mike

On Jul 16, 2011, at 12:26 PM, Mike Graham wrote:
Reviewing five pages of results, over 2/3 of the uses of str.find could be replaced by str.__contains__.
I think the first hit was from BeautifulSoup which predates the introduction of __contains__ and still runs on both old and new versions of Python. It may be hard to believe, but in the old days (not really so long ago), we didn't have __contains__ or startswith() and yet Python became popular anyway ;-)
I know about that advice. I believe I'm the one who added it ;-) (as well as many other tips in the code modernization PEP). Raymond

I just remembered one other thought on the subject. Usually, when Python introduces a method such as .index() that raises an exception for the not-found case, there are immediate requests for variants that don't raise exceptions:

    dict.pop(key, default)
    dict.get(key, default)
    next(iterable, default)
    getattr(obj, attr, default)
    re.match() --> None or matchobject

People seem to hate wrapping try/except around simple calls. Expect those people to be agitated if you take away str.find(). Raymond
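For reference, each of the non-raising variants listed above can be exercised as follows. The sample data is invented; the calls themselves are standard built-ins and the `re` module.

```python
import re

settings = {"a": 1}

got = settings.get("b", 0)                 # missing key -> supplied default
popped = settings.pop("b", None)           # missing key -> supplied default
first = next(iter([]), "exhausted")        # empty iterator -> supplied default
attr = getattr(object(), "nope", "fallback")
match = re.match(r"\d+", "abc")            # no match -> None, not an exception

print(got, popped, first, attr, match)
```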

On Sat, Jul 16, 2011 at 4:37 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Then it seems you're guiltier than I of saying what way other people should write their programs. =) On Sat, Jul 16, 2011 at 4:52 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Did you catch Nick's suggestion to enhance str.index to have the same basic API as dict.get/getattr/etc.? I think this might be useful overall by providing a way to get a useful default, which is very often None (as a value in a slice). Mike

On Jul 16, 2011, at 2:04 PM, Mike Graham wrote:
Did you catch Nick's suggestion to enhance str.index to have the same basic API as dict.get/getattr/etc.?
Sorry, but I thought that idea was way off base. Things like dict.get and getattr are about returning values, so it is possible to provide a meaningful default. In the case of string.find, there is no meaningful default position in the string. So, a default would simply be a way to turn the -1 value into some other object which you would still need to test. We don't need to make the API worse by expanding it just for the sake of change. The str.find() method isn't broken or useless. If you want people to change the way they write code, it would be better to do it through education (blog posts, pylint, etc) rather than by breaking a venerable API. Raymond

On Sat, Jul 16, 2011 at 6:05 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Slicing was brought up as a use-case where you can use a default without checking. mystring[:mystring.index('#', None)], for example, could conceivably be used somewhere to strip comments from (some sort of) code. It does have other benefits too. It makes the return value on failure explicit, which would help remind people to check, or be more immediately aware when reading code. And it does have nice parallels to those other methods. Devin
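The slicing use case can be tried out today with a small helper. `index_or` below is a hypothetical function standing in for the proposed default argument, which `str.index` does not actually accept.

```python
def index_or(s, sub, default):
    # Hypothetical helper emulating the proposed default for str.index.
    try:
        return s.index(sub)
    except ValueError:
        return default

with_comment = "x = 1  # set x"
without_comment = "y = 2"

# A default of None slices to the end when there is no comment.
stripped_a = with_comment[:index_or(with_comment, "#", None)].rstrip()
stripped_b = without_comment[:index_or(without_comment, "#", None)].rstrip()

# The same slice with find() drops the last character instead.
find_based = without_comment[:without_comment.find("#")].rstrip()

print(repr(stripped_a), repr(stripped_b), repr(find_based))
```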

On Jul 16, 2011, at 3:15 PM, Devin Jeanpierre wrote:
Slicing was brought up as a use-case where you can use a default without checking.
This thread has lost contact with reality. It started with the reasonable observation that many uses of str.find could be replaced with a simple test using the in-operator. Now, the thread is venturing into the typical python-ideas world of making-up random use cases and APIs. Something like str.find() is in many languages and it is definitely not the norm for them to have found a need to both be able to return -1 or to supply a default value. As a Python teacher, speaker, and consultant, I have the opportunity to see and review the code repositories for many companies. I'm pretty sure that I've never seen a utils module with the likes of: def myfind(fullstring, substring, default=None): i = fullstring.find(substring) if i == -1 and default is not None: return default return i When I start seeing people routinely using a helper function like this, I'll start to believe that a str.find default value isn't silly. Mike's initial post was well grounded in observations about code that could be improved by using "in" or str.index() instead of str.find(). Though I disagreed with the recommendation to deprecate, it would be even worse to exacerbate the usability issues by making the method signature even more complex (with a new optional argument and a new signature variant for the return value). That isn't progress. It's aspiring cruft that makes the language harder to learn and remember. Raymond

On 16Jul2011 15:52, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
| On Jul 16, 2011, at 3:15 PM, Devin Jeanpierre wrote:
| > Slicing was brought up as a use-case where you can use a default
| > without checking.
|
| This thread has lost contact with reality. It started with the
| reasonable observation that many uses of str.find could
| be replaced with a simple test using the in-operator.

To be fair here, Mike's OP also mentioned that -1 is easy to misuse if not checked because it is still numeric.

Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
Of course, I realize that the rain in the UK is much wetter than the rain we get here... - Eric Gunnerson <gunnerso@halcyon.com>

Raymond Hettinger wrote:
Nick's proposal was to enhance str.index(), not string.find(); Having str.index() accept a value to return on failure instead of raising an exception means it could do both jobs, and would also make it much less likely to wrongly use the failure return value of -1 from str.find() which is, unfortunately, a legitimate index value. ~Ethan~

On Sun, Jul 17, 2011 at 11:40 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Indeed, the problem as I see it is that our general idiom for functions and methods that raise 'Not Found' exceptions is to accept an optional parameter that specifies a value to return in the Not Found case. For historical reasons, we currently break that idiom for index() methods: instead of supplying an extra parameter to str.index, one instead switches to a completely different method (.find()) with no control over the sentinel value returned (it's always -1). For other sequences (e.g. list), there's no find equivalent, so you *have* to write the exception handling out explicitly. My proposal is to update the signature of index() (for all sequences, including the ABC) to follow the standard 'Not Found' idiom by accepting a 'missing' parameter that is returned for those cases where ValueError would otherwise be raised. Code that uses str.find would continue to work, but the recommended alternative would be obj.index(x, missing=None) (or appropriate default value). I would advise against any actual deprecation of str.find (cf. the deliberate lack of optparse deprecation). It's unfortunate that backwards compatibility means we can't use the more descriptive name, but that's life. However, I already have too much on my plate to push this forward for Python 3.3. I'm able to offer advice if someone would like to try their hand at writing a PEP, though. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
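The proposed semantics can be approximated with a free function. This is only a sketch under assumptions: `index_with_missing` and the private `_NO_DEFAULT` marker are hypothetical names, and no built-in `index()` method accepts a `missing` argument.

```python
_NO_DEFAULT = object()  # private marker: "no 'missing' value was supplied"

def index_with_missing(seq, value, missing=_NO_DEFAULT):
    """Hypothetical stand-in for the proposed seq.index(value, missing=...)."""
    try:
        return seq.index(value)
    except ValueError:
        if missing is _NO_DEFAULT:
            raise  # behave like plain index() when no default is given
        return missing

# Works for any sequence with an index() method, not only str.
in_str = index_with_missing("a=b", "=", missing=None)
not_in_str = index_with_missing("ab", "=", missing=None)
not_in_list = index_with_missing([1, 2, 3], 9, missing=None)

try:
    index_with_missing([1, 2, 3], 9)
    still_raises = False
except ValueError:
    still_raises = True

print(in_str, not_in_str, not_in_list, still_raises)
```

Using a private marker object rather than `None` as the "not supplied" default lets callers pass `missing=None` explicitly and still get the raising behaviour when they pass nothing.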

On Jul 17, 2011, at 12:15 AM, Nick Coghlan wrote:
There's a difference between methods that return looked-up values (where a default might make sense) versus a method that returns an index (where it usually makes no sense at all).
If someone takes this out of python-ideas land and into a serious PEP, they should be prepared to answer a number of tough questions:

* Is this actually necessary? Is there something you currently can't code? If not, then it adds API complexity without adding any new capabilities. There is a high threshold for expanding the string API -- this would affect everyone learning python, every book written, every lint tool, every class seeking to be string-like, etc. So, it would need to be a substantive improvement to be accepted.

* Take a look at what other languages do. Practically every general purpose language has an API for doing substring searches. Since we're not blazing new territory here, there needs to be a good precedent for this change (no shooting from the hip when the problem has already been well solved many times over).

* Use Google's code search to identify examples of real world code that would be better with the new API. If the only use case is creating a new slicing one-liner, that likely is too rare and arcane to warrant a change.

* Consider the effects of adding a second-way-to-do-it. Will it add to the learning curve, cause debates about the best way in a given situation, add more PEP 8 entries and pylint checks? Is it worth introducing version incompatibilities (i.e. runs on 3.3 but not earlier), etc.

* What should the default value be? Is there any non-numerical result that ever makes sense; otherwise, you're just making an alias for the -1 currently returned by str.find(). If the default is some value that evaluates to False, will that create a common error where an if-test fails to disambiguate the default value from a substring found at position zero? If the new API is ambiguous or confusing in *any* way, then it will be a step backwards and make Python worse rather than better.

* See if you can find examples where people have already found the need to write a helper function such as:

      def index_default(s, sub, default):
          try:
              return s.index(sub)
          except ValueError:
              return default

  If you find code like that in the wild, it may be an indication that people want this. If you don't, it may indicate otherwise.

* Good API design requires some thinking about function/method signatures. Would making this a keyword-only argument solve the positional arguments problem? Since str.index() already takes arguments for the "start" and "end" index, is the full signature readable without keywords:

      mystr.index(possible_substr, 0, -1, default_value)

  Also look at the signature for the return value. Currently, it always returns a number, but if it can return a number or anything else, then all client code must be prepared to handle the alternatives with clean looking code that is self-evidently correct.

* Perhaps talk to some people who write python code for a living to determine if they've ever needed this or whether it would end-up as cruft. (In my case, the answer is that I've not needed or wanted this in over a decade of heavy Python use.)

Hopefully, this short and incomplete list will provide a good basis for thinking about whether the proposal is a good idea. Defending a PEP is no fun at all, so put in all your deep thinking up front. Cheers, Raymond
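Two of the questions above, the falsy-default ambiguity and the keyword-only signature, can be explored with a variant of the `index_default` helper from the list. The `start`/`end` parameters and the keyword-only `default` are assumptions added for this sketch.

```python
def index_default(s, sub, start=0, end=None, *, default):
    # 'default' is keyword-only here, so it cannot be confused with
    # the positional start/end arguments.
    stop = len(s) if end is None else end
    try:
        return s.index(sub, start, stop)
    except ValueError:
        return default

hit_at_zero = index_default("spam", "sp", default=None)
miss = index_default("spam", "xy", default=None)

# A plain truth test cannot tell "found at position 0" from "not found".
truth_test = [bool(hit_at_zero), bool(miss)]

# An identity test against the default can.
identity_test = [hit_at_zero is not None, miss is not None]

print(hit_at_zero, miss, truth_test, identity_test)
```

So even with a `None` default, client code still needs an explicit `is not None` check, which is the kind of cost the list asks a PEP author to weigh.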

On 2011-07-17, at 10:09 , Raymond Hettinger wrote:
SML even returns the length of the string. See a listing at http://en.wikipedia.org/wiki/Comparison_of_programming_languages_(string_fun... The most common behavior on the page does seem to be returning a numerical sentinel. On the other hand, I'm not sure how many of these languages return a sentinel value which is also a valid index.

On Jul 17, 2011, at 2:13 AM, Masklinn wrote:
There is a fundamental difference between content (values in the list) and the list position. There are meaningful search defaults for the former but not the latter. It's important to grasp this distinction before going further.
My reading is that not a single one of these entries has a signature with a user specifiable default value of arbitrary type. IOW, there is nothing resembling proposal on the table. AFAICT, all of them are equivalent to either our current str.find() or str.index(). Raymond

Masklinn wrote:
No. In context, Raymond is not talking about values as arbitrary objects. He is talking specifically about values of a collection. E.g. given: mylist = [23, 42, 100] the values Raymond is talking about are 23, 42 and 100, *not* 0, 1, 2 (the indexes of the list) or 3 (the length of the list) or 165 (the sum of the list) or any other arbitrary value.
I can't really see that they do "everything and their reverse". There are two basic strategies: return an out-of-bound value, and raise an exception, both of which Python already does. Out-of-bound values are usually one smaller than the lowest valid index (0 or -1) or one higher than the highest valid index (length of the string, or some value greater than the length of the string). A couple of languages return False, which is inappropriate for Python on account of False equaling 0. Some return a dedicated "Not Found" special value, but Python doesn't go in for a proliferation of special constants. A couple of languages, including Ruby, return the equivalent of Python's None.

Notably missing is anything like the ability for the caller to specify what index to return if the sub-string is missing. Ask yourself, can you imagine needing mydict.get(key, 1) or mydict.get(key, set())? I expect that you can easily think of reasons why this would be useful. The usefulness of being able to set the return value of failed lookups, as dicts allow, is obvious. I wish lists also had a similar get method, and I bet that everybody reading this, even if they disagree that it should be built-in, can see the value of it as a utility function.

But can you think of a realistic scenario where you might want to call mystring.find(substr, missing=1)? Why would you want "substring not present" and "substring found at index 1" to both return the same thing? How about mystring.find(substr, missing=set())? If you can't imagine a realistic scenario where you would want such a feature, then you probably don't need this proposed feature.
The first table on the page says that four languages accept negative indexes: Python, Ruby, Perl and Lua. Perl and Python return -1 on not found; Ruby and Lua return nil. -- Steven
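The difference between a -1 sentinel and a nil-style sentinel in a language with negative indexes can be seen in a few lines. This is only an illustration: find_or_none is a hypothetical helper, not an existing or proposed str method.

```python
s = "bar"

pos = s.find("x")      # substring absent, so pos is -1
last_char = s[pos]     # -1 is a valid index: silently yields the last character
after = s[pos + 1:]    # pos + 1 == 0, so this is the whole string


def find_or_none(s, sub):
    """Hypothetical variant that reports absence as None, as Ruby and Lua do."""
    pos = s.find(sub)
    return None if pos == -1 else pos


try:
    s[find_or_none(s, "x")]   # None is not a valid index
except TypeError as exc:
    error = exc               # the mistake surfaces at the point of use
```

Note that None is still accepted inside a slice (s[None:] is the whole string), so a None sentinel catches misuse in indexing and arithmetic but not in every slice.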

On Sun, Jul 17, 2011 at 8:56 PM, Steven D'Aprano <steve@pearwood.info> wrote:
In a language that accepts negative indices, -1 is not out of bounds, and hence is an objectively bad value to return. If str.find() returned None for missing substrings, we wouldn't be having this discussion. However, backwards compatibility requirements mean that option is not available to us. Is this a language wart where the correct answer is to tell everyone new to the language that complains about the current behaviour to "suck it up and learn to live with it"? Maybe. But claiming that str.find() is a *good* API in a language that accepts negative indices would be flat out wrong. Good *enough*, maybe, but not good in an absolute sense.

As I see it, there are a few options.

1. Do nothing. Quite a reasonable option. str.find() is flawed and str.index() can be annoying to use, but fixing this may not be worth the hassle. However, this is not the same as claiming the str.find() behaviour is a good API - it's just acknowledging the wart and deciding not to address it.

2. Add an alternate behaviour to str.index() that allows the exception to optionally be converted into a sentinel value. Inspired by getattr, getitem, dict.get, next, etc. Made messy by the need for the sentinel to go after the existing arguments and the fact that the number of sane sentinel values is extremely limited (aside from None, the only remotely useful possibilities I can think of are 0 and len(seq), and even those only hypothetically). This solution requires changing a builtin method API, as well as adding keyword argument support to CPython string methods. Also raises the question of whether or not the Sequence ABC (and sequences in the stdlib) should be adjusted accordingly.

3. Search for a more general solution that simplifies the following common try/except pattern into a single expression (or simple statement):

       try:
           val = f()
       except RelevantException:
           val = default
       # further operations using val

   This has been attempted before, but never successfully as it isn't *that* common (so ugly solutions are never going to be accepted as better than the status quo) and it's extremely difficult to come up with a syntax that neatly captures the 3 necessary elements and can still be handled easily by the parser.

Since I have no reason to believe 3 will get any further than it has in the past, making an explicit decision between 1 and 2 is the reason I'd like to see a PEP. PEPs aren't just about getting new features into the language - the Rejected ones are also about documenting the reasons we have chosen *not* to do certain things (and the Deferred ones point out that some things are just hard to do in a way that provides a net benefit to the language).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
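Without new syntax, the three elements of option 3 (the call, the relevant exception, and the default) can be captured today by an ordinary function. The sketch below is one possible spelling; call_or_default is a hypothetical name, and the need to wrap the call in a lambda is part of why this kind of helper is usually judged no prettier than the status quo.

```python
def call_or_default(func, exceptions, default):
    """Call func(); return `default` if it raises one of `exceptions`.

    `exceptions` may be a single exception class or a tuple of them,
    exactly as in an `except` clause.
    """
    try:
        return func()
    except exceptions:
        return default


missing_pos = call_or_default(lambda: "spam".index("x"), ValueError, None)
found_pos = call_or_default(lambda: "spam".index("a"), ValueError, None)
```

Exceptions not listed still propagate, so an unrelated bug inside func() is not silently converted into the default.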

Raymond Hettinger wrote:
We are not talking about a default value to return -- the default will still be the behavior of raising a ValueError if the substring is not found. Consider the proposed signature:

    _sentinel = object()

    class str():
        def index(self, substring, start, end, missing=_sentinel):
            # looks for string
            # ...
            # string not found -- now what?
            if missing is _sentinel:
                raise ValueError('...')
            else:
                return missing

The addition is that *if* the caller specifies an object for missing, return that value, *otherwise* raise ValueError just like we do now.
Hmmm -- okay, perhaps we are... let me say, then, that I agree having a default return is not the way to go; this would break everything that expects .index() to exception out if the substring is not found -- in other words, everything that uses .index(). My take on the idea is to have the new 'missing' argument be optional, and if not specified then current behavior is unchanged, but if specified then that value is returned instead.
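Since the builtin str cannot be patched from Python code, the behaviour described here can only be tried out as a stand-alone function. The sketch below assumes a keyword-only `missing` argument and a private marker object; both names are illustrative.

```python
_MISSING = object()  # private marker meaning "the caller passed nothing"


def index(s, sub, start=None, end=None, *, missing=_MISSING):
    """Behave like s.index(sub, start, end) unless `missing` is supplied."""
    try:
        return s.index(sub, start, end)
    except ValueError:
        if missing is _MISSING:
            raise          # unchanged behaviour: ValueError propagates
        return missing     # caller opted in to a return value


pos = index("this is a test", "is")
fallback = index("this is a test", "zz", missing=None)
```

Using a private marker rather than None as the "not supplied" value keeps None itself available as something the caller can ask to get back.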
Trying to be string-like at the moment is such a PITA I really don't see this tiny extra bit as a serious burden. Consider this nice simple code:

    class MyStr(str):
        def find(self, substr, start=None, end=None):
            # whatever extra I want to do before passing off to str
            # now pass off to str
            return str.find(self, substr, start, end)

Too bad it doesn't work:

    --> test = MyStr('this is a test')
    --> test.find('is')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in find
    TypeError: slice indices must be integers or None or have an __index__ method

(Yes, this was fixed in 2.6, and as soon as I'm willing to drop support for earlier versions I can remove the following boilerplate:

    start = start or 0
    end = end or len(self)

and yes, I wouldn't be able to use the new .index(..., missing=_whatever) for a while, but that doesn't mean we should stop improving the language.)
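On a current Python 3 interpreter the subclass works without the boilerplate, because str.find accepts None for start and end. The sketch below assumes Python 3; the `calls` list is a hypothetical stand-in for "whatever extra I want to do".

```python
class MyStr(str):
    calls = []  # hypothetical extra behaviour: remember what was searched for

    def find(self, substr, start=None, end=None):
        MyStr.calls.append(substr)
        # None is accepted for start/end, so no "start or 0" fix-up is needed
        return str.find(self, substr, start, end)


test = MyStr("this is a test")
first = test.find("is")
second = test.find("is", 3)
absent = test.find("zz")
```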
Why not? 'Well solved' does not mean there is no room for improvement. And going through the whole PEP process does not feel like 'shooting from the hip'.
You mean like 'runs on 2.6+ but not earlier'?
* What should the default value be?
There should be no default value, in my opinion.
The most effective argument by far, IMO, both for not having a default value, and for being very careful about what the caller chooses to use for the missing argument. I think a bomb would be appropriate here:

    class Bomb():
        'singleton object: blows up on any usage'
        def __bool__(self):
            raise OopsError('yell at the programmer!')
        # etc.

Then in usage it's a check for object identity; anything else reminds somebody they forgot to do something.
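A slightly fuller version of the bomb idea might look like the following. It is a sketch under assumptions: MisuseError stands in for the undefined OopsError, index_or is a hypothetical helper, and the set of special methods that explode is an arbitrary selection, not an exhaustive one.

```python
class MisuseError(Exception):
    """Hypothetical stand-in for the OopsError named above."""


class Bomb:
    """Marker object whose only safe use is an identity check (`is`)."""

    def _explode(self, *args, **kwargs):
        raise MisuseError("unhandled 'not found' marker used as a value")

    # Truth testing, indexing, comparison and arithmetic all blow up.
    __bool__ = __index__ = __eq__ = __lt__ = __add__ = __radd__ = _explode


BOMB = Bomb()


def index_or(s, sub, missing):
    """Hypothetical helper: s.index(sub), or `missing` if sub is absent."""
    try:
        return s.index(sub)
    except ValueError:
        return missing


pos = index_or("spam", "x", BOMB)
handled = pos is BOMB   # the intended check: identity only
```

Forgetting the identity check and writing "spam"[pos], pos + 1, or `if pos:` raises MisuseError immediately instead of quietly producing a wrong value.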
Many good points -- thank you for taking the time. ~Ethan~

Raymond Hettinger wrote:
Indeed, because for simple things, you don't want to clutter up your code with lots of nested try-excepts. It also has to do with performance: exceptions should only be used for exceptional situations. Not finding a sub-string in some line read from a log file is not an exceptional situation. In fact, it's most likely the common case. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 18 2011)
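The log-file case can be written both ways, shown side by side below. The function names and sample lines are invented for illustration, and no timing claim is made here; timeit on real data would be needed to compare the two.

```python
def fields_with_find(lines, marker):
    """Collect the text after `marker`, skipping lines that lack it."""
    found = []
    for line in lines:
        pos = line.find(marker)
        if pos == -1:
            continue  # the common case: marker not in this line
        found.append(line[pos + len(marker):].strip())
    return found


def fields_with_index(lines, marker):
    """Same result, with absence signalled by ValueError."""
    found = []
    for line in lines:
        try:
            pos = line.index(marker)
        except ValueError:
            continue
        found.append(line[pos + len(marker):].strip())
    return found


log = ["INFO start", "ERROR code=7", "INFO tick", "ERROR code=9"]
via_find = fields_with_find(log, "code=")
via_index = fields_with_index(log, "code=")
```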

Mike Graham wrote:
I think that's a disingenuous thing to say. You started this thread with an explicit call to deprecate str.find -- see your chosen subject line, and your first paragraph in this thread states: "It should be explicitly deprecated ..." What is the point of deprecating something if you don't intend to eventually remove it?
A lot of very old code predates startswith and endswith. They only appeared in 2.0. Surprisingly, as late as 2.2, we were limited to testing for a single character:

    [steve@sylar src]$ python2.2
    Python 2.2.3 (#1, Aug 12 2010, 01:08:27)
    [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
Some of the code on that first page supports Python 2.1.
Whereas "pos = s.index(" ("); if pos + len(" (...)") > self._maxWidth:" is the height of readability, yes? People can write bad code no matter what tools you give them. -- Steven

On Sat, 2011-07-16 at 08:26 -0700, Raymond Hettinger wrote:
Although a quick look over the found snippets suggests that many uses of find are indeed "ugly" code that could be improved by use of index or __contains__. Since a DeprecationWarning seems rather intrusive, what would be a less intrusive way to encourage code improvement in such places? -- Ronny
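One less intrusive route is the lint-style check suggested earlier in the thread. The following is a rough sketch using the standard ast module on Python 3.8 or later; the function name is hypothetical, and it only catches the direct form `x.find(...) == -1` or `!= -1`. It cannot tell whether x is actually a string.

```python
import ast


def _is_find_call(node):
    """True for a call of the form <anything>.find(...)."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "find")


def _is_minus_one(node):
    """True for the literal -1, which parses as unary minus applied to 1."""
    return (isinstance(node, ast.UnaryOp)
            and isinstance(node.op, ast.USub)
            and isinstance(node.operand, ast.Constant)
            and node.operand.value == 1)


def flag_find_membership_tests(source):
    """Return sorted line numbers where a .find() call is compared to -1."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare) or len(node.ops) != 1:
            continue
        if not isinstance(node.ops[0], (ast.Eq, ast.NotEq)):
            continue
        left, right = node.left, node.comparators[0]
        if ((_is_find_call(left) and _is_minus_one(right))
                or (_is_minus_one(left) and _is_find_call(right))):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "if s.find(sub) != -1:\n"
    "    pass\n"
    "i = s.find(sub)\n"
    "if sub in s:\n"
    "    pass\n"
)
flagged = flag_find_membership_tests(sample)
```

Line 3 of the sample is deliberately not flagged: storing the position and checking it later is the legitimate use of find, while the direct comparison on line 1 is a membership test that `in` expresses better.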
participants (17): Antoine Pitrou, Bruce Leban, Cameron Simpson, Carl Johnson, Devin Jeanpierre, Dirkjan Ochtman, Ethan Furman, Georg Brandl, Guido van Rossum, M.-A. Lemburg, Masklinn, Mike Graham, Nick Coghlan, Raymond Hettinger, Ronny Pfannschmidt, Steven D'Aprano, Terry Reedy