[Python-Dev] partition() (was: Remove str.find in 3.0?)

Tue Aug 30 19:06:55 CEST 2005

Pierre Barbier de Reuille <pierre.barbier at cirad.fr> wrote:
> Well, what it does is exactly what I thought, you can express most of the
> use-cases of partition with:
> 
> head, sep, tail = s.partition(sep)
> if not sep:
>   #do something when it does not work
> else:
>   #do something when it works
> 
> And I propose to replace it with:
> 
> try:
>   head, sep, tail = s.partition(sep)
>   # do something when it works
> except SeparatorError:
>   # do something when it does not work

No, you can't.  As Tim Peters pointed out, in order to be correct, you
need to use...

try:
    head, found, tail = s.partition(sep)
except ValueError:
    # do something when it can't find sep
else:
    # do something when it can find sep

By embedding the 'found' case inside the try block as you propose, you
could end up catching an unrelated exception raised by the 'found' code
itself, which is incorrect.
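
A minimal sketch of why the placement matters. Note that
partition_or_raise() is a hypothetical helper standing in for the
proposed exception-raising variant (the real str.partition() returns an
empty separator instead), and the int(tail) step is just an illustrative
'found' action that can itself raise ValueError.

```python
def partition_or_raise(s, sep):
    """Hypothetical stand-in for the proposal: raise if sep is absent."""
    head, found, tail = s.partition(sep)
    if not found:
        raise ValueError("separator not found")
    return head, found, tail


def parse_inside_try(s):
    # The 'found' work lives inside the try block, so a ValueError raised
    # by int() is mistaken for "separator not found".
    try:
        head, found, tail = partition_or_raise(s, "=")
        return head, int(tail)
    except ValueError:
        return None


def parse_with_else(s):
    # Only the partition call is guarded; errors from the 'found' work
    # propagate to the caller instead of being swallowed.
    try:
        head, found, tail = partition_or_raise(s, "=")
    except ValueError:
        return None
    else:
        return head, int(tail)
```

With these definitions, parse_inside_try("port=abc") quietly returns
None as if no "=" were present, while parse_with_else("port=abc") lets
the ValueError from int() reach the caller.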

> What I'm talking about is consistency. In most cases in Python, or at
> least AFAIU, error testing is avoided and raising exceptions is
> preferred, mainly for efficiency reasons. So my question remains: why
> prefer, for that specific method, returning an "error" value (i.e. an
> empty separator) over raising an exception?

It is known among those who tune their Python code that try/except is
relatively expensive when exceptions are raised, but not significantly
faster (if at all) when they are not. I'll provide an updated set of
microbenchmarks...

>>> import time
>>> if 1:
...     x = 'h'
...     t = time.time()
...     for i in xrange(1000000):
...             _ = x.find('h')
...             if _ >= 0:
...                     pass
...             else:
...                     pass
...     print time.time()-t
...
0.84299993515
>>> if 1:
...     x = 'h'
...     t = time.time()
...     for i in xrange(1000000):
...             try:
...                     _ = x.index('h')
...             except ValueError:
...                     pass
...             else:
...                     pass
...     print time.time()-t
...
0.81299996376

BUT!
>>> if 1:
...     x = 'h'
...     t = time.time()
...     for i in xrange(1000000):
...             try:
...                     _ = x.index('i')
...             except ValueError:
...                     pass
...             else:
...                     pass
...     print time.time()-t
...
4.29700016975

We should subtract the time of the for loop, the method call overhead,
perhaps the integer object creation/fetch, and the assignment.
str.__len__() is pretty fast (really just a member check, which is at a
constant offset...), let us use that.

>>> if 1:
...     x = 'h'
...     t = time.time()
...     for i in xrange(1000000):
...             _ = x.__len__()
...     print time.time()-t
...
0.5

So, subtracting that .5 seconds from all the cases gives us...

0.343 seconds for .find's comparison
0.313 seconds for .index's exception handling when an exception is
      not raised
3.797 seconds for .index's exception handling when an exception is
      raised.
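
The subtraction can be checked mechanically. This is only a sketch of
the arithmetic: the raw figures are copied from the transcripts above,
and the dictionary keys are labels made up for this example.

```python
# Raw timings (seconds) copied from the transcripts above.
baseline = 0.5  # loop + method call + assignment, from the __len__ run
raw = {
    "find, comparison": 0.84299993515,
    "index, no exception raised": 0.81299996376,
    "index, exception raised": 4.29700016975,
}

# Net cost of each approach once the shared overhead is removed.
net = {name: t - baseline for name, t in raw.items()}
for name, t in net.items():
    print("%-28s %.3f s" % (name, t))

# How much slower the failing .index case is than the .find comparison.
ratio = net["index, exception raised"] / net["find, comparison"]
print("raised / find: %.1fx" % ratio)
```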

In the case of a string being found, .index is about 10% faster than
.find.  In the case of a string not being found, .index's exception
handling mechanics are over 11 times slower than .find's comparison.
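
To repeat the comparison on a current interpreter, the standard timeit
module is less noisy than hand-rolled time.time() loops. The following
is a sketch assuming Python 3; the function names and iteration count
are illustrative, each case pays an extra function-call overhead, and
the absolute numbers will differ from those above.

```python
import timeit

x = "h"


def find_check():
    # Sentinel style: test the return value.
    if x.find("h") >= 0:
        pass


def index_found():
    # Exception style, substring present: no exception is raised.
    try:
        x.index("h")
    except ValueError:
        pass


def index_missing():
    # Exception style, substring absent: ValueError is raised and caught.
    try:
        x.index("i")
    except ValueError:
        pass


def measure(func, number=200000):
    # Take the best of three runs to reduce noise from other processes.
    return min(timeit.repeat(func, number=number, repeat=3))


results = {f.__name__: measure(f)
           for f in (find_check, index_found, index_missing)}
for name, seconds in results.items():
    print("%-14s %.3f s" % (name, seconds))
```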

Those numbers should speak for themselves.  In terms of the strings
being automatically chopped up vs. manually chopping them up with slices,
it is obvious which will be faster: C-level slicing.
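
For reference, the "manual chopping" being compared looks roughly like
the sketch below. manual_partition() is a name invented for this
example; it assumes a non-empty separator (str.partition() itself
rejects an empty one with ValueError).

```python
def manual_partition(s, sep):
    # Hand-written equivalent of s.partition(sep) using find and slices.
    # Assumes sep is non-empty.
    i = s.find(sep)
    if i < 0:
        # Not found: whole string as head, empty separator and tail.
        return s, "", ""
    return s[:i], sep, s[i + len(sep):]
```

Each call does the search plus up to two slice operations at the Python
level, which is the work the built-in method does in a single C-level
call.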

I agree with Raymond that if you are going to pooh-pooh str.partition()
not raising an exception, you should do some translations using the
correct structure that Tim Peters provided, and post them here on
python-dev as 'proof' that raising an exception in the cases provided is
better.

 - Josiah