[Python-3000] Dropping find/rfind?

Fri Aug 25 21:13:31 CEST 2006

On 8/25/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> >>Twisted's core loop uses string views to avoid unnecessary copying.  This
> >>has proven to be a real-world speedup.  This isn't a synthetic benchmark
> >>or a micro-optimization.
> >
> >OK, that's the kind of data I was hoping for; if this was mentioned
> >before I apologize. Did they implement this in C or in Python? Can you
> >point us to the docs for their API?
>
> One instance of this is an implementation detail which doesn't impact any application-level APIs:
>
> http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88

You are referring to the two calls to buffer(), right? It seems a
pretty rare use case (though an important one). I wonder how often
offset != 0 in practice. I'd like the new 3.0 I/O library to provide
better support for writing part of a buffer, e.g. by adding an
optional offset parameter to write().
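
A minimal sketch of what "writing part of a buffer" could look like, for
illustration only. It assumes a modern Python 3 where memoryview stands in
for buffer(); the helper name write_from and its offset parameter are
hypothetical and are not part of any actual I/O library API.

```python
import io


def write_from(stream, data, offset=0):
    """Write data[offset:] to stream (hypothetical helper).

    Slicing a memoryview does not copy the underlying bytes, so the
    tail of `data` is handed to write() without building a new object.
    """
    view = memoryview(data)[offset:]
    return stream.write(view)


out = io.BytesIO()
payload = b"headerBODY"
# Skip the first 6 bytes, as if they had already been written.
written = write_from(out, payload, offset=6)
```

Here the offset lives in the helper; an offset argument on write() itself
would make the intermediate view unnecessary.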

> Another instance of this is implemented in C++:
>
> http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion
>
> but doesn't interact a lot with Python code.  The C++ API uses char* with a length (a natural way to implement string views in C/C++).  The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods.

This doesn't seem a particularly strong use case (but I can't say I
understand the code or how it's used).

> >>I don't understand the resistance.  Is it really so earth-shatteringly
> >>surprising that not copying memory unnecessarily is faster than copying
> >>memory unnecessarily?
> >
> >It depends on how much bookkeeping is needed to properly free the
> >underlying buffer when it is no longer referenced, and whether the
> >application repeatedly takes short long-lived slices of long otherwise
> >short-lived buffers. Unless you have a heuristic for deciding to copy
> >at some point, you may waste a lot of space.
>
> Certainly.  The first link above includes an example of such a heuristic.

Because the app is in control, it is easy to avoid the worst-case
behavior of the heuristic.
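
To make the idea of an app-controlled copy heuristic concrete, here is a
rough sketch. It is not the Twisted code linked above; the class name
SendBuffer, its methods, and the compact_threshold value are all
assumptions chosen for illustration.

```python
class SendBuffer:
    """Hypothetical outgoing buffer that tracks an offset into its data."""

    def __init__(self, compact_threshold=4096):
        self.data = b""
        self.offset = 0
        # Assumed tuning knob: how many consumed bytes to tolerate
        # before paying for one real copy.
        self.compact_threshold = compact_threshold

    def append(self, chunk):
        self.data += chunk

    def pending(self):
        # A view of the unsent tail; slicing a memoryview does not copy.
        return memoryview(self.data)[self.offset:]

    def consumed(self, n):
        """Record that n bytes of the pending data were sent."""
        self.offset += n
        if self.offset == len(self.data):
            # Everything was sent: drop the buffer entirely.
            self.data = b""
            self.offset = 0
        elif self.offset > self.compact_threshold:
            # The dead prefix got too large: copy the tail once so the
            # consumed bytes can be freed.
            self.data = self.data[self.offset:]
            self.offset = 0
```

Because the application owns both the buffer and the threshold, it can
bound how much dead prefix is ever kept alive, which is the point made
above about avoiding the worst case.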

> >>If the goal is to avoid speeding up Python programs because views are too
> >>complex or unpythonic or whatever, fine.  But there isn't really any
> >>question as to whether or not this is a real optimization.
> >
> >There are many ways to implement views. It has often been proposed to
> >make views an automatic feature of the basic string object. There the
> >optimization in one case has to be weighed against the pessimization
> >in another case (like the bookkeeping overhead everywhere and the
> >worst-case scenario I mentioned above).
>
> I'm happy to see things progress one step at a time.  Having them _at
> all_ (buffer) was a good place to start.

But buffer() is on the kick-list for Py3k right now. Perhaps the new
bytes object will make it possible to write the first example above
differently; bytes will be mutable, which changes everything.
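
As a rough illustration of why mutability matters here, the sketch below
uses bytearray, the mutable byte type found in later Python versions, as
a stand-in for the mutable bytes described above. The function name
discard_sent is hypothetical.

```python
def discard_sent(buf, n):
    """Remove the first n bytes of a mutable buffer in place (hypothetical)."""
    # No new object is created; the same buffer shrinks from the front.
    del buf[:n]
    return buf


outgoing = bytearray(b"hello world")
same_object = discard_sent(outgoing, 6)
```

With a mutable buffer the already-written prefix can simply be deleted,
so neither a view object nor an explicit offset has to be carried around.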

> A view which has string methods
> is a nice incremental improvement.  Maybe somewhere down the line there
> can be a single type which magically knows how to behave optimally for all
> programs, but I'm not asking for that yet. ;)

I still expect that a view with string methods will find more abuse
than legitimate use.

> >If views have to be explicitly
> >requested that may not be a problem because the app author will
> >(hopefully) understand the issues. But even if it was just a standard
> >library module, I would worry that many inexperienced programmers
> >would complicate their code by using the string views module without
> >real benefits. Sort of the way some folks have knee-jerk habits to
> >write
> >
> >  def foo(x, None=None):
> >
> >if they use None anywhere in the body of the function. This should be
> >done only as a last resort when real-life measurements have shown that
> >foo() is a performance show-stopper.
>
> I don't think we see people overusing buffer() in ways which damage
> readability now, and buffer is even a builtin.

But it has been riddled with problems in the past, so most people know
to steer clear of it.

> Tossing something off
> into a module somewhere shouldn't really be a problem.  To most people
> who don't actually know what they're doing, the idea to optimize code
> by reducing memory copying usually just doesn't come up.

That final remark is a matter of opinion. I've seen too much code that
mindlessly copied idioms that were supposed to magically speed up
certain things to believe it. Often, people who don't know what they
are doing are more worried about speed than people who do, and they
copy all the wrong examples... :-(

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)