[Python-3000] Dropping find/rfind?

Fri Aug 25 20:49:02 CEST 2006

On Fri, 25 Aug 2006 10:53:15 -0700, Guido van Rossum <guido at python.org> wrote:
>On 8/25/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>> >For the record, I think this is a major case of YAGNI. You appear way
>> >too obsessed with performance of some microscopic aspect of the
>> >language. Please stop firing random proposals until you actually have
>> >working code and proof that it matters. Speeding up microbenchmarks is
>> >irrelevant.
>>
>>Twisted's core loop uses string views to avoid unnecessary copying.  This
>>has proven to be a real-world speedup.  This isn't a synthetic benchmark
>>or a micro-optimization.
>
>OK, that's the kind of data I was hoping for; if this was mentioned
>before I apologize. Did they implement this in C or in Python? Can you
>point us to the docs for their API?

One instance of this is an implementation detail which doesn't impact any application-level APIs:

http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88

Another instance of this is implemented in C++:

http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion

but doesn't interact a lot with Python code.  The C++ API uses char* with a length (a natural way to implement string views in C/C++).  The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods.
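
As a rough illustration of that tradeoff, here is a minimal sketch. It uses memoryview, the closest modern analogue of buffer(), so it is not the code from either link above; the helper names and sample data are made up for the example.

```python
# Sketch only: memoryview stands in for the buffer() builtin discussed here.
# Function names and sample data are hypothetical, not taken from Twisted.

def tail_copy(data, offset):
    """Slice the bytes object directly: allocates and copies the remainder."""
    return data[offset:]

def tail_view(data, offset):
    """Slice a memoryview instead: shares memory with data, no copy."""
    return memoryview(data)[offset:]

data = bytes(range(256)) * 4096      # about 1 MiB of sample payload

copied = tail_copy(data, 16)
view = tail_view(data, 16)

# The view still refers to the original object and exposes the same bytes.
same_object = view.obj is data
same_content = bytes(view[:4]) == copied[:4]

# The drawback described above: the view lacks string methods such as find().
view_has_find = hasattr(view, "find")
copy_has_find = hasattr(copied, "find")
```

Code that expects find(), split() and friends works on the copy but not on the view, which is why the Python-level API keeps handing out plain strings.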

>>I don't understand the resistance.  Is it really so earth-shatteringly
>>surprising that not copying memory unnecessarily is faster than copying
>>memory unnecessarily?
>
>It depends on how much bookkeeping is needed to properly free the
>underlying buffer when it is no longer referenced, and whether the
>application repeatedly takes short long-lived slices of long otherwise
>short-lived buffers. Unless you have a heuristic for deciding to copy
>at some point, you may waste a lot of space.

Certainly.  The first link above includes an example of such a heuristic.
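
For readers who don't want to dig through the source, one possible shape for such a heuristic is sketched below. This is an assumption-laden illustration, not a transcription of the linked code: slice_or_copy and max_waste_ratio are hypothetical names, and the threshold of 4 is arbitrary.

```python
def slice_or_copy(buf, start, stop, max_waste_ratio=4):
    """Return buf[start:stop] as a view, or as a copy when a view would
    keep far more memory alive than the slice actually needs.

    Hypothetical helper: the name and the default ratio are illustrative.
    """
    view = memoryview(buf)[start:stop]
    kept = len(view)
    # A short slice of a long buffer would pin the whole buffer in memory,
    # so pay for one small copy and let the big buffer be freed.
    if kept == 0 or len(buf) > max_waste_ratio * kept:
        return bytes(view)
    # Otherwise most of the buffer is still in use; sharing it is cheaper.
    return view
```

With a 1000-byte buffer, slice_or_copy(buf, 0, 10) returns a 10-byte copy, while slice_or_copy(buf, 0, 900) returns a view onto the original. The right threshold depends on the workload and is best chosen by measurement.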

>>If the goal is to avoid speeding up Python programs because views are too
>>complex or unpythonic or whatever, fine.  But there isn't really any
>>question as to whether or not this is a real optimization.
>
>There are many ways to implement views. It has often been proposed to
>make views an automatic feature of the basic string object. There the
>optimization in one case has to be weighed against the pessimization
>in another case (like the bookkeeping overhead everywhere and the
>worst-case scenario I mentioned above).

I'm happy to see things progress one step at a time.  Having them _at
all_ (buffer) was a good place to start.  A view which has string methods
is a nice incremental improvement.  Maybe somewhere down the line there
can be a single type which magically knows how to behave optimally for all
programs, but I'm not asking for that yet. ;)

>If views have to be explicitly
>requested that may not be a problem because the app author will
>(hopefully) understand the issues. But even if it was just a standard
>library module, I would worry that many inexperienced programmers
>would complicate their code by using the string views module without
>real benefits. Sort of the way some folks have knee-jerk habits to
>write
>
>  def foo(x, None=None):
>
>if they use None anywhere in the body of the function. This should be
>done only as a last resort when real-life measurements have shown that
>foo() is a performance show-stopper.
>

I don't think we see people overusing buffer() in ways which damage
readability now, and buffer is even a builtin.  Tossing something off
into a module somewhere shouldn't really be a problem.  To most people
who don't actually know what they're doing, the idea of optimizing code
by reducing memory copying usually just doesn't come up.

Jean-Paul