[Tutor] s.insert(i, x) explanation in docs for Python 3.4 confusing to me

Cameron Simpson cs at zip.com.au
Sun Jan 17 18:27:00 EST 2016


On 17Jan2016 14:50, Alex Kleider <akleider at sonic.net> wrote:
>Again, a personal thank you.

No worries.

>More often than not, when answering one thing, you teach me about
>other things.  The 'thing' thing is only the latest.  Of course
>I knew that using a name bound to a collection would affect the
>contents of the collection but it would never have occurred to me
>to use it to advantage as you describe.

Note that it shouldn't be overdone. Pointless optimisation is also a source of 
bugs and obfuscation.

As I remarked, I'll do this for readability if needed or for performance where 
"things" (or whatever) is going to be heavily accessed (eg in a busy loop), 
such that getting-it-from-the-instance might significantly affect the 
throughput. For many things it isn't worth spending your time on: expressing 
the problem directly is worth more for maintainability.
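
To make the trick concrete, here is a minimal sketch. The class "Inventory" and 
its method names are made up for illustration; the only point is that the local 
name "things" refers to the same list object as "self.things", so changes made 
through one are visible through the other.

```python
class Inventory:
    """Hypothetical class; "things" is just a list held on the instance."""

    def __init__(self):
        self.things = []

    def add_all_direct(self, items):
        # Looks up self.things on every pass through the loop.
        for item in items:
            self.things.append(item)

    def add_all_local(self, items):
        # Bind a local name once; it refers to the same list object,
        # so appending through it changes self.things too.
        things = self.things
        for item in items:
            things.append(item)


inv = Inventory()
inv.add_all_direct([1, 2])
inv.add_all_local([3, 4])
print(inv.things)  # [1, 2, 3, 4]
```

Note that this only works while "self.things" keeps referring to the same 
object; if the loop rebinds "self.things" to a new list, the local name will 
still point at the old one.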

I mostly do it when I'm doing something low level, where interpreted or high 
level languages like Python lose because the cost of their data abstraction 
and per-operation overhead may exceed the teensy teensy task being performed.

As an example, I'm rewriting some code right now that scans bytes objects for 
boundaries in the data. It is inherently a sequential "get a byte, update some 
numbers and consult some tables, get the next byte ...". It is also heavily 
used in the storage phase of this application: all new data is scanned this 
way.

In C one might be going:

  for (char *cp=buffer; buflen > 0; cp++, buflen--) {
    b = *cp;
    ... do stuff with "b" ...
  }

That will be efficiently compiled to machine code and run flat out at your 
CPU's native speed. The Python equivalent, which would look a bit like this:

  for offset in range(len(bs)):
    b = bs[offset]
    ... do stuff with "b" ...

or possibly:

  for offset, b in enumerate(bs):
    ... do stuff with "b" ...

will inherently be slower because in Python "bs", "b" and "offset" may be any 
type, so Python must do all this through object methods, and CPython (for 
example) compiles to opcodes for a logical machine, which are themselves 
interpreted by C code to decide what to do. You can see that the ratio of 
"implement the Python operations" to the core "do something with a byte" stuff 
can potentially be very large.
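
If you want to see those opcodes for yourself, the standard "dis" module will 
print them. This is only a sketch: the function "scan" is a stand-in I made up, 
with a deliberately trivial loop body, and the exact opcodes you see will 
differ between CPython versions.

```python
import dis


def scan(bs):
    # Deliberately trivial loop body: add up the byte values.
    total = 0
    for b in bs:
        total += b
    return total


# Print the bytecode CPython generated for scan().
dis.dis(scan)

# Count the instructions; the exact opcodes vary between CPython versions.
instructions = list(dis.get_instructions(scan))
print(len(instructions), "bytecode instructions")
```

Even for a loop body this small there are several instructions per byte 
processed, and each of them is dispatched by the interpreter's C code.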

At some point I may be writing a C extension for these very busy parts of the 
code but for now I am putting some effort into making the pure Python 
implementation as efficient as I can while keeping it correct. This is an 
occasion when it is worth expending significant effort on minimising 
indirection (using "things" instead of "self.things", etc), because that 
indirection has a cost that _in this case_ will be large compared to the core 
operations.
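
As a rough illustration of the kind of change I mean (this is not my actual 
scanning code: the class "Scanner", its lookup table and its "boundary" rule 
are all invented for the example), compare the two methods below. They compute 
the same result, but the second does its attribute lookups once, before the 
loop, rather than once per byte.

```python
class Scanner:
    """Hypothetical scanner; the table and the "boundary" rule are invented."""

    def __init__(self):
        # Made-up lookup table: one value per possible byte.
        self.table = [(i * 31) % 251 for i in range(256)]
        self.threshold = 4000

    def boundaries_direct(self, bs):
        # Straightforward version: every access goes via self.
        offsets = []
        total = 0
        for offset, b in enumerate(bs):
            total += self.table[b]
            if total >= self.threshold:
                offsets.append(offset)
                total = 0
        return offsets

    def boundaries_local(self, bs):
        # Same logic, with the attribute lookups hoisted out of the loop.
        table = self.table
        threshold = self.threshold
        offsets = []
        append = offsets.append
        total = 0
        for offset, b in enumerate(bs):
            total += table[b]
            if total >= threshold:
                append(offset)
                total = 0
        return offsets


data = bytes(range(256)) * 64
scanner = Scanner()
assert scanner.boundaries_direct(data) == scanner.boundaries_local(data)
```

Whether the second form is actually faster on your data is something to 
measure (the "timeit" module is the usual tool) rather than assume, and the 
first form is the easier one to read.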

Cheers,
Cameron Simpson <cs at zip.com.au>