[Python-Dev] cpython (2.7): Fix comment blocks. Adjust blocksize to a power-of-two for better divmod

Victor Stinner victor.stinner at gmail.com
Mon Jun 24 12:39:31 CEST 2013


Hi Raymond,

Thank you for your long explanation, it is exactly what Antoine asked
for :-) I like micro-optimization even if I know that some other
developers only see such work as noise, not providing an real speed up
:-) So it was interesting to read your email!

I'm sorry that you was injured by the revert of your commits (even I
didn't do the revert myself, I agreed with Benjamin, and he's the
maintainer of Python 2.7). I hope that you understood why they were
rejected after reading this mail thread: the last Python 2.7 release
(which one exactly?) contains regressions, and changes on the 2.7
stable branch are sensitive. By experience, there is always someone in
the world relying to very specific implementation details. Cython,
PyPy CPyExt, ctypes, etc. are sensitive to such change. People should
not rely on such implementation details, but they do :-)

For this specific case, I "hope" that nobody relies on the exact BLOCK
structure (since it is a private structure). But it is just safer to
not change it that in a minor Python version. If your compare the risk
of introducing a regression versus the speedup, I guess that the risk
is higher.

As I wrote, I'm interested in micro-optimization, so please continue
your work! It is just safer to only modify the active development
branch (Python 3.4).

Victor


2013/6/24 Raymond Hettinger <raymond.hettinger at gmail.com>:
>
> On Jun 18, 2013, at 12:00 AM, Gregory P. Smith <greg at krypto.org> wrote:
>
> Why did you do this in the 2.7 branch?
>
> It hasn't been done in 3.3 or default
>
>
> I worked on the 2.7 branch first because that was the one I had loaded
> and the one where I did timings and code disassembly.  I intended to
> post it to 3.3 and 3.4 as well over the weekend.
>
> Ideally, it makes maintenance simpler for me if I keep the branches
> the same as possible.  I viewed the one-line struct transposition,
> comment correction, and one-line blocklen change as being somewhat
> innocuous non-algorithmic changes.  The struct change fixed an unintended
> cache miss for left links and the blocksize change causes the
> deque_index code to compile more efficiently (using a right-shift
> and bitwise-and and rather than a measurably more expensive
> division and modulo calculation).
>
> I truly wasn't expecting the Spanish Inquisition :-)
>
>
> On Jun 22, 2013, at 1:43 PM, Scott Dial <scott+python-dev at scottdial.com>
> wrote:
>
> changeset 84248:f1dc30a1be72 2.7
> Arrange structure to match the common access patterns.
>
>     1.1 --- a/Modules/_collectionsmodule.c
>     1.2 +++ b/Modules/_collectionsmodule.c
>     1.3 @@ -48,8 +48,8 @@
>     1.4
>     1.5  typedef struct BLOCK {
>     1.6      struct BLOCK *leftlink;
>     1.7 +    PyObject *data[BLOCKLEN];
>     1.8      struct BLOCK *rightlink;
>     1.9 -    PyObject *data[BLOCKLEN];
>    1.10  } block;
>    1.11
>    1.12  #define MAXFREEBLOCKS 10
>
> , which seems like a strange micro-optimization, at best.
>
>
> Yes, this is a micro-optimization.  In working on implementing
> deque slicing for 3.4, I restudied the block access patterns.
> On an appendleft(), popleft() or extendleft() operation, the left link is
> accessed immediately before or after the leftmost entry in the data block.
> The old struct arrangement can cause an unnecessary cache miss
> when jumping leftward.  This was something I overlooked when I
> originally wrote the code almost a decade ago.
>
> On Jun 23, 2013, at 11:38 AM, Benjamin Peterson <benjamin at python.org> wrote:
>
> I've backed this one out, too.
>
>
> Really, you reverted my one-line change within 24 hours of it being posted?
> I can't be on-line every day and sometimes it takes a little while to catch
> up
> with python email.
>
> On Jun 22, 2013, at 2:55 PM, Guido van Rossum <guido at python.org> wrote:
>
> Actually the data buffer is an array of pointers too, so with the
> original BLOCKLEN value of 62, sizeof(block) would be 64 times
> sizeof(PyObject *). In the Py3 version of the source there's even a
> comment explaining this right before the #define BLOCKLEN. Raymond,
> can you explain?
>
>
> I also tried to fix that comment so it would stop emphasizing the blocklen
> being a multiple of the cache line.   Also long as there is a reasonably
> long
> data block, it matters less whether the data block size is an exact multiple
> of the cache line length (62 or 64 words of data versus a typical 8 byte
> cache line).
>
> The data block size does matter elsewhere though.
> The benefit of the having the data block being a power-of-two
> is that the deque_index computation can use bits shifts
> rather division and modulo calculations.  The benefit
> of switching data block size from 62 to 64 was measurable
> (a few percent) and observable in the disassembly of the code.
>
> I experimented with one other ordering as well
> (putting the data block first and the links afterwards).
> That saved the scaled-index byte in the generated code
> but produced no measureable speed-up.
>
> In short, I believe the following should go in:
>
> * The comment fix. (Incorrectly suggesting that a 64-bit
>    Py_ssize_t would overflow).  The revised comment
>    is simpler, more general, and correct.
>
> * Putting the leftlink before the data block in the structure.
>
> The follow is up for debate:
>
> Changing the BLOCKLEN from 62 to 64 is debatable.
> It measureably improved deque_index without an
> observable negative effect on the other operations.
>
> Lastly, there was a change I just put in to Py 3.4 replacing
> the memcpy() with a simple loop and replacing the
> "deque->" references with local variables.  Besides
> giving a small speed-up, it made the code more clear
> and less at the mercy of various implementations
> of memcpy().
>
> Ideally, I would like 2.7 and 3.3 to replace their use of
> memcpy() as well, but the flavor of this thread suggests
> that is right out.
>
>
> Raymond
>
>
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
>


More information about the Python-Dev mailing list