Mailman 3 Pairwise summation - NumPy-Discussion - python.org

Pairwise summation

Keith Goodman

Jan. 7, 2019

3:15 p.m.

Numpy uses pairwise summation along the fast axis if that axis contains no more than 8192 elements. How was 8192 chosen? Doubling to 16384 would result in a lot more function call overhead due to the recursion. Is it a speed issue? Memory? Or something else entirely?
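Keith's question is about how pairwise summation interacts with an 8192-element limit. Below is a minimal pure-Python sketch of that interaction. It is not NumPy's C implementation. The names `BLOCK`, `naive_sum`, `pairwise_sum` and `chunked_sum` are hypothetical, and the base-case size of 8 is an assumption chosen for illustration. `chunked_sum` models the situation in the question: pairwise summation is applied only inside each 8192-element chunk, and the chunk totals are then added one after another.

```python
import math

# Hypothetical base-case size for this sketch. NumPy's C inner loop uses its
# own blocking and unrolling, so this is an illustration, not a port.
BLOCK = 8


def naive_sum(xs, lo=0, hi=None):
    """Left-to-right accumulation over xs[lo:hi]."""
    hi = len(xs) if hi is None else hi
    total = 0.0
    for i in range(lo, hi):
        total += xs[i]
    return total


def pairwise_sum(xs, lo=0, hi=None):
    """Sum xs[lo:hi] by splitting the range in half until it is small."""
    hi = len(xs) if hi is None else hi
    if hi - lo <= BLOCK:
        return naive_sum(xs, lo, hi)
    mid = lo + (hi - lo) // 2
    return pairwise_sum(xs, lo, mid) + pairwise_sum(xs, mid, hi)


def chunked_sum(xs, chunk=8192):
    """Pairwise-sum each chunk of `chunk` elements, then add the chunk
    totals left to right, as a buffered outer loop would."""
    total = 0.0
    for start in range(0, len(xs), chunk):
        total += pairwise_sum(xs, start, min(start + chunk, len(xs)))
    return total


xs = [0.1] * 100_000
exact = math.fsum(xs)  # correctly rounded reference sum
for name, f in [("naive", naive_sum), ("pairwise", pairwise_sum),
                ("chunked", chunked_sum)]:
    print(f"{name:8s} error = {abs(f(xs) - exact):.3e}")
```

With a constant input such as `[0.1] * 100_000`, left-to-right accumulation drifts steadily. The pairwise and chunked variants should stay much closer to the `math.fsum` reference. Exact error values depend on the input and are not claimed here.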

Sebastian Berg

January 2019

3:32 p.m.

On Mon, 2019-01-07 at 12:15 -0800, Keith Goodman wrote:

> Numpy uses pairwise summation along the fast axis if that axis contains no more than 8192 elements. How was 8192 chosen?

It is simply a constant used throughout the ufunc machinery (and iteration) for cache friendliness. However, that iteration should not always chunk to 8192 elements; often it should just be the whole array. And I do not think the inner loop does any chunking itself, so given a contiguous fast axis and no casting, you likely already get a single outer iteration. In any case, 8192 was chosen to be small enough to be cache friendly, and it is exposed as `np.BUFSIZE`. You can actually change the buffer size that is being used with `numpy.setbufsize(size)`, although I can't say I ever tried it. Note that it also has to accommodate larger datatypes and multiple buffers.

- Sebastian
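Below is a minimal sketch of the knob Sebastian mentions. It assumes a NumPy version that provides `np.getbufsize()` and `np.setbufsize()`. The current size is read with `getbufsize()` rather than `np.BUFSIZE`, because the latter's availability may depend on the NumPy version. Following the thread, the sketch treats the value as an element count and multiplies by each dtype's itemsize to estimate bytes per buffer. It then doubles the size temporarily around a float32-to-float64 reduction. Per the reply, casting is the case where buffered chunking should apply. Whether the larger buffer changes the rounding of the result is not claimed.

```python
import numpy as np

# Current ufunc buffer size (the thread treats this as an element count).
default = np.getbufsize()
print("buffer size:", default)

# Assumption: bytes per buffer = element count * itemsize. Larger dtypes
# need proportionally more memory for the same number of elements.
for dt in (np.float32, np.float64, np.complex128):
    print(np.dtype(dt).name, default * np.dtype(dt).itemsize, "bytes")

x = np.random.default_rng(0).random(1_000_000, dtype=np.float32)

# A float64 accumulator for float32 input requires a cast, so the reduction
# should go through buffered iteration rather than one pass over the array.
old = np.setbufsize(16384)  # returns the previous setting
try:
    s_big = x.sum(dtype=np.float64)
finally:
    np.setbufsize(old)  # the setting is global state: always restore it
s_default = x.sum(dtype=np.float64)
print(s_big, s_default)
```

Restoring the previous value in a `finally` clause keeps the experiment from leaking a non-default buffer size into unrelated code.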

participants (2)

Keith Goodman
Sebastian Berg