By which I mean a memoryview that lets you change the start and end offsets
of its view of the underlying object (not modifying the underlying object).

Let's say I want to slice a long string s into trigrams and write them to
disk (or something).

    for i in xrange(0, len(s) - 3):
        x = s[i:i + 3]
        myfile.write(x)

At each step the slice copies the bytes of the string, even though all I do
is write them to disk. I could avoid copying with memoryviews...

    for i in xrange(0, len(s) - 3):
        x = memoryview(s)[i:i + 3]
        myfile.write(x)

...but this is actually much slower (3x slower in some quick tests). I'm
guessing it's because of all the object creation (while string slicing
probably uses fast paths).

Shouldn't I be able to do this?

    m = memoryview(s)
    for i in xrange(0, len(s) - 3):
        m.start = i
        m.end = i + 3
        myfile.write(m)

Cheers,

Matt
On Thu, 26 Jul 2012 18:40:33 -0400 Matt Chaput <matt@whoosh.ca> wrote:
> By which I mean a memoryview that lets you change the start and end
> offsets of its view of the underlying object (not modifying the
> underlying object).
>
> Let's say I want to slice a long string s into trigrams and write them
> to disk (or something).
>
>     for i in xrange(0, len(s) - 3):
>         x = s[i:i + 3]
>         myfile.write(x)
>
> At each step the slice copies the bytes of the string, even though all
> I do is write them to disk.
To be honest, copying three bytes is a trivial operation compared to all the rest (interpreting bytecode, instantiating and destructing bytes objects, calling I/O routines, etc.).
> Shouldn't I be able to do this?
>
>     m = memoryview(s)
>     for i in xrange(0, len(s) - 3):
>         m.start = i
>         m.end = i + 3
>         myfile.write(m)
You wouldn't win anything over the simple bytes slicing approach, IMO
(even with PyPy or another interpreter).

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net
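A quick timeit comparison bears this out (a sketch, not from the thread;
it uses Python 3 syntax, and the absolute numbers depend on the machine
and interpreter):

    import timeit

    setup = "s = b'x' * 100000"
    # plain slicing: copies 3 bytes per iteration, but on a heavily
    # optimized fast path
    print(timeit.timeit("for i in range(len(s) - 2): s[i:i + 3]",
                        setup=setup, number=100))
    # memoryview slicing: no copy, but a comparatively heavyweight view
    # object is created on every iteration
    print(timeit.timeit("for i in range(len(s) - 2): m[i:i + 3]",
                        setup=setup + "; m = memoryview(s)", number=100))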
On Fri, Jul 27, 2012 at 8:40 AM, Matt Chaput <matt@whoosh.ca> wrote:
> I could avoid copying with memoryviews...
>
>     for i in xrange(0, len(s) - 3):
>         x = memoryview(s)[i:i + 3]
>         myfile.write(x)
>
> ...but this is actually much slower (3x slower in some quick tests).
> I'm guessing it's because of all the object creation (while string
> slicing probably uses fast paths).
memoryview objects are pretty big - they have to store a lot of pointers and other objects that describe their view of the underlying buffer. Bytes objects, on the other hand, have minimal overhead. (These numbers are for 3.3, which uses a revamped memoryview implementation)
    >>> import sys
    >>> x = b"foo"
    >>> sys.getsizeof(x)
    36
    >>> sys.getsizeof(memoryview(x))
    184
There's also the execution speed overhead that comes from the indirection when accessing the contents. Thus, using views instead of copying really only starts to pay off once you're talking about comparatively large chunks of data:
    >>> x *= 1000
    >>> sys.getsizeof(x)
    3033
    >>> sys.getsizeof(memoryview(x))
    184
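To see where that crossover lands in time rather than space, compare
copying a large slice with taking a view of it (again a sketch, with
machine-dependent figures):

    import timeit

    setup = "data = b'x' * (1 << 24)"  # 16 MB of data
    # copies 8 MB on every call
    print(timeit.timeit("data[:1 << 23]", setup=setup, number=1000))
    # constant cost: only the small view object is created, nothing is copied
    print(timeit.timeit("memoryview(data)[:1 << 23]",
                        setup=setup, number=1000))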
> Shouldn't I be able to do this?
>
>     m = memoryview(s)
>     for i in xrange(0, len(s) - 3):
>         m.start = i
>         m.end = i + 3
>         myfile.write(m)
No, because making memoryviews mutable would be a huge increase in
complexity (and they're complex enough already - only with Stefan Krah's
work in 3.3 have we finally worked most of the kinks out of the
implementation).

What you *can* do with a memoryview, though, is slice it, and the
resulting object will be a memoryview that references a subset of the
original object. This can be done with full slicing flexibility in 3.3,
or in a more limited fashion in earlier versions.

For example, processing a data sequence in chunks in 3.3 without copying
and without inadvertently keeping a potentially large data object alive
(and/or locked into immutability) by hanging on to a buffer reference:

    chunk_len = 512  # For small chunks, copying is likely faster. Measure it!
    with memoryview(data) as m:
        for offset in range(0, len(m), chunk_len):
            with m[offset:offset + chunk_len] as x:
                process_chunk(x)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
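Applied to Matt's trigram loop, the same pattern means creating the view
once and slicing the view itself, so only a small sub-view is created per
iteration. A sketch in Python 3 (the file name and sample string are
illustrative; note the bound is len(s) - 2 so the final trigram is
included, and per Antoine's point it still needs measuring against plain
slicing for 3-byte chunks):

    s = b"some long byte string to split into trigrams"
    with open("trigrams.bin", "wb") as myfile, memoryview(s) as m:
        for i in range(len(s) - 2):
            myfile.write(m[i:i + 3])  # sub-view of m; no bytes are copied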
On 27.07.2012 03:37, Nick Coghlan wrote:
> Thus, using views instead of copying really only starts to pay off once
> you're talking about comparatively large chunks of data:
>
>     >>> x *= 1000
>     >>> sys.getsizeof(x)
>     3033
>     >>> sys.getsizeof(memoryview(x))
>     184
It can pay off A LOT. I've recently added PEP 3118 buffer support to my
smc.freeimage library (Cython wrapper of FreeImage and LCMS).

smc.freeimage needs 0.006 sec for the loop as it returns a writable buffer
of the internal pixel data (zero copy). PIL runs 1.165 sec as it uses the
old buffer interface and copies its data every time.

Benchmark code:
https://bitbucket.org/tiran/smc.freeimage/src/b97e13e9f04c/contrib/benchmark...

    RW_COUNT = 300

    tiff = Image(TIFF)
    start = time()
    for i in xrange(RW_COUNT):
        arr = asarray(tiff)
        # change last BGR -> RGB
        arr = arr[..., ::-1]
        bytescale(arr, 64, 192)
    end = time() - start

    tiff = pil_open(TIFF)
    tiff.load()
    start = time()
    for i in xrange(RW_COUNT):
        arr = asarray(tiff)
        bytescale(arr, 64, 192)
    end = time() - start
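The difference being measured comes down to whether asarray() can wrap
the exporter's buffer or has to copy it first. A minimal pure-Python
illustration of the two paths (a sketch; bytearray and NumPy stand in
for FreeImage's internal pixel buffer here):

    import numpy as np

    raw = bytearray(640 * 480 * 3)             # stand-in for the pixel buffer
    view = np.frombuffer(raw, dtype=np.uint8)  # PEP 3118 path: wraps it, no copy
    view[0] = 255
    print(raw[0])                              # 255 - same memory

    copied = np.frombuffer(bytes(raw), dtype=np.uint8)  # bytes(raw) copies first
    print(copied[0])                           # 255 now, but later writes to raw
                                               # no longer show up in `copied`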
> No, because making memoryviews mutable would be a huge increase in
> complexity (and they're complex enough already - only with Stefan
> Krah's work in 3.3 have we finally worked most of the kinks out of the
> implementation).
+1 for Nick's -1

The new buffer interface and memoryviews are already very complex. I
haven't found a library yet that supports all edge cases. Even NumPy
doesn't support multidimensional, non-contiguous buffers with suboffsets.
I wanted to use a suboffset of 3 with a negative stride on the last
dimension to convert BGR to RGB inside the buffer code. It didn't work
because NumPy ignores the information.

Christian
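For reference, at the NumPy level that BGR -> RGB conversion is just a
reversed view of the last axis, i.e. a negative stride and no copy (a
small sketch; what didn't work was expressing the same thing through the
buffer protocol itself):

    import numpy as np

    bgr = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
    rgb = bgr[..., ::-1]             # view with a -1 stride on the last axis
    print(rgb.base is bgr)           # True: shares the same memory
    print(bgr.strides, rgb.strides)  # (6, 3, 1) vs (6, 3, -1)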
On 27.07.2012 14:26, Christian Heimes wrote:
> The new buffer interface and memoryviews are already very complex. I
> haven't found a library yet that supports all edge cases. Even NumPy
> doesn't support multidimensional, non-contiguous buffers with
> suboffsets. I wanted to use a suboffset of 3 with a negative stride on
> the last dimension to convert BGR to RGB inside the buffer code. It
> didn't work because NumPy ignores the information.
For those who write C extensions with Cython, it might be useful to know
that it can compile "Python-like" code that uses PEP 3118 buffers to very
efficient C code:

http://docs.cython.org/src/userguide/memoryviews.html

On a simple test posted to the cython-user list,

https://github.com/jakevdp/memview_benchmarks

I only got 2.2 % longer run-time compared to plain C pointer arithmetic.
That involved creating two million PEP 3118 buffer objects and computing
one million dot products of length 1000.

This takes most of the complexity of using PEP 3118 buffers in C
extensions away, and lets us write C extensions with a syntax that is
very similar to Python with NumPy arrays. (On the other hand, the old
"NumPy array syntax" in Cython is considerably slower, particularly when
slicing the arrays.)

Sturla
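A minimal sketch of the typed-memoryview syntax in question (Cython .pyx
code under the current toolchain; the function name is illustrative, see
the docs linked above):

    # dot.pyx - Cython typed memoryviews compile down to C pointer arithmetic
    def dot(double[:] a, double[:] b):
        cdef double total = 0.0
        cdef Py_ssize_t i
        for i in range(a.shape[0]):
            total += a[i] * b[i]
        return total

Any PEP 3118 exporter of doubles (e.g. a one-dimensional float64 NumPy
array) can be passed in directly as a or b.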