From daverz at gmail.com Wed May 21 09:21:01 2014
From: daverz at gmail.com (Dave Cook)
Date: Wed, 21 May 2014 00:21:01 -0700
Subject: [Python.NET] Efficient copy of .NET Array to ctypes or numpy array.
Message-ID:

I need to copy a .NET Array (e.g. Double[] or Byte[]) to a numpy array, but it seems the only way to do so is element by element, which is very slow. Since we are copying a lot of data in real time, it creates a real bottleneck.

Alternatively, efficient conversion of the .NET array to a Python-style byte string would allow numpy.fromstring() to be used for creating the numpy array.

(I see a similar question went unanswered on the list in August 2011, but I was hoping someone may have figured it out by now.)

Thanks,
Dave Cook

From brad at fie.us Wed May 21 16:32:23 2014
From: brad at fie.us (Brad Friedman)
Date: Wed, 21 May 2014 10:32:23 -0400
Subject: [Python.NET] Efficient copy of .NET Array to ctypes or numpy array.
In-Reply-To:
References:
Message-ID:

An aside that may be useful:

.NET will skip array bounds checking within simple for-loops, as an optimization, but only if the binaries have all their optimizations turned on. A binary built for debug has them turned off. There is a huge speedup for iterating over an array when these optimizations are used, so make sure you are not looking at a compiler optimization configuration problem.

_________________________________________________
Python.NET mailing list - PythonDotNet at python.org
https://mail.python.org/mailman/listinfo/pythondotnet

From jeff at coderforlife.com Wed May 21 19:58:42 2014
From: jeff at coderforlife.com (Jeffrey Bush)
Date: Wed, 21 May 2014 10:58:42 -0700
Subject: [Python.NET] Efficient copy of .NET Array to ctypes or numpy array.
In-Reply-To:
References:
Message-ID:

You could write a .NET function to do this with fixed pointers and "memcpy" from the .NET array to the numpy data (the raw data). This would be the absolute fastest way, but does involve a number of assumptions (for example that the data in the two arrays are laid out in the same way). If you want I could probably write something up real quick.

Jeff

From jeff at coderforlife.com Wed May 21 21:24:21 2014
From: jeff at coderforlife.com (Jeffrey Bush)
Date: Wed, 21 May 2014 12:24:21 -0700
Subject: [Python.NET] Efficient copy of .NET Array to ctypes or numpy array.
In-Reply-To:
References:
Message-ID:

I was tempted to code it up, and it turns out you can do it in pure Python. I thought of 4 ways to copy the data: using a for loop (like you did), using numpy.fromiter, using numpy.fromstring, and using Marshal.Copy.

Obviously the for loop is the slowest. numpy.fromiter is still slow, but ~2.5x faster than the for loop (it still has all the .NET array checks, since this is in Python and the indexers cannot be optimized away). The last two do direct memory copies (fromstring gets the memory pointer of the .NET array while Marshal.Copy gets the memory pointer of the NumPy array). They are both MUCH faster than a for loop, especially for larger arrays (>200x faster). numpy.fromstring is faster for smaller arrays, but by the time I got to 10000000 doubles it was twice as slow as Marshal.Copy.

Here is the code:

    # Python 2 (print with trailing comma, xrange, time.clock)
    import clr
    from System import Array, Double, IntPtr, Random
    import numpy as np
    import time

    def check_arrays(a, b):
        if len(a) != len(b):
            print("Arrays are different size!")
        if any(A != B for A,B in zip(a, b)):
            print("Arrays have different values!")

    print("Creating source...")
    r = Random()
    src = Array.CreateInstance(Double, 10000000)
    for i in xrange(len(src)):
        src[i] = r.NextDouble()

    # 1. Element-by-element for loop
    print('Copy using for loop'),
    start = time.clock()
    dest = np.empty(len(src))
    for i in xrange(len(src)):
        dest[i] = src[i]
    end = time.clock()
    print('in %f sec' % (end-start))
    check_arrays(src, dest)

    # 2. numpy.fromiter over the .NET array
    print('Copy using fromiter'),
    start = time.clock()
    dest = np.fromiter(src, float)
    end = time.clock()
    print('in %f sec' % (end-start))
    check_arrays(src, dest)

    # 3. Pin the .NET array and read its memory as a byte string
    print('Copy using fromstring'),
    from ctypes import string_at
    from System.Runtime.InteropServices import GCHandle, GCHandleType
    start = time.clock()
    src_hndl = GCHandle.Alloc(src, GCHandleType.Pinned)
    try:
        # ToInt32() is only safe in 32-bit Python; 64-bit needs ToInt64()
        src_ptr = src_hndl.AddrOfPinnedObject().ToInt32()
        dest = np.fromstring(string_at(src_ptr, len(src)*8)) # note: 8 is size of double...
    finally:
        if src_hndl.IsAllocated: src_hndl.Free()
    end = time.clock()
    print('in %f sec' % (end-start))
    check_arrays(src, dest)

    # 4. Marshal.Copy straight into the NumPy array's buffer
    print('Copy using Marshal.Copy'),
    from System.Runtime.InteropServices import Marshal
    start = time.clock()
    dest = np.empty(len(src))
    Marshal.Copy(src, 0, IntPtr.__overloads__[int](dest.__array_interface__['data'][0]), len(src))
    end = time.clock()
    print('in %f sec' % (end-start))
    check_arrays(src, dest)

Jeff

From daverz at gmail.com Fri May 23 12:34:00 2014
From: daverz at gmail.com (Dave Cook)
Date: Fri, 23 May 2014 03:34:00 -0700
Subject: [Python.NET] Efficient copy of .NET Array to ctypes or numpy array.
Message-ID:

(Sorry for screwing up the thread; I messed up my list subscription. This is a response to https://mail.python.org/pipermail/pythondotnet/2014-May/001525.html )

Thanks, Jeffrey, that's awesome.

Since the pointer can be directly accessed, np.frombuffer() can be used to avoid a copy.

    import ctypes

    src_hndl = GCHandle.Alloc(src, GCHandleType.Pinned)
    try:
        src_ptr = src_hndl.AddrOfPinnedObject().ToInt32()
        bufType = ctypes.c_double*len(src)
        cbuf = bufType.from_address(src_ptr)
        dest = np.frombuffer(cbuf, dtype=cbuf._type_)
    finally:
        if src_hndl.IsAllocated: src_hndl.Free()

Dave Cook

From jeff at coderforlife.com Fri May 23 18:27:25 2014
From: jeff at coderforlife.com (Jeffrey Bush)
Date: Fri, 23 May 2014 09:27:25 -0700
Subject: [Python.NET] Efficient copy of .NET Array to ctypes or numpy array.
In-Reply-To:
References:
Message-ID:

The problem with your current code is that as soon as you call src_hndl.Free() the pointer is not necessarily valid any more! .NET is allowed to move the memory contents of objects at will unless they are pinned (which is what the first line does). Once the GCHandle is freed, the array is no longer pinned and is liable to be moved or freed as the garbage collector compacts memory.

Also, in 64-bit Python, you will need to call ToInt64() instead of ToInt32(). In fact, it may always be reasonable to call ToInt64() since Python's integer type will likely deal with it being 32 or 64-bit automatically.

So all in all:

- Use ToInt64() instead of ToInt32(), at least in 64-bit Python (e.g. x.ToInt64() if ctypes.sizeof(ctypes.c_void_p) == 8 else x.ToInt32())
- Do not call GCHandle.Free() until you are completely done with the memory pointer, but make sure you definitely call it once you are done, because otherwise the garbage collector can never free or move the memory that is pinned (resulting in memory leaks or fragmentation)

Jeff
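
The two recommendations in Jeff's last reply (choose ToInt64() or ToInt32() by pointer size, and keep the array pinned until the memory is no longer needed) can be sketched roughly as below. This is only an illustration written for Python 3, not code from the thread: the helper names intptr_to_int, pinned_address and copy_doubles are made up here, and alloc_pinned is a hypothetical callable that stands in for GCHandle.Alloc(obj, GCHandleType.Pinned) so the sketch does not need pythonnet to be importable.

```python
import ctypes
from contextlib import contextmanager


def intptr_to_int(ptr):
    """Return the address held by a .NET IntPtr-like object as a Python int."""
    # 8-byte pointers mean a 64-bit process, where ToInt32() is not safe.
    if ctypes.sizeof(ctypes.c_void_p) == 8:
        return ptr.ToInt64()
    return ptr.ToInt32()


@contextmanager
def pinned_address(obj, alloc_pinned):
    """Pin obj, yield its address, and always unpin on exit.

    alloc_pinned is an assumption of this sketch: any callable that
    returns an object with AddrOfPinnedObject(), IsAllocated and Free(),
    as a pinned GCHandle does.
    """
    handle = alloc_pinned(obj)
    try:
        # The address is only meaningful while the with-block is active.
        yield intptr_to_int(handle.AddrOfPinnedObject())
    finally:
        if handle.IsAllocated:
            handle.Free()


def copy_doubles(address, count):
    """Copy count 8-byte doubles out of raw memory into Python-owned bytes."""
    return ctypes.string_at(address, count * ctypes.sizeof(ctypes.c_double))
```

With pythonnet, alloc_pinned would presumably be something like lambda a: GCHandle.Alloc(a, GCHandleType.Pinned), and copy_doubles(addr, len(src)) would be called inside the with block. The returned bytes belong to Python, so they stay valid after the handle is freed and can be passed to np.frombuffer(raw, dtype=np.float64); the resulting array is read-only because it wraps an immutable bytes object. The price is one copy, which the zero-copy np.frombuffer() variant avoids only for as long as the array stays pinned.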