Efficient copy of .NET Array to ctypes or numpy array.

I need to copy a .NET Array (e.g. Double[] or Byte[]) to a numpy array, but it seems the only way to do so is element by element, which is very slow. Since we are copying a lot of data in real time, it creates a real bottleneck.
Alternatively, efficient conversion of the .NET array to a Python-style byte string would allow numpy.fromstring() to be used to create the numpy array.
(I see a similar question went unanswered on the list in August 2011, but I was hoping someone might have figured it out by now.)
Thanks,
Dave Cook
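Dave's byte-string route can be sketched with current NumPy, where the binary mode of numpy.fromstring() is deprecated in favor of numpy.frombuffer(). In this sketch, `struct.pack` produces a stand-in for the raw bytes of a .NET Double[]. The sketch assumes those bytes are little-endian IEEE-754 doubles, as on x86/x64.

```python
import struct

import numpy as np

# Stand-in for the raw bytes of a .NET Double[] (assumed little-endian
# IEEE-754 doubles, 8 bytes each, as on x86/x64).
values = [0.5, 1.25, -3.0, 1e300]
raw = struct.pack('<%dd' % len(values), *values)

# frombuffer reinterprets the whole byte string in one step, with no
# per-element Python loop. The result is a read-only view of `raw`,
# so .copy() gives an independent, writable array.
arr = np.frombuffer(raw, dtype='<f8').copy()
```

For a Byte[] source, the dtype would be np.uint8 instead of '<f8'.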

An aside that may be useful:
.NET will skip array bounds checking within simple for loops as an optimization, but only if the binaries are built with all optimizations turned on; a Debug build has them turned off. There is a huge speedup for iterating over an array when these optimizations are in effect, so make sure you are not looking at a compiler optimization configuration problem.
On May 21, 2014, at 3:21 AM, Dave Cook <daverz@gmail.com> wrote:
Python.NET mailing list - PythonDotNet@python.org https://mail.python.org/mailman/listinfo/pythondotnet

You could write a .NET function to do this with fixed pointers and "memcpy" from the .NET array to the numpy data (the raw data). This would be the absolute fastest way, but it does involve a number of assumptions (for example, that the data in the two arrays are laid out in the same way). If you want, I could probably write something up real quick.
Jeff
On Wednesday, May 21, 2014, Brad Friedman <brad@fie.us> wrote:
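Jeff's suggestion is a .NET-side helper built on fixed pointers. The same memcpy idea can also be sketched from the Python side with ctypes.memmove, which copies straight from a pinned .NET array into the NumPy array's raw buffer. This is a sketch under stated assumptions:

* `copy_from_address` and `dotnet_to_numpy` are hypothetical names.
* `dotnet_to_numpy` assumes pythonnet is installed.
* Both assume the source elements are stored contiguously with the same type as the destination dtype. This is the layout assumption Jeff mentions.

```python
import ctypes

import numpy as np


def copy_from_address(address, count, dtype=np.float64):
    """Hypothetical helper: memcpy `count` elements at `address` into a new array.

    The caller must guarantee that `address` points to at least
    count * itemsize readable bytes laid out as `dtype`, and that the
    memory cannot move or be freed during the call.
    """
    dest = np.empty(count, dtype=dtype)
    # Byte count comes from the dtype, so the element size is not hard-coded.
    ctypes.memmove(dest.ctypes.data, address, dest.nbytes)
    return dest


def dotnet_to_numpy(src):
    """Hypothetical helper: copy a .NET Double[] via a pinned GCHandle (needs pythonnet)."""
    import clr  # noqa: F401  (loads pythonnet)
    from System.Runtime.InteropServices import GCHandle, GCHandleType

    handle = GCHandle.Alloc(src, GCHandleType.Pinned)  # stop the GC moving src
    try:
        # ToInt64 keeps the whole pointer in a 64-bit process.
        address = handle.AddrOfPinnedObject().ToInt64()
        return copy_from_address(address, len(src), np.float64)
    finally:
        if handle.IsAllocated:
            handle.Free()


# Demonstration without .NET: a ctypes array stands in for pinned memory.
source = (ctypes.c_double * 4)(1.0, 2.5, -4.0, 8.0)
copied = copy_from_address(ctypes.addressof(source), len(source))
```

Only `copy_from_address` is exercised by the demonstration. `dotnet_to_numpy` shows where the pinning from the thread would fit.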

I was tempted to code it up, and it turns out you can do it in pure Python. I thought of four ways to copy the data: a for loop (like you did), numpy.fromiter, numpy.fromstring, and Marshal.Copy.
Obviously the for loop is the slowest. numpy.fromiter is still slow, but about 2.5x faster than the for loop. It still pays for all the .NET array checks, because the element accesses happen from Python and cannot be optimized away.
The last two do direct memory copies: the fromstring approach gets the memory pointer of the .NET array, while Marshal.Copy gets the memory pointer of the NumPy array. They are both MUCH faster than a for loop, especially for larger arrays (>200x faster). numpy.fromstring is faster for smaller arrays, but by the time I got to 10,000,000 doubles it took twice as long as Marshal.Copy.
Here is the code:

    import clr
    from System import Array, Double, Int64, IntPtr, Random
    from System.Runtime.InteropServices import GCHandle, GCHandleType, Marshal
    from ctypes import string_at
    import time
    import numpy as np

    def check_arrays(a, b):
        if len(a) != len(b):
            print("Arrays are different size!")
        if any(A != B for A, B in zip(a, b)):
            print("Arrays have different values!")

    print("Creating source...")
    r = Random()
    src = Array.CreateInstance(Double, 10000000)
    for i in range(len(src)):
        src[i] = r.NextDouble()

    print('Copy using for loop', end=' ')
    start = time.perf_counter()
    dest = np.empty(len(src))
    for i in range(len(src)):
        dest[i] = src[i]
    end = time.perf_counter()
    print('in %f sec' % (end - start))
    check_arrays(src, dest)

    print('Copy using fromiter', end=' ')
    start = time.perf_counter()
    dest = np.fromiter(src, float)
    end = time.perf_counter()
    print('in %f sec' % (end - start))
    check_arrays(src, dest)

    print('Copy using fromstring', end=' ')
    start = time.perf_counter()
    # Pin src so the garbage collector cannot move it while its memory is read
    src_hndl = GCHandle.Alloc(src, GCHandleType.Pinned)
    try:
        # ToInt64 (not ToInt32) so 64-bit addresses are not truncated
        src_ptr = src_hndl.AddrOfPinnedObject().ToInt64()
        # note: 8 is the size of a double. frombuffer(...).copy() does what the
        # now-deprecated binary mode of np.fromstring did: copy the bytes into a new array
        dest = np.frombuffer(string_at(src_ptr, len(src) * 8)).copy()
    finally:
        if src_hndl.IsAllocated:
            src_hndl.Free()
    end = time.perf_counter()
    print('in %f sec' % (end - start))
    check_arrays(src, dest)

    print('Copy using Marshal.Copy', end=' ')
    start = time.perf_counter()
    dest = np.empty(len(src))
    # Select the IntPtr(Int64) constructor so 64-bit addresses are not truncated
    dest_ptr = IntPtr.__overloads__[Int64](dest.__array_interface__['data'][0])
    Marshal.Copy(src, 0, dest_ptr, len(src))
    end = time.perf_counter()
    print('in %f sec' % (end - start))
    check_arrays(src, dest)

Jeff
On Wed, May 21, 2014 at 10:58 AM, Jeffrey Bush <jeff@coderforlife.com> wrote:
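The Marshal.Copy variant writes `len(src)` doubles back-to-back, starting at the NumPy array's data pointer. For this to be safe, two conditions must hold:

* The destination must be a writable, C-contiguous float64 array that is large enough.
* The pointer must reach .NET without being truncated to 32 bits.

A small guard along these lines could run before the IntPtr is built. The name `marshal_destination_address` is hypothetical. The guard uses only NumPy, so it can be checked without .NET.

```python
import numpy as np


def marshal_destination_address(dest, count):
    """Hypothetical guard: return dest's data address if it can safely receive
    `count` float64 values written contiguously (as Marshal.Copy does)."""
    if dest.dtype != np.float64:
        raise TypeError("destination must be float64 to receive .NET doubles")
    if not (dest.flags.c_contiguous and dest.flags.writeable):
        raise ValueError("destination must be writable and C-contiguous")
    if dest.size < count:
        raise ValueError("destination is too small")
    # Same value the thread reads via __array_interface__. On a 64-bit
    # process it can exceed the Int32 range, which is why the IntPtr
    # should be built with the Int64 overload.
    return dest.__array_interface__['data'][0]


dest = np.empty(8)
addr = marshal_destination_address(dest, 8)
```

Under pythonnet, the returned address would then be wrapped with the overload selector used in the thread (`IntPtr.__overloads__[Int64](addr)`) before calling `Marshal.Copy(src, 0, ptr, len(src))`. A strided view such as `dest[::2]` is rejected, because a contiguous copy into it would overwrite the skipped elements.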
participants (3)
- Brad Friedman
- Dave Cook
- Jeffrey Bush