[Python.NET] Efficient copy of .NET Array to ctypes or numpy array

Jeffrey Bush jeff at coderforlife.com
Wed Oct 29 04:04:32 CET 2014


I finally have a chance to chime in, and Bradley is exactly right.
Marshal.Copy copies the raw data, and apparently your file library does
not store that data in a nice, contiguous manner. While it is highly
likely that copying all the data to an array in C# will be faster than
fromiter in Python, I am unsure whether copying all the data to an array in
C# and then copying it all again to a numpy array will be faster than
fromiter (because you have to copy it twice). The exception is if the file
library has a function like ToArray that is optimized to copy the data to a
linear chunk of memory. So, what type is "Data"?
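
To illustrate why a raw copy can return gibberish while an element-by-element
copy returns correct values, here is a minimal sketch using only the Python
standard library. It does not involve .NET at all. The record layout (an int32
id followed by a float64 value) is purely an assumption for illustration and is
not the layout of any real file library; the function names are hypothetical
too.

```python
import struct
from array import array

# Hypothetical layout: each sample is stored as (int32 node_id, float64 value),
# native byte order, no padding. This stands in for a source whose memory is
# NOT a flat run of doubles.
RECORD = struct.Struct("=id")


def pack_records(values):
    """Build a byte buffer of interleaved (id, value) records."""
    return b"".join(RECORD.pack(i, v) for i, v in enumerate(values))


def raw_copy_as_doubles(buf):
    """Bulk copy that assumes the buffer is a flat array of float64."""
    usable = len(buf) - len(buf) % 8  # drop any trailing partial double
    out = array("d")
    out.frombytes(buf[:usable])
    return out


def elementwise_copy(buf):
    """Slow path: decode each record and keep only the value field."""
    return array("d", (value for _, value in RECORD.iter_unpack(buf)))


values = [1.5, 2.5, 3.5, 4.5]
records = pack_records(values)         # interleaved layout
flat = array("d", values).tobytes()    # contiguous doubles

# The raw copy is only right when the layout assumption holds.
print(list(raw_copy_as_doubles(flat)) == values)     # layout matches
print(list(raw_copy_as_doubles(records)) == values)  # layout does not match
print(list(elementwise_copy(records)) == values)     # iterating reformats
```

The raw copy is the analogue of Marshal.Copy here: it is fast, but it trusts
that source and destination are laid out the same way.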

Another factor is how long the chunk of data you are copying is. You say
the last axis is only 400 elements long. Check out my code and you will see
that at 400 elements long, fromiter is actually the fastest (at least when
I tried). An example run:

Copy using for loop in 0.000884 sec
Copy using fromiter in 0.000144 sec      # fastest
Copy using fromstring in 0.001460 sec    # fairly slow, 10.3x slower than fromiter
Copy using Marshal.Copy in 0.001680 sec  # slowest, 11.7x slower than fromiter

Marshal.Copy starts to beat fromiter at around 5000 elements copied. This is
because the bulk copies have a high fixed overhead but a very low cost for
each additional element. fromstring has a lower overhead but a slightly
higher per-element time (fromstring is better than Marshal.Copy until
~200,000 elements).
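
If you want to find the crossover point on your own machine, a small timing
harness along these lines may help. This is only a sketch of the measuring
technique, using the standard library: ctypes.memmove stands in for the bulk
copy and a generator stands in for fromiter, so the numbers it prints say
nothing about Marshal.Copy or the .NET interop overhead. The function names
are hypothetical. Swap in your real copy functions to get meaningful results.

```python
import ctypes
import timeit
from array import array


def copy_elementwise(src, n):
    """Copy one element at a time (stand-in for the fromiter approach)."""
    return array("d", (src[i] for i in range(n)))


def copy_bulk(src, n):
    """Copy the whole buffer in one call (stand-in for a bulk copy)."""
    dest = (ctypes.c_double * n)()
    ctypes.memmove(dest, src, n * ctypes.sizeof(ctypes.c_double))
    return dest


def time_copies(n, repeat=3, number=5):
    """Return the best per-call time in seconds for each method at size n."""
    src = (ctypes.c_double * n)(*range(n))
    results = {}
    for name, fn in (("elementwise", copy_elementwise), ("bulk", copy_bulk)):
        best = min(timeit.repeat(lambda: fn(src, n), repeat=repeat, number=number))
        results[name] = best / number
    return results


if __name__ == "__main__":
    # Sizes taken from the discussion above: 400, ~5000 and ~200,000 elements.
    for n in (400, 5000, 200000):
        timings = time_copies(n)
        print(n, {name: "%.6f sec" % t for name, t in timings.items()})
```

Taking the minimum over several repeats reduces noise from other processes;
timing at several sizes is what exposes the fixed overhead versus the
per-element cost.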

So you might be doing as well as you possibly can. If I knew more about
your file format library I might be able to provide more insight.

Jeff

On Tue, Oct 28, 2014 at 2:45 PM, Bradley Friedman <brad at fie.us> wrote:

> Well it makes sense to me that doing it via an iterator, an element at a
> time, would be slow.  There’s a lot of call overhead associated with each
> iteration step.  Whether it’s done in .net, or in python, or a call from
> one to the other, it will be slow.  It’s still a call where you’d be better
> off copying whole buffers.
>
> Ideally you’d pull the data into as simple and raw a data structure as you
> can on the dotnet side, in a buffered manner.  Then you’d execute a
> movement of the data across, a reasonably sized chunk of buffer at a time.
> This will reduce call overhead and also allow read-ahead caching to do its
> thing on the file-access side of things.
>
> Your suggestion of loading into a .net array and then moving that array
> over, makes sense.  But I think it comes down to what you can do with the
> third party file-format library. If it’s not going to provide you with the
> data as some kind of buffer with a cohesive and known format in memory,
> you’re not really going to be able to move it over without iterating over
> it and reformatting it at some point.
>
> Specifically, I’d point to Jeffrey’s original caveat:
>
> "but does involve a number of assumptions (for example that the data in
> the two arrays are laid out in the same way)."
>
> The question is:  is there a way to get the data off of disk and in memory
> from the dotnet library, where its layout in memory is known, and something you
> want exactly as it is, but in python?  If so, you should be able to use the
> methods from the afore linked thread.  If not, you’re probably stuck
> iterating somewhere to reformat it, no matter what.  Which is probably why
> you got garbage back.  I’m guessing the object returned from the dotnet
> file-format-library isn’t laid out right, as suggested in the afore
> referenced caveat.
>
>
> > On Oct 28, 2014, at 9:55 AM, Nikhil <nikhilgarg.gju at gmail.com> wrote:
> >
> > Hello,
> > Yeah, I read data from a file, say at each node and each time step, but
> > when I try to use the Marshal approach I get gibberish, whereas when I use
> > a simple iter I get correct values. I have been trying the approach used
> > in the example in the previous post; that example makes sense, but it
> > doesn't make sense when I use it in my case. Right now I am assigning it
> > to a variable. I am now thinking of exploring the possibility of saving
> > the data to a .NET array, maybe using System.Array, but I am not sure
> > whether that even makes sense.