[Python.NET] Efficient copy of .NET Array to ctypes or numpy array

Nikhil Garg nikhilgarg.gju at gmail.com
Thu Oct 30 18:19:15 CET 2014


Thanks Brad and Jeff for the detailed info. For now, fromiter is serving me
well and has reduced my processing time considerably, so I am just going to
stick with it.


On 29 October 2014 11:04, Jeffrey Bush <jeff at coderforlife.com> wrote:

> I finally have a chance to chime in, and Bradley is exactly right.
> Marshal.Copy copies the raw data, and apparently your file library does
> not store that data in a nice, contiguous manner. While it is highly
> likely that copying all the data to an array in C# will be faster than
> fromiter in Python, I am unsure whether copying all the data to an array in
> C# and then copying it all again into a numpy array will be faster than
> fromiter (because you have to copy it twice). The exception is if the file
> library has a function like ToArray that is optimized to copy the data into
> a contiguous block of memory. So, what type is "Data"?
>
> Another factor is the length of the chunk of data you are copying. You say
> the last axis is only 400 elements long. If you check out my code, you will
> see that at 400 elements fromiter is actually the fastest (at least when I
> tried it). An example run:
>
> Copy using for loop in 0.000884 sec
> Copy using fromiter in 0.000144 sec # fastest
> Copy using fromstring in 0.001460 sec # fairly slow, 10.3x slower than fromiter
> Copy using Marshal.Copy in 0.001680 sec # slowest, 11.7x slower than fromiter
>
> Marshal.Copy starts to beat fromiter at around 5,000 elements copied. This
> is because a bulk copy has a high fixed overhead, while adding each element
> with fromiter takes very little time. fromstring has a lower fixed overhead
> than Marshal.Copy but a slightly higher per-element cost (fromstring beats
> Marshal.Copy up to ~200,000 elements).
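A rough way to find the crossover point on your own machine is to time the strategies at several sizes with timeit. This sketch is an assumption-laden stand-in: it stays entirely in Python, so it cannot include Marshal.Copy or the Python.NET call boundary, and it uses np.frombuffer in place of fromstring (NumPy deprecated fromstring for binary data). The absolute numbers and crossover sizes will therefore differ from the ones quoted above.

```python
import timeit

import numpy as np


def compare(n, repeat=3, number=10):
    """Return the best per-call time (seconds) of three copy strategies for n doubles."""
    values = [float(i) for i in range(n)]                  # stand-in for the source values
    raw = np.asarray(values, dtype=np.float64).tobytes()   # stand-in for a raw byte buffer

    def element_loop():
        out = np.empty(n, dtype=np.float64)
        for i, v in enumerate(values):                     # one assignment per element
            out[i] = v
        return out

    def from_iter():
        return np.fromiter(values, dtype=np.float64, count=n)

    def from_buffer():
        # np.frombuffer replaces np.fromstring for binary data; .copy()
        # gives an independent, writable array like the other strategies.
        return np.frombuffer(raw, dtype=np.float64).copy()

    strategies = {"loop": element_loop, "fromiter": from_iter, "frombuffer": from_buffer}
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number)) / number
        for name, fn in strategies.items()
    }


if __name__ == "__main__":
    for n in (400, 5000, 50000):
        timings = compare(n)
        print(n, {name: f"{t:.6f}" for name, t in timings.items()})
```

Run as a script, it prints the best per-call time for each strategy at 400, 5,000 and 50,000 elements; taking the minimum of several repeats reduces noise from other activity on the machine.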
>
> So you might already be doing as well as you possibly can. If I knew more
> about your file format library, I might be able to provide more insight.
>
> Jeff
>
> On Tue, Oct 28, 2014 at 2:45 PM, Bradley Friedman <brad at fie.us> wrote:
>
>> Well, it makes sense to me that doing it via an iterator, an element at a
>> time, would be slow.  There’s a lot of call overhead associated with each
>> iteration step.  Whether it’s done in .NET, in Python, or as a call from
>> one to the other, it will be slow.  In every case you’d be better off
>> copying whole buffers.
>>
>> Ideally you’d pull the data into as simple and raw a data structure as
>> you can on the dotnet side, in a buffered manner.  Then you’d move the
>> data across one reasonably sized chunk of buffer at a time.  This will
>> reduce call overhead and also let read-ahead caching do its thing on the
>> file-access side.
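The chunked approach Brad describes might look roughly like this on the Python side. read_chunk is hypothetical: it stands in for whatever buffered read the file-format library exposes, and here it simply slices a bytes object holding float64 values. The destination is allocated once and filled with one bulk copy per chunk; the chunk size of 4096 values is an arbitrary choice.

```python
import numpy as np

ITEMSIZE = np.dtype(np.float64).itemsize  # 8 bytes per value


def read_chunk(source, start, count):
    # Hypothetical stand-in for a buffered read on the .NET side: returns
    # `count` float64 values starting at index `start` as raw bytes.
    return source[start * ITEMSIZE:(start + count) * ITEMSIZE]


def copy_in_chunks(source, total, chunk=4096):
    out = np.empty(total, dtype=np.float64)   # allocate the destination once
    for start in range(0, total, chunk):
        count = min(chunk, total - start)     # the last chunk may be shorter
        raw = read_chunk(source, start, count)
        # One bulk copy per chunk rather than one call per element.
        out[start:start + count] = np.frombuffer(raw, dtype=np.float64)
    return out


source = np.arange(10_000, dtype=np.float64).tobytes()
result = copy_in_chunks(source, 10_000)
```

With 10,000 values and a chunk of 4,096, this makes three bulk copies instead of 10,000 per-element calls.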
>>
>> Your suggestion of loading into a .NET array and then moving that array
>> over makes sense.  But I think it comes down to what you can do with the
>> third-party file-format library.  If it’s not going to provide you with the
>> data as some kind of buffer with a coherent and known format in memory,
>> you’re not really going to be able to move it over without iterating over
>> it and reformatting it at some point.
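When the data does come as a buffer with a known layout, a single raw copy is enough. The sketch below uses only ctypes from the standard library: a packed byte buffer stands in for the source (an assumption; the real library may not expose one), and ctypes.memmove performs the same kind of byte-for-byte copy that Marshal.Copy does.

```python
import ctypes
import struct

values = [1.5, -2.0, 3.25, 4.0]
n = len(values)

# Stand-in for a source buffer whose layout is known: n native-endian
# IEEE-754 doubles packed back to back, with no headers or padding.
packed = struct.pack(f"{n}d", *values)
src = (ctypes.c_char * len(packed)).from_buffer_copy(packed)

# Destination: a ctypes array with the same element type and length.
dst = (ctypes.c_double * n)()

# One raw byte copy, the same kind of operation Marshal.Copy performs.
# It is only correct because both sides share exactly the same layout.
ctypes.memmove(dst, src, ctypes.sizeof(dst))
```

Nothing here checks or converts the data; if the source layout differed in element type, byte order or ordering of elements, the copy would still "succeed" and silently produce wrong numbers.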
>>
>> Specifically, I’d point to Jeffrey’s original caveat:
>>
>> "but does involve a number of assumptions (for example that the data in
>> the two arrays are laid out in the same way)."
>>
>> The question is:  is there a way to get the data off disk and into
>> memory via the dotnet library, with a known memory layout that is exactly
>> what you want, only in Python?  If so, you should be able to use the
>> methods from the previously linked thread.  If not, you’re probably stuck
>> iterating somewhere to reformat it, no matter what.  That is probably why
>> you got garbage back.  I’m guessing the object returned from the dotnet
>> file-format library isn’t laid out right, as suggested in the caveat
>> quoted above.
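The "garbage" symptom is easy to reproduce when raw bytes are interpreted with the wrong layout. In this illustration, which assumes a node-by-time-step array of float64 values (the real library's layout is unknown), the same buffer is read with the wrong axis order, the right axis order, and the wrong element type.

```python
import numpy as np

nodes, steps = 3, 4
values = np.arange(nodes * steps, dtype=np.float64).reshape(nodes, steps)

# Suppose the library keeps the same numbers time-step-major (column-major,
# i.e. Fortran order) and we receive them as one raw byte buffer.
raw = values.tobytes(order="F")
flat = np.frombuffer(raw, dtype=np.float64)

# Reading the buffer as if it were row-major scrambles the values.
wrong_order = flat.reshape(nodes, steps)

# Reading it with the layout it actually has recovers the original array.
right_order = flat.reshape(nodes, steps, order="F")

# Reading the same bytes with the wrong element type gives nonsense values:
# each 8-byte double is reinterpreted as two unrelated 4-byte floats.
wrong_dtype = np.frombuffer(raw, dtype=np.float32)
```

All three reads "work" without raising an error, which is why a layout mismatch shows up as gibberish rather than as an exception.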
>>
>>
>> > On Oct 28, 2014, at 9:55 AM, Nikhil <nikhilgarg.gju at gmail.com> wrote:
>> >
>> > Hello,
>> > Yeah, I read data from a file, say at each node and each time step, but
>> > when I try to use the Marshal approach I get gibberish, whereas when I
>> > use a simple iterator I get correct values. I have been trying the
>> > approach used in the example in the previous post; that example makes
>> > sense, but it doesn't make sense when I apply it to my case. Right now I
>> > am assigning it to a variable. I am now thinking of exploring the
>> > possibility of saving the data to a .NET array, maybe using System.Array,
>> > but I'm not sure if that even makes sense.
>>
>> _________________________________________________
>> Python.NET mailing list - PythonDotNet at python.org
>> https://mail.python.org/mailman/listinfo/pythondotnet
>>
>
>



-- 
Regards

Nikhil

-------------------------------------------------------------------
Big whirls have little whirls,
Which feed on their velocity,
And little whirls have lesser whirls,
And so on to viscosity
(Richardson, 1922)