Efficient copy of .NET Array to ctypes or numpy array
Hi,

I have looked at the following thread about copying a C# array to a numpy array using Python.NET:

https://mail.python.org/pipermail/pythondotnet/2014-May/001525.html

When I try the methods mentioned in that thread, I can get the array using simple iteration or np.fromiter, but I have not been able to make GCHandle or Marshal.Copy work in my case.

To explain my issue: I use a program written in C# that produces output in a custom-built format. I can access the data in the file using Python.NET, but it takes almost half an hour to read the data from the file.

For example, my file has an array of size 5018x73x400. If I use fromiter, I pass it the last axis of the array to copy to Python, because of the way the file stores its output:

    for j in xrange(num_nodes-1):
        for i in xrange(nsteps):
            temp = file_in.ReadItemTimeStep(j+1, i).Data
            temp_1 = np.fromiter(temp, float)
            dest[j, i] = temp_1

The other way of doing the same task would be a plain for loop, but that would be really slow:

    for k in xrange(num_nodes-1):
        for j in xrange(nsteps):
            for i in xrange(nvals):
                temp[k, j, i] = file_in.ReadItemTimeStep(k+1, j).Data[i]

I am trying to find a faster way to extract the data from the file, and I was wondering whether something like GCHandle or Marshal.Copy could be used. Here is some sample output from Marshal.Copy:

    Marshal.Copy(temp, 0, IntPtr.__overloads__[int](temp2.__array_interface__['data'][0]), len(temp))

    0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
    3.13551399e-242, 4.20993645e-161, 8.71245319e-119, 8.38370432e-089,
    2.60762781e-075, 1.92616374e-072, 2.89006184e-072, 3.06007265e-082,
    3.81879776e-315, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
    1.03959661e-297, 7.22038448e-196, 3.05237954e-130, 2.58469346e-093,
    2.29495911e-070, 3.74354820e-062, 1.74288060e-061, 2.13172553e-067,
    2.74305386e-077, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
    0.00000000e+000, 1.02484756e-240, 1.70193770e-154, 1.30625320e-108,
    1.16793762e-075, 2.96889340e-061, 7.84687103e-058, 1.16879604e-057,
    5.56256478e-068, 4.06437857e-315, 0.00000000e+000, 0.00000000e+000,
    0.00000000e+000, 1.11705239e-296, 1.46978156e-194, 1.98034372e-128,
    3.79615170e-091, 5.88936509e-068, 1.85305929e-059, 2.16186001e-058,

whereas the actual values stored in the array are:

    0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
    0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
    5.66641200e-37, 1.24961600e-30, 5.34846800e-25, 1.72519600e-20,
    1.49402200e-17, 3.27685300e-15, 4.10687500e-13, 1.87978200e-11,
    2.01397900e-10, 8.83561100e-10, 2.14397600e-09, 2.02600300e-09,
    2.47599800e-09, 2.15987100e-09, 4.82694100e-10, 1.18759300e-10,
    3.32024100e-11, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
    0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
    0.00000000e+00, 1.45288600e-37, 7.37245900e-31, 7.59282500e-25,
    5.85463400e-20, 1.20634700e-16, 3.06414300e-14, 5.16041300e-12,
    2.87562800e-10, 3.62841500e-09, 1.63540600e-08, 4.06988300e-08,
    3.97764200e-08, 4.88768500e-08, 4.07385000e-08, 8.78868600e-09,

I would appreciate any help I can get regarding this matter.

Nikhil
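For a sense of scale, the sketch below counts how many reads cross the Python/.NET boundary in each of the two loops above. It assumes that the three dimensions 5018x73x400 correspond to nodes, time steps and values per step, and it ignores the `num_nodes-1` offset. `DIMS` and the function names are illustrative only and are not part of the original code. It is written for Python 3, whereas the thread uses Python 2.7.

```python
# Rough call-count comparison for the two loops in the message above.
# Assumption: the 5018 x 73 x 400 array is (nodes, time steps, values per step).
DIMS = (5018, 73, 400)

def calls_row_wise(dims):
    """One ReadItemTimeStep call (plus one fromiter) per (node, step) pair."""
    nodes, steps, _ = dims
    return nodes * steps

def calls_element_wise(dims):
    """One ReadItemTimeStep call and one .Data[i] lookup per element."""
    nodes, steps, vals = dims
    return nodes * steps * vals

row_calls = calls_row_wise(DIMS)
elem_calls = calls_element_wise(DIMS)
print("row-wise reads:     %d" % row_calls)
print("element-wise reads: %d" % elem_calls)
print("ratio:              %d" % (elem_calls // row_calls))
```

The element-wise loop asks the library for the same time step once per value, which is one reason it is so much slower than the fromiter version.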
You indicate that you are reading from a file. The thread you reference was about copying data in memory. I’d think matters of buffering and read-ahead caches would be far more relevant than anything else. Am I missing something?
On Oct 26, 2014, at 5:18 AM, Nikhil Garg <nikhilgarg.gju@gmail.com> wrote:
[...]
Nikhil
_________________________________________________
Python.NET mailing list - PythonDotNet@python.org
https://mail.python.org/mailman/listinfo/pythondotnet
Hello,

Yes, I read data from the file at each node and each time step. When I use the Marshal approach I get gibberish, but when I use simple iteration I get the correct values. I have been trying the approach from the example in the previous thread; that example makes sense, but it does not work when I apply it to my case. Right now I am assigning the result to a variable. I am thinking of exploring whether I could save the data into a .NET array, perhaps using System.Array, but I am not sure that even makes sense.
On 28 Oct 2014, at 12:50 am, Bradley Friedman <brad@fie.us> wrote:
[...]
Well, it makes sense to me that doing it via an iterator, an element at a time, would be slow. There’s a lot of call overhead associated with each iteration step. Whether it’s done in .NET, or in Python, or as a call from one to the other, it will be slow. It’s still a call where you’d be better off copying whole buffers.

Ideally you’d pull the data into as simple and raw a data structure as you can on the .NET side, in a buffered manner. Then you’d move the data across a reasonably sized chunk of buffer at a time. This will reduce call overhead and also allow read-ahead caching to do its thing on the file-access side.

Your suggestion of loading into a .NET array and then moving that array over makes sense. But I think it comes down to what you can do with the third-party file-format library. If it’s not going to provide you with the data as some kind of buffer with a cohesive and known format in memory, you’re not really going to be able to move it over without iterating over it and reformatting it at some point.

Specifically, I’d point to Jeffrey’s original caveat:

"but does involve a number of assumptions (for example that the data in the two arrays are laid out in the same way)."

The question is: is there a way to get the data off disk and into memory from the .NET library, where its layout in memory is known and is exactly what you want in Python? If so, you should be able to use the methods from the previously linked thread. If not, you’re probably stuck iterating somewhere to reformat it, no matter what. That is probably why you got garbage back. I’m guessing the object returned from the .NET file-format library isn’t laid out right, as suggested in the caveat above.
On Oct 28, 2014, at 9:55 AM, Nikhil <nikhilgarg.gju@gmail.com> wrote:
[...]
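The difference between element-by-element transfer and whole-buffer transfer that Bradley describes can be illustrated without .NET at all. The sketch below uses only the Python 3 standard library: a ctypes array stands in for the source buffer, and ctypes.memmove stands in for a bulk copy such as Marshal.Copy. The names `copy_elementwise` and `copy_bulk` are made up for this illustration; it shows the pattern, not the pythonnet API.

```python
import array
import ctypes

def copy_elementwise(src, n):
    """Copy n doubles one at a time: n separate indexing operations."""
    dest = array.array('d', bytes(8 * n))  # zero-filled destination
    for i in range(n):
        dest[i] = src[i]
    return dest

def copy_bulk(src, n):
    """Copy n doubles with a single memmove over the raw bytes.

    Only valid because source and destination use the same element
    type (C double) and are both contiguous.
    """
    dest = array.array('d', bytes(8 * n))
    dest_addr, dest_len = dest.buffer_info()
    assert dest_len == n
    nbytes = n * ctypes.sizeof(ctypes.c_double)
    ctypes.memmove(dest_addr, ctypes.addressof(src), nbytes)
    return dest

n = 400
src = (ctypes.c_double * n)(*[i * 0.5 for i in range(n)])
a = copy_elementwise(src, n)
b = copy_bulk(src, n)
print(a == b)
```

The bulk version gives the right answer only under the assumptions named in its docstring, which is the same caveat quoted from the earlier thread.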
I finally have a chance to chime in, and Bradley is exactly right. Marshal.Copy copies the raw data, and apparently your file library does not store that data in a nice, contiguous manner. While it is highly likely that copying all the data to an array in C# will be faster than fromiter in Python, I am unsure whether copying all the data to an array in C# and then copying it all again to a numpy array will be faster than fromiter (because you have to copy it twice). The exception is if the file library has a function like ToArray that is optimized to copy the data to a linear chunk of memory. So, what type is "Data"?

Another factor is how long the chunk of data you are copying is. You say the last axis is only 400 elements long. Check out my code and you will see that at 400 elements long, fromiter is actually the fastest (at least when I tried). An example run:

    Copy using for loop     in 0.000884 sec
    Copy using fromiter     in 0.000144 sec  # fastest
    Copy using fromstring   in 0.001460 sec  # fairly slow, 10.3x slower than fromiter
    Copy using Marshal.Copy in 0.001680 sec  # slowest, 11.7x slower than fromiter

I start to do better with Marshal.Copy than with fromiter at around 5000 elements copied. This is because the overhead of the mass copies is high, but adding each element doesn't take much time. fromstring has a lower overhead but a slightly longer per-element time (fromstring is better than Marshal.Copy until ~200,000 elements).

So you might be doing as well as you possibly can. If I knew more about your file format library, I might be able to provide more insight.

Jeff

On Tue, Oct 28, 2014 at 2:45 PM, Bradley Friedman <brad@fie.us> wrote:
[...]
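One possibility that is not raised in the thread, but that the numbers in the first message are consistent with, is an element-size mismatch rather than a layout problem. If `Data` holds 32-bit floats (System.Single) and the numpy destination is float64, a raw byte copy packs two source values into each destination value. The standard-library sketch below (Python 3) reinterprets pairs of float32 values taken from the "actual values" listing as little-endian float64. This is an inference from the posted output, not something the participants confirmed, and `pair_as_double` is a name invented for the illustration.

```python
import struct

def pair_as_double(lo, hi):
    """Pack two float32 values and reinterpret the 8 bytes as one float64.

    Little-endian layout is assumed: `lo` fills the low 4 bytes.
    """
    raw = struct.pack('<ff', lo, hi)
    return struct.unpack('<d', raw)[0]

# Consecutive values from the "actual values" listing in the first message.
print(pair_as_double(0.0, 0.0))                    # two zeros stay zero
print(pair_as_double(5.666412e-37, 1.249616e-30))  # thread shows 3.13551399e-242
print(pair_as_double(3.320241e-11, 0.0))           # thread shows 3.81879776e-315
```

If that is what happened, checking the element type of `Data` and allocating the destination with a matching dtype would be the first thing to try, which is also where Jeff's question about the type of "Data" points.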
Thanks Brad and Jeff for the detailed info. For now, fromiter is serving me well and has reduced my processing time considerably, so I am just going to stick with it.

On 29 October 2014 11:04, Jeffrey Bush <jeff@coderforlife.com> wrote:
[...]
--
Regards
Nikhil
-------------------------------------------------------------------
Big whirls have little whirls,
Which feed on their velocity,
And little whirls have lesser whirls,
And so on to viscosity
(Richardson, 1922)
How can I copy an unmanaged array (a Python list/tuple or a numpy array) into a managed C# array? I guess using Marshal.Copy, but can anyone point to an example?

Thanks,
Denis

On Thu, Oct 30, 2014 at 12:19 PM, Nikhil Garg <nikhilgarg.gju@gmail.com> wrote:
[...]
And a more important question: is it possible to generalize the copying of a Python array object to a managed C# array object without knowing the data type/size/length?

On Wed, Nov 5, 2014 at 8:58 AM, Denis Akhiyarov <denis.akhiyarov@gmail.com> wrote:
How to copy unmanaged array (python list/tuple or numpy array) into managed C# array? I guess using Marshal.Copy, but can anyone point to example?
Thanks, Denis
On Thu, Oct 30, 2014 at 12:19 PM, Nikhil Garg <nikhilgarg.gju@gmail.com> wrote:
Thanks Brad and Jeff for the detailed info. For now, fromiter is serving me well and has reduced my processing time considerably, so I am just going to stick with it.
On 29 October 2014 11:04, Jeffrey Bush <jeff@coderforlife.com> wrote:
I finally have a chance to chime in, and Bradley is exactly right. Marshall.Copy copies the raw data, and apparently your file library does not store that data in a nice, contiguous, manner. While it is highly likely that copying all the data to an array in C# will be faster than the fromiter in Python, I am unsure if copying all the data to an array in C# then copying all the data again to a numpy array will be faster than fromiter (cause you have to copy it twice). The exception is if the file library has a function like ToArray that is optimized to copy the data to a linear chunk of data. So, what type is "Data"?
Another factor is how long the chunk of data you are copying is. You say the last axis is only 400 elements long. Check out my code and you will see that at 400 elements long, fromiter is actually the fastest (at least when I tried). An example run:
Copy using for loop in 0.000884 sec Copy using fromiter in 0.000144 sec # fastest Copy using fromstring in 0.001460 sec # fairly slow, 10.3x slower than fromiter Copy using Marshal.Copy in 0.001680 sec # slowest, 11.7x slower than fromiter
I start to do better with Marshal.Copy then fromiter around 5000 elements copied. This is because the overhead of the mass copies is high but adding each element doesn't take much time. fromstring has a lower overhead but slightly longer per-element time (fromstring is better than Marshal.Copy until ~200,000 elements).
So you might be doing as good as you can possibly do. If I knew more about your file format library I might be able to provide more insight.
Jeff
On Tue, Oct 28, 2014 at 2:45 PM, Bradley Friedman <brad@fie.us> wrote:
Well it makes sense to me that doing it via an iterator, and element at a time, would be slow. There’s a lot of call overhead associated with each iteration step. Whether it’s done in .net, or in python, or a call from one to the other, it will be slow. It’s still a call where you’d be better off copying whole buffers.
Ideally you’d pull the data into as simple and raw a data structure as you can on the dotnet side, in a buffered manner. Then you’d execute a movement of the data across, a reasonably sized chunk of buffer at a time. This will reduce call overhead and also allow read-ahead caching to do its thing on the file-access side of things.
Your suggestion of loading into a .net array and then moving that array over, makes sense. But I think it comes down to what you can do with the third party file-format library. If its not going to provide you with the data as some kind of buffer with a cohesive and known format in memory, you’re not really going to be able to move it over without iterating over it and reformatting it at some point.
Specifically, I’d point to Jeffery’s original caveat:
"but does involve a number of assumptions (for example that the data in the two arrays are laid out in the same way)."
The question is: is there a way to get the data off of disk and in memory from dotnet library, where its layout in memory is known, and something you want exactly as it is, but in python? If so, you should be able to use the methods from the afore linked thread. If not, you’re probably stuck iterating somewhere to reformat it, no matter what. Which is probably why you got garbage back. I’m guessing the object returned from the dotnet file-format-library isn’t laid out right, as suggested in the afore referenced caveat.
On Oct 28, 2014, at 9:55 AM, Nikhil <nikhilgarg.gju@gmail.com> wrote:
Hello, Yeah, I read data from a file, say at each node and each time step, but when I try the Marshal approach I get gibberish, whereas simple iteration gives correct values. I have been trying the approach used in the example in the previous post; that example makes sense, but it does not work in my case. Right now I am assigning the data to a variable. I am now thinking of exploring the possibility of saving the data to a .NET array, maybe using System.Array, but I am not sure that even makes sense.
Sent from my iPhone
_________________________________________________ Python.NET mailing list - PythonDotNet@python.org https://mail.python.org/mailman/listinfo/pythondotnet
-- Regards
Nikhil
------------------------------------------------------------------- Big whirls have little whirls, Which feed on their velocity, And little whirls have lesser whirls, And so on to viscosity (Richardson, 1922)
Finally decided to generate the managed object on the Python side and return it to C#, with no conversion necessary in C#. This way I can even wrap a regular Python function with a @decorator to handle the conversion. I suppose the dynamic version of pythonnet may have automatic conversion for Python 3, but I have not tried it. I'm on Python 2.7.
On Wed, Nov 5, 2014 at 9:31 AM, Denis Akhiyarov <denis.akhiyarov@gmail.com> wrote:
And a more important question: is it possible to generalize copying a Python array object to a managed C# array object without knowing the data type, size, or length?
To copy from a list or tuple you need to use the Python buffer() function. You can use that on numpy arrays as well. In buffer form, len() is the byte length, so you don't need to know the data type or size. However, some additional work would be required if you wanted to make sure the C# array was of the proper type and length.
Jeff
On Wed, Nov 5, 2014 at 11:28 AM, Denis Akhiyarov <denis.akhiyarov@gmail.com> wrote:
Finally decided to generate the managed object on the Python side and return it to C#, with no conversion necessary in C#. This way I can even wrap a regular Python function with a @decorator to handle the conversion. I suppose the dynamic version of pythonnet may have automatic conversion for Python 3, but I have not tried it. I'm on Python 2.7.
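The buffer() idea can be sketched as follows. buffer() exists only in Python 2; in Python 3 its counterpart is memoryview, which is what this sketch uses. Note that in Python 3, len() of a memoryview counts elements rather than bytes, so the byte length comes from the nbytes attribute instead. The array.array source is a stand-in; numpy arrays expose the same buffer protocol.

```python
from array import array

values = array("d", [1.0, 2.0, 3.0])  # three 8-byte doubles
view = memoryview(values)

print(view.nbytes)          # total size in bytes: 24
print(view.format)          # element type code: d
print(view.itemsize)        # bytes per element: 8
print(len(view.cast("B")))  # length when viewed as raw bytes: 24
```

The byte length alone is enough to size a raw copy, while format and itemsize are the extra information needed for the "proper type and length" check on the receiving side.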
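Jeff,
Thank you for the quick reply. Can you give an example of how buffer() would help with converting numpy/Python arrays to managed arrays?
For now I developed a decorator (on my personal time) that handles all I/O conversion on the Python side (it assumes one arbitrary input array and one arbitrary output array):
    import clr
    import System
    from System import Array
    import numpy as np

    def decornet(func, T=System.Object):
        # listit is a helper defined elsewhere (not shown in this message).
        def inner(*args, **kwargs):
            res = np.array(func(listit(args[0]), *args[1:], **kwargs))
            # Map the numpy scalar type to the matching .NET element type.
            tnum = res.dtype.type
            if tnum is np.int32:
                tnet = System.Int32
            elif tnum is np.int64:
                tnet = System.Int64
            elif tnum is np.float32:   # 32-bit float maps to System.Single
                tnet = System.Single
            elif tnum is np.float64:   # np.double is an alias of np.float64
                tnet = System.Double
            elif tnum is np.bool_:     # numpy's boolean scalar type
                tnet = System.Boolean
            else:
                tnet = T               # fall back to the caller-supplied type
            # Allocate a managed array with the same shape as the result.
            netarr = Array.CreateInstance(tnet, *res.shape)
            # Copy element by element, using the multi-dimensional index.
            it = np.nditer(res, flags=['multi_index'])
            while not it.finished:
                ix = it.multi_index
                if len(ix) == 1:
                    netarr[ix[0]] = res[ix[0]]
                else:
                    netarr[ix] = res[ix]
                it.iternext()
            return netarr
        return inner
Thanks,
Denis
On Wed, Nov 5, 2014 at 4:32 PM, Jeffrey Bush <jeff@coderforlife.com> wrote: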
To copy from a list or tuple you need to use the Python buffer() function. You can use that on numpy arrays as well. In buffer form, the len() is the byte length so then you don't need to know the data type or size. However, some additional work would be required if you wanted to make sure the C# array was of the proper type and length.
Jeff
If you know that they are always numpy arrays, then it doesn't really help. However, if they could be numpy arrays, lists, strings, ... then using buffer() on them gets you the memory buffer of the object, basically an array of bytes you could copy. In short, it helps generalize the problem to more types of objects.
Jeff
On Wed, Nov 19, 2014 at 4:29 PM, Denis Akhiyarov <denis.akhiyarov@gmail.com> wrote:
Jeff,
Thank you for the quick reply. Can you give an example of how buffer() would help with converting numpy/Python arrays to managed arrays?
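A minimal sketch of that generalization, written for Python 3 where memoryview replaces buffer(). The helper name raw_bytes and its typecode parameter are hypothetical. One caveat: plain list and tuple objects do not expose a memory buffer themselves, so the sketch packs them into an array.array first; objects that already support the buffer protocol (bytes, bytearray, array.array, numpy arrays) are read directly.

```python
from array import array


def raw_bytes(obj, typecode="d"):
    """Return the contents of obj as one flat bytes object.

    typecode is a hypothetical parameter used only for list/tuple input,
    which has no memory buffer of its own and must be packed first.
    """
    if isinstance(obj, (list, tuple)):
        obj = array(typecode, obj)
    # Works for anything that supports the buffer protocol.
    return memoryview(obj).tobytes()


print(len(raw_bytes([1.0, 2.0])))               # two doubles
print(len(raw_bytes(array("d", [1.0, 2.0]))))   # same data, already a buffer
print(raw_bytes(b"abc"))
```

The resulting bytes could then be handed to whatever bulk-copy routine the destination supports; choosing the destination element type and length still has to be done separately.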
participants (5)
- Bradley Friedman
- Denis Akhiyarov
- Jeffrey Bush
- Nikhil
- Nikhil Garg