I finally have a chance to chime in, and Bradley is exactly right. Marshal.Copy copies the raw data, and apparently your file library does not store that data in a nice, contiguous manner. While it is highly likely that copying all the data to an array in C# will be faster than fromiter in Python, I am unsure whether copying all the data to an array in C# and then copying it all again to a numpy array will be faster than fromiter (because you have to copy it twice). The exception is if the file library has a function like ToArray that is optimized to copy the data to a linear chunk of memory. So, what type is "Data"?
Another factor is how long the chunk of data you are copying is. You say the last axis is only 400 elements long. Check out my code and you will see that at 400 elements long, fromiter is actually the fastest (at least when I tried). An example run:
Copy using for loop in 0.000884 sec
Copy using fromiter in 0.000144 sec # fastest
Copy using fromstring in 0.001460 sec # fairly slow, 10.3x slower than fromiter
Copy using Marshal.Copy in 0.001680 sec # slowest, 11.7x slower than fromiter
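For anyone who wants to reproduce this kind of comparison without a .NET runtime, here is a minimal timing sketch. It is not the code I ran above: it uses a plain Python list as a stand-in for the .NET array (an assumption), so it only covers the for-loop and fromiter cases, and the helper names are made up for illustration.

```python
import timeit

import numpy as np

n = 400  # length of the last axis mentioned in the question
source = [float(i) for i in range(n)]  # stand-in for the .NET array (assumption)


def copy_for_loop(src):
    """Copy element by element into a preallocated numpy array."""
    dest = np.empty(len(src), dtype=np.float64)
    for i, value in enumerate(src):
        dest[i] = value
    return dest


def copy_fromiter(src):
    """Let numpy pull the elements from the iterable itself."""
    return np.fromiter(src, dtype=np.float64, count=len(src))


for name, func in [("for loop", copy_for_loop), ("fromiter", copy_fromiter)]:
    # Best of 5 repeats, each averaging 100 calls, to reduce timer noise.
    seconds = min(timeit.repeat(lambda: func(source), number=100, repeat=5)) / 100
    print("Copy using %s in %.6f sec" % (name, seconds))
```

The absolute numbers will differ from mine, since iterating a Python list is cheaper than iterating a .NET array through the bridge, but changing n shows how each method scales.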
I start to do better with Marshal.Copy than with fromiter at around 5000 elements copied. This is because the fixed overhead of the mass copies is high, but each additional element adds very little time. fromstring has a lower overhead but a slightly longer per-element time (fromstring is better than Marshal.Copy until ~200,000 elements).
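The overhead-versus-per-element trade-off can be written as a simple linear cost model. This is only a sketch: the function names are mine, and the nanosecond figures are illustrative values picked to be roughly consistent with the run above and with the ~5000 crossover, not separate measurements.

```python
def copy_time(n, overhead, per_element):
    """Estimated time to copy n elements: fixed overhead plus a per-element cost."""
    return overhead + n * per_element


def crossover(overhead_a, per_a, overhead_b, per_b):
    """Element count above which method b (higher overhead, cheaper per element)
    becomes faster than method a. Returns None if b never catches up."""
    if per_a <= per_b:
        return None
    return (overhead_b - overhead_a) / (per_a - per_b)


# Illustrative costs in nanoseconds (assumptions, not measurements):
# fromiter: no fixed overhead, 360 ns per element (0.000144 sec / 400 elements)
# Marshal.Copy: 1,700,000 ns fixed overhead, 20 ns per element
print(copy_time(400, 0, 360))              # 144000 ns, i.e. 0.000144 sec
print(crossover(0, 360, 1_700_000, 20))    # 5000.0 elements
```

With a last axis of 400 elements you are well below that crossover, which is why fromiter comes out ahead.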
So you might already be doing as well as you possibly can. If I knew more about your file format library, I might be able to provide more insight.
Jeff