Hello Christoph,

Thank you for your detailed answer.

I convinced myself with a small C++ benchmark code that summing over the complex values of a vector<vector<complex<double>>> array is more than twice slower than summing over a flat, same size, dynamically allocated array (I attached the code to this mail). About the memory usage, I don't understand why it would take signifcantly more space, but maybe because I first suggested an array of Python objects instead of compiled objects, which take more space ?

I agree and I am interested by coding the solution you are suggesting, I would like to suggest further developpement to it: instead of returning a si! mple list of 2D arrays without copying anything, we use an "accesser" object that would enable matrix products :

It gets initialised with the system "syst"
Calling "accesser.wavefunction(bra, site, orbital)" (or some similar writing) returns "wavefunction[orbital + offset(site)]" with the right offset.
The matricial inner product is implemented: "accesser.wavefunction(bra, site) * accesser.hamiltonian(site) * accesser.wavefunction(ket, site)" would be a valid expression (or something similar, with the same idea to have an apparent matrix-vector or matrix-martix product).

The third bullet point raises for me some questions though:

What do you think of it ?
I need to understand how the hamiltionian is represented to be able to think of possibilities to implement that, but I can't grasp what is the low level representation of the hamiltonian when the system is finalised, is it a fla! t array too ? When I look at "_eval_hamiltonian()" in the class "_LocalOperator". I see that it uses the return value of "syst.hamiltonian()", then converts it into a Python tinyarray.matrix object, then convert it once again into a BlockSpareMatrix.

Thank you,

Adel

On Fri, 10 May, 2019 at 6:15 PM, Christoph Groth <christoph.groth@cea.fr> wrote:

Hi Adel, Your understanding of the way wave functions (and other observables) are stored is correct. You are also right that working with low-level observables is inconvenient and could be improved. I'll explain how the current situation came about, and offer a possible solution. As always, we're glad to hear comments or suggestions. I still think that our original design choice to store everything in one flat array was a good one. The alternatives you mention would require much more RAM and would be also significantly slower due to decreased memory locality. If we were to store a list of tinyarrays that would require storing one Python object (the tinyarray) per site. For example, if we were to do this for a system with two orbitals per site, that would mean a memory cost of 8 + 56 = 64 bytes per site. With the current approach, we require two complex numbers, that is 16 bytes per site. What's more, accessing this memory is no longer sequential, which makes it even slower. (You could object that finalized builders already store O(N) Python objects, namely the sites and their tags. But (1) that is not true for general Kwant low-level systems, and (2) we are working on solving this problem for finalized builders as well.) The other solutions that you mention have similar problems. What we could have done is sort the sites according to their number of orbitals, and then for each block of sites store a 2d array of shape (num_sites, num_orbs). But back when the low-level format was established, the number of orbitals of a site was not fixed, and so this could not have been done. The reason why the number was not fixed was not so much a false sense of generality, but rather that we didn't see a good way to specify the number of orbitals per site. We only later had the idea to fix the number of orbitals per site family. (The above solution would have the inconvenience that the wave function would be no longer a simple array, but rather a list of 2d arrays. This exposes more structure, but it also makes things like normalizing a wave function more difficult.) Here is how you can efficiently realize the above solution with a couple of lines of code: Check out the attribute 'site_ranges' of Kwant systems [1]. With this information, you can write a function 'unpack' that takes a system and a numpy array with one entry per orbital (for example a wave function), and returns a list of 2d arrays, that, without copying, refer to the data in a way that simplifies working with sites. If this sounds like a good idea and you write this function, be sure to share it. We might want to include it into Kwant as a method of 'System'. Cheers Christoph [1] https://kwant-project.org/doc/1/reference/generated/kwant.system.System#kwant.system.System