Running Kwant computations on parallel machines
Dear Kwant Community,

My name is Adrian Nosek, I am a graduate student at the University of California, Riverside in Condensed Matter Physics, and I was wondering whether there are any plans to extend Kwant and let it run on parallel machines, so that systems of bigger sample size could be modeled?

I attempted to calculate the conductivity of a sample with 200000 atoms (I believe that's what it was, but it's been a while since I ran the simulation) and this is where my machine ran out of RAM.

I really like the idea of simulating my experiment using Kwant, but is there any advantage to modeling, let's say, a 4 by 20 micron system using Kwant vs. a classical approach? I believe the mean free paths of electrons in graphene can easily reach micron-long pathways in high-mobility graphene-hBN sandwiches, so that is the reason why I am asking.

Also, would it make any sense to implement such a simulation in the context of Big Data Analysis, where analysis is performed on parallel working machines where one could diagonalize such a big Hamiltonian, e.g. using Apache Spark?

Thanks for your answers,
Adrian
Adrian Nosek wrote:
My name is Adrian Nosek, I am a graduate student at the University of California Riverside in Condensed Matter Physics, and I was wondering whether there are any plans to extend Kwant and let it run on parallel machines, so that systems of bigger sample size could be modeled?
Hi Adrian, thanks for your interest.

Parallel computations are usually subdivided into two broad classes that can be combined:

• Task parallelism: Spread independent calculations (e.g. at different energies, or for different disorder realizations) across different cores/machines.
• Data parallelism: Spread a single computation. For example, each core/machine could take care of part of a huge Hamiltonian matrix.

There are, of course, many variants of both. Each can be done across a single machine with many cores, across an interconnected cluster of machines, or across machines in a "grid" or the "cloud".

Task parallelism can be done with Kwant in various ways. One way that we recommend (concurrent.futures) is illustrated with a complete example (the spin valve) in the Kwant paper.

Basic data parallelism can be achieved by compiling Kwant with a parallel BLAS/LAPACK library. The default Debian/Ubuntu packages of Kwant, for example, are built with the parallel OpenBLAS. Unfortunately, with OpenBLAS the speedup seems to be negligible, but perhaps it’s a matter of tuning OpenBLAS properly. It’s also possible that another BLAS (e.g. Intel MKL) would work better.

So far there is no Kwant solver that utilizes data parallelism other than through BLAS, but it should not be too difficult to adapt the (currently sequential) MUMPS solver to use MUMPS with MPI. There have been some discussions about this on the kwant-devel list:

https://www.mailarchive.com/kwantdevel%40kwantproject.org/msg00047.html
https://www.mailarchive.com/kwantdevel%40kwantproject.org/msg00052.html

We would of course be happy to support anyone who would like to work on this!
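The task-parallel pattern mentioned above can be sketched with the standard-library concurrent.futures module. This is a minimal illustration, not the spin-valve example from the Kwant paper: the names transmission and sweep are hypothetical, and the per-energy calculation is a stand-in formula so that the sketch runs without Kwant installed.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def transmission(energy):
    """Placeholder for one independent calculation.

    In a real script this is where a Kwant call such as
    kwant.smatrix(fsyst, energy).transmission(1, 0) would go,
    with fsyst a finalized system. The Lorentzian below is only a
    stand-in (an assumption of this sketch, not physics from the thread).
    """
    return 1.0 / (1.0 + energy ** 2)


def sweep(energies, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Evaluate transmission() at every energy, one task per energy."""
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(transmission, energies))


# Typical use from a script. The guard matters on platforms that start
# worker processes by re-importing the main module:
#
#     if __name__ == "__main__":
#         results = sweep([0.1 * i for i in range(-10, 11)])
```

Because every energy is an independent task, the same function could be handed a different executor (for example ThreadPoolExecutor, or an executor provided by a cluster library) without changing the calculation itself. Note that each worker process needs its own memory for the system it solves, so this approach speeds up sweeps but does not let a single larger system fit into RAM.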
I attempted to calculate the conductivity of a sample with 200000 atoms (I believe that's what it was, but it's been a while since I ran the simulation) and this is where my machine ran out of RAM.
How many orbitals per site does your model feature? How much RAM does your machine have? On my laptop with 8 GB of RAM, I’ve done calculations with up to 2 million single-orbital sites (see the benchmark in the Kwant paper). But it’s true that the MUMPS solver that Kwant uses by default (and which is the fastest) needs a lot of RAM.
I really like the idea of simulating my experiment using Kwant, but is there any advantage to modeling, let's say, a 4 by 20 micron system using Kwant vs. a classical approach? I believe the mean free paths of electrons in graphene can easily reach micron-long pathways in high-mobility graphene-hBN sandwiches, so that is the reason why I am asking.
Coherent simulations (like the ones done by Kwant) make sense when the coherence length is comparable to the system size. Whether it makes sense for your particular experiment I don’t know.
Also, would it make any sense to implement such a simulation in the context of Big Data Analysis, where analysis is performed on parallel working machines where one could diagonalize such a big Hamiltonian, e.g. using Apache Spark?
Can Apache Spark be used for solving huge systems of linear equations? Because this is what Kwant is doing internally. My impression is that "big data" frameworks are rather meant for problems that require less communication. But there are certainly libraries for solving linear systems in parallel; see the threads linked above.
Hi Adrian,

In addition to Christoph's reply about parallelism, I'd like to address the specific aim that you have right now.

The common approach to simulating mesoscopic transport in large devices is to use an effective tight-binding model instead of physical atoms. That way you can still simulate large devices efficiently on your laptop. While the approach is well known and has been broadly used since the early days of graphene, there were two recent publications dedicated specifically to making such a simulation match an experiment in graphene devices [1]. I believe that approach is the optimal way for you to proceed.

Best,
Anton

(In reply to Christoph Groth <christoph.groth@cea.fr>, Wed, Aug 31, 2016 at 10:50 AM.)
Sorry, forgot to add the references; here they go: http://arxiv.org/abs/1605.06924 and http://arxiv.org/abs/1407.5620
Best, Anton
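To give a feeling for why an effective tight-binding model makes a 4 by 20 micron device tractable, here is a rough sketch. It assumes one particular form of effective model: a honeycomb lattice whose lattice constant is multiplied by a factor s while the hopping is divided by s, so that the product of the two (and with it the low-energy Fermi velocity) stays fixed. This is one reading of the approach in the references above, not a prescription taken from this thread; the function names and the hopping value of 2.7 eV are illustrative assumptions.

```python
import math

A_GRAPHENE_NM = 0.246  # graphene lattice constant in nm
T_GRAPHENE_EV = 2.7    # nearest-neighbour hopping in eV (assumed typical value)


def honeycomb_site_count(width_nm, length_nm, lattice_constant_nm):
    """Approximate number of sites of a honeycomb lattice covering a rectangle.

    The unit cell has area (sqrt(3) / 2) * a**2 and contains two sites.
    Edge effects are ignored.
    """
    cell_area = math.sqrt(3) / 2 * lattice_constant_nm ** 2
    return 2 * width_nm * length_nm / cell_area


def scaled_model(scale):
    """Return (lattice constant, hopping) of the assumed rescaled model."""
    return scale * A_GRAPHENE_NM, T_GRAPHENE_EV / scale


# Site counts for a 4 micron by 20 micron sample at a few scale factors.
for scale in (1, 10, 50):
    a, t = scaled_model(scale)
    n = honeycomb_site_count(4000.0, 20000.0, a)
    print(f"scale {scale:>2}: a = {a:.3f} nm, t = {t:.3f} eV, about {n:.2e} sites")
```

The number of sites drops with the square of the scale factor, which is what brings a micron-sized sample down from billions of atoms to a size comparable to the calculations discussed earlier in the thread. How large a scale factor remains physically meaningful depends on the energies and length scales of the experiment and has to be checked against the references.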
participants (3)

Adrian Nosek

Anton Akhmerov

Christoph Groth