diegobruno244@gmail.com wrote:
Kwant community, I would like to know whether it is possible to do parallel computing with Kwant. Tools such as numba, CUDA, multiprocessing, or Parallel can be used for parallel computing. Do these work with Kwant?
If you search the list for “parallel”, you will find various discussions about the topic. In the following, I summarize the current situation.

If your computing workload consists of independent bits (like averaging an observable over different disorder realizations), then it is best to parallelize on this level. Depending on your hardware and your preferences, you can then use concurrent.futures, mpi4py, or any other parallelization framework. If you go this way, be sure to disable OpenBLAS parallelization [1].

If your workload cannot be split up in the above way, we have to distinguish between the three phases of system construction, Hamiltonian evaluation, and solving.

• The system construction phase (i.e. using kwant.Builder) is inherently sequential due to the way CPython works internally (search for GIL, the infamous “Global Interpreter Lock”). Luckily, for large systems, the construction phase is seldom the bottleneck, since the time it takes is O(N), where N is the number of sites. If it is problematic in your case, you could bypass the builder by creating a custom low-level system. One day we might provide a much faster builder (successor).

• The Hamiltonian evaluation phase is what happens inside the method ‘hamiltonian_submatrix’. This phase runs in Python and is sequential as well. It involves evaluating all the value functions (wherever these are used), and currently this happens site by site and hopping by hopping. Speeding this up is something we have been working on for a long time, and the next Kwant release should provide a way to vectorize Hamiltonian evaluation. The process will still be sequential, but should get a very significant speedup. Still, even today, Hamiltonian evaluation is typically not the bottleneck for large systems, since it, too, takes O(N) time.

• For large systems, it is typically the solving phase that dominates the total running time. This is to be expected, since it takes a time that is polynomial in the size of the system. (The exponent depends on the system dimensionality and the solver used.)

Very often the solving phase will involve Kwant calling into a linear algebra library that ultimately calls some sort of BLAS. The default setup on the platforms I know best (Debian-like) is such that OpenBLAS is used, and it is parallelized using OpenMP so that all available cores are used. This is not a very useful parallelization, but it is better than nothing. See [1].

The solvers use different libraries that support parallelization to different degrees. No one has seriously worked on this, because vectorization needs to be done first (this is being worked on), and quite often people have the opportunity to parallelize on the level of independent tasks.

For very large systems, solving (and everything else) is likely dominated by the MUMPS library, which is inherently parallel and which Kwant currently uses only in a sequential way. Using it in parallel could most likely be done with little effort, but so far no one has finished that work. There exists an open issue: [2]. I do not know how well the MUMPS parallelization works for the kind of linear systems that Kwant creates.

I hope that the above provides some insight into the state of things.

Christoph

[1] https://mail.python.org/archives/list/kwant-discuss@python.org/thread/SSU7PF...
[2] https://gitlab.kwant-project.org/kwant/kwant/-/issues/54
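
P.S. To illustrate the task-level parallelization mentioned above, here is a minimal sketch using concurrent.futures. The function names observable_for_seed and disorder_average are hypothetical, and the body of observable_for_seed is only a placeholder: in real code it would build and finalize a system for the given disorder seed and call a Kwant solver. Limiting BLAS threads through environment variables is one common approach and is an assumption here; it may differ from what is discussed in [1].

```python
import os

# Assumption: limiting BLAS/OpenMP threads via environment variables, set
# before any numerical library is imported, so each worker uses one core.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("OMP_NUM_THREADS", "1")

import random
from concurrent.futures import ProcessPoolExecutor
from statistics import mean


def observable_for_seed(seed):
    """Hypothetical stand-in for one independent task.

    Real code would build a system with this disorder seed, finalize it,
    and return an observable computed by a solver. Here we only return a
    reproducible pseudo-random number.
    """
    rng = random.Random(seed)
    return rng.gauss(1.0, 0.1)


def disorder_average(seeds, max_workers=None):
    """Average the observable over all seeds, one task per worker process."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(observable_for_seed, seeds))
    return mean(values)


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module.
    print(disorder_average(range(32)))
```

Separate processes are used rather than threads so that the Python-level parts of each task are not serialized by the GIL. The same structure carries over to mpi4py if the tasks should be spread over several machines.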