How to improve the efficiency of simulating the disordered electronic system?

Dear developers, dear colleagues, Kwant is a really excellent package and I appreciate what you have done to make the numerical simulation easier. I have been using Kwant for a while. Recently I am simulating the disordered quantum anomalous Hall insulators. The disorder average is time-consuming. In the simulation the Fermi energy of the central scattering region is fixed, and I know that extracting the conductance involves a step of computing modes in the leads from https://www.mail-archive.com/kwant-discuss@kwant-project.org/msg00718.html, thus it came to me that whether I can solve the modes in the leads and store them before, and then I just invoke instead of solving them for every disorder configuration. I have tried to realized a MatLab implementation of Landauer-Buttiker formula based on the recursive Green's function, in which I solved the self energy and line width function of leads for once and performing the disorder average with them invoked and it works fine. I know that the implementation in Kwant is some kind of different, and I was wondering if there was something optimized for simulating disordered system. Below is my code: import kwant import numpy as np import time as ti from mpi4py import MPI comm = MPI.COMM_WORLD k = comm.Get_rank() nprocessor = comm.size import warnings warnings.filterwarnings('ignore') # Pauli matrices s0 = np.eye(2) sx = np.array([[0,1], [1,0]]) sy = np.array([[0, -1j],[1j, 0]]) sz = np.array([[1,0],[0,-1]]) # Parameters A = 364.5 # meV nm B = -686 # meV nm^2 C = 0 # meV D = -512 # meV nm^2 M = -10 # meV L = 256 # Building the system def qwz(a=1, L=L, W=L): hamiltonian_syst = """ (V(x, y) + C-D*(k_x**2+k_y**2))* identity(2) + (M-B*(k_x**2+k_y**2)) * sigma_z + A*k_x*sigma_x + A*k_y*sigma_y """ hamiltonian_lead = """ (C-D*(k_x**2+k_y**2))* identity(2) + (M-B*(k_x**2+k_y**2)) * sigma_z + A*k_x*sigma_x + A*k_y*sigma_y """ template_syst = kwant.continuum.discretize(hamiltonian_syst, grid=a) template_lead = kwant.continuum.discretize(hamiltonian_lead, grid=a) def shape(site): (x, y) = site.pos return (0 <= y < W and 0 <= x < L) def lead_shape(site): (x, y) = site.pos return (0 <= y < W) syst = kwant.Builder() syst.fill(template_syst, shape, (0, 0)) lead = kwant.Builder(kwant.TranslationalSymmetry([-a, 0])) lead.fill(template_lead, lead_shape, (0, 0)) syst.attach_lead(lead) syst.attach_lead(lead.reversed()) # kwant.plot(syst) syst = syst.finalized() return syst def get_conductance(w, Ef, Nc): # w is the disorder strength, Ef is the Fermi energy, and Nc is the cycling times def disorder(x, y): # onsite disorder return np.random.uniform(-w/2, w/2) params = dict(A = A, B = B, C = C, D = D, M = M, Ef = Ef, V = disorder) syst = qwz() data = [] for i in range(Nc): smatrix = kwant.smatrix(syst, Ef, params = params) data.append(smatrix.transmission(1, 0)) G_av = np.mean(data) G_std = np.std(data) return G_av, G_std def main(): t1 = ti.time() syst = qwz() Ef = 100 # meV Nc = 500 # cycle w = k * 20 # meV G_av, G_std = get_conductance(w, Ef, Nc) # pyplot.plot(energies, data) # pyplot.xlabel("energy [t]") # pyplot.ylabel("conductance [e^2/h]") # pyplot.show() # pyplot.savefig('conductance' +str(k)+ '.png') np.savez('cond('+str(L)+')'+str(k)+ '.npz', x = w, y = G_av, ye = G_std) # print('Disorder strength:', w, 'Average conductance:', G_av, 'Conductance fluctuation:', G_std) t2 = ti.time() print('time cost:', t2-t1, 's') if __name__ == '__main__': main() In which I simulated a system of 256*256 atoms and averaged for 500 samples. It will take about 20000 seconds with a dual E5 2696 v3 server with 50 threads working parallely at about 2.7 GHz. I know that part of the developers contributed to the termed Topological Anderson Insulator, and I was wondering if there's some way to improve the efficiency of my code? Thank you for taking your time! Yours sincerely, Shihao, A PhD candidate in cond-mat theory

Shihao wrote:
In which I simulated a system of 256*256 atoms and averaged for 500 samples. It will take about 20000 seconds with a dual E5 2696 v3 server with 50 threads working parallely at about 2.7 GHz. I know that part of the developers contributed to the termed Topological Anderson Insulator, and I was wondering if there's some way to improve the efficiency of my code?
Hi, happy to hear that Kwant is useful for you. I only looked briefly at your script, but it seems to me that each MPI rank is computing 500 S-matrices. When you say that you are using 50 threads in parallel, do you mean that you launched a single MPI job of size 50 (with 50 processes)? Or do you refer to multi-threaded BLAS/LAPACK computations? Averaging over disorder realizations is generally a task that lends itself very well to parallelization: it’s a so-called embarrassingly parallel workload. If you need to compute 500 realizations on a machine with 50 cores, that means that each core has to compute 10 realizations. That should not take 5 1/2 hours (=20000 seconds). Here are some points that might help you: • Make sure that you do not oversubscribe the machine. For example, by default OpenBLAS will utilize all the available cores, so if N processes are launched on a N-core machine, N*N threads will execute in total, which is very bad for performance. Check for oversubscription by monitoring the system load. OpenBLAS can be forced to use a single thread only by setting the OPENBLAS_NUM_THREADS environment variable. • Avoid recalculating the modes when only the disorder changes. Kwant provides a way to precalculate modes: https://kwant-project.org/doc/1/reference/generated/kwant.system.FiniteSyste... Hope this helps Christoph
participants (2)
-
bishihao@stu.scu.edu.cn
-
Christoph Groth