Hi Yingchao,
I don't think covering grid construction is parallelized at the moment, so in your script each processor is creating its own full copy of the covering grid. I haven't experimented with this, though, so it's possible I'm misreading the code or missing something elsewhere that makes this operation parallel.
One way around this with the current version of yt would be to handle the parallelism yourself: have each processor construct a subvolume of the covering grid, then combine the pieces using, e.g., mpi4py.
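Here is a rough, untested sketch of what I mean by handling the parallelism yourself. It splits the covering grid into slabs along the x axis, has each MPI rank build one slab, and gathers the slabs on rank 0. The function names (slab_bounds, gather_covering_grid), the dataset path, and the ("gas", "density") field are placeholders I made up, and I'm assuming the script does not call yt.enable_parallelism() and that there are no more ranks than cells along x.

```python
def slab_bounds(n_cells, n_ranks, rank):
    """Return the (start, stop) cell indices of this rank's slab along one axis.

    Cells are split as evenly as possible; the first n_cells % n_ranks
    ranks each get one extra cell.
    """
    base, extra = divmod(n_cells, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def gather_covering_grid(path, level, field):
    """Build one x-slab of the covering grid per rank and gather on rank 0.

    Hypothetical helper, not part of yt. Returns the full array on rank 0
    and None on every other rank.
    """
    # Imported here so slab_bounds can be used without yt or MPI installed.
    import numpy as np
    import yt
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    ds = yt.load(path)

    # Number of cells covering the whole domain at the requested level.
    full_dims = ds.domain_dimensions * ds.refine_by**level
    start, stop = slab_bounds(int(full_dims[0]), size, rank)

    # Shift the left edge of this rank's slab along x by `start` cells.
    dx = ds.domain_width[0] / full_dims[0]
    left_edge = ds.domain_left_edge.copy()
    left_edge[0] = left_edge[0] + start * dx
    dims = [stop - start, int(full_dims[1]), int(full_dims[2])]

    cg = ds.covering_grid(level, left_edge=left_edge, dims=dims)
    slab = cg[field].to_ndarray()  # drop units before sending

    # Lowercase gather pickles each slab; fine for a sketch, though the
    # buffer-based Gatherv is likely a better fit for very large arrays.
    slabs = comm.gather(slab, root=0)
    if rank == 0:
        return np.concatenate(slabs, axis=0)
    return None
```

You would call something like gather_covering_grid("my_dataset", 0, ("gas", "density")) from a script launched with, e.g., mpirun -n 4 python script.py. I chose slabs along a single axis only because it makes the recombination a single concatenate; a block decomposition along all three axes would work too, but needs more bookkeeping.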
I think it might also be possible to make yt construct the covering grid in a parallel-aware fashion. The relevant code is here:
One would need to make the loop over io chunks a parallel loop and then add a reduction step at the end.
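To illustrate the pattern only (this is a toy, not yt's actual code), here is a pure-Python sketch in which the "ranks" are simulated one after another. Each rank handles its share of the chunks, writes into its own zero-filled buffer, and the buffers are then summed. The names fill_serial and fill_parallel are made up, and the sum reduction is only valid under the assumption that every cell is written by exactly one chunk.

```python
def fill_serial(n_cells, chunks):
    """Fill a 1D grid by looping over every chunk on a single process."""
    grid = [0.0] * n_cells
    for indices, values in chunks:
        for i, v in zip(indices, values):
            grid[i] = v
    return grid


def fill_parallel(n_cells, chunks, n_ranks):
    """Simulate a parallel chunk loop followed by a sum reduction.

    Each simulated rank takes every n_ranks-th chunk (round-robin) and
    fills its own zero-initialized buffer. Summing the buffers cell by
    cell plays the role of an MPI sum reduction.
    """
    partial = []
    for rank in range(n_ranks):
        local = [0.0] * n_cells
        for indices, values in chunks[rank::n_ranks]:
            for i, v in zip(indices, values):
                local[i] = v
        partial.append(local)
    # Reduction step: untouched cells contribute 0.0, so the sum
    # reproduces the serial result when no cell is written twice.
    return [sum(cell) for cell in zip(*partial)]


# Toy chunks: (cell indices, values) pairs that together cover 8 cells.
chunks = [
    ([0, 1], [1.0, 2.0]),
    ([2, 3], [3.0, 4.0]),
    ([4], [5.0]),
    ([5, 6, 7], [6.0, 7.0, 8.0]),
]

assert fill_parallel(8, chunks, 3) == fill_serial(8, chunks)
```

In a real implementation the outer loop over ranks would disappear, since each process would only run its own iteration, and the final sum would be an MPI reduction.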
If you don't feel like taking on this task yourself, please feel free to open an issue about it on GitHub so that we don't lose track of the feature request.
-Nathan