The mu.py script will keep running and never end.

Hi,

My environment is Ubuntu 20.04 with Python 3.8.3 managed by pyenv. I try to run the script <https://notebook.rcc.uchicago.edu/files/acs.chemmater.9b05047/Data/bulk/dft/...>, but it keeps running and never ends. It prints the following output:

$ python mu.py
[-10.999 -10.999 -10.999 ... 20. 20. 20. ] [4.973e-84 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]

I have to terminate it with Ctrl+C, which produces the following traceback:

^CTraceback (most recent call last):
  File "mu.py", line 38, in <module>
    integrand=DOS*fermi_array(energy,mu,kT)
  File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/home/werner/.pyenv/versions/datasci/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2192, in _vectorize_call
    outputs = ufunc(*inputs)
  File "mu.py", line 8, in fermi
    return 1./(exp((E-mu)/kT)+1)
KeyboardInterrupt

Any help and hints on this problem would be highly appreciated.

Regards,
-- Hongyi Zhao <hongyi.zhao@gmail.com>

On Sun, Oct 11, 2020 at 1:48 AM Robert Kern <robert.kern@gmail.com> wrote:
You don't need to use vectorize() on fermi(). fermi() will work just fine on arrays and should be much faster.
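For reference, a minimal sketch of that change, assuming the fermi() definition from mu.py (the energy/DOS arrays below are placeholders, not the real DFT data):

import numpy as np

def fermi(E, mu, kT):
    """Fermi-Dirac occupation; operates elementwise on NumPy arrays."""
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# Placeholder data -- the real energy/DOS arrays come from the DFT output.
energy = np.linspace(-11.0, 20.0, 20_001)
DOS = np.ones_like(energy)
mu, kT = 10.3, 0.025

# Instead of fermi_array = np.vectorize(fermi), which calls the Python
# function once per element, call fermi() directly on the whole array.
# np.exp overflows to inf for large positive arguments; 1/(inf + 1) is a
# harmless 0.0, just as in the original mu.py run.
integrand = DOS * fermi(energy, mu, kT)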
Yes, that really does the trick. See the following benchmark based on your suggestion:

$ time python mu.py
[-10.999 -10.999 -10.999 ... 20. 20. 20. ] [4.973e-84 4.973e-84 4.973e-84 ... 4.973e-84 4.973e-84 4.973e-84]

real    0m41.056s
user    0m43.970s
sys     0m3.813s

But is there any way to further improve efficiency?

Regards,
HY
-- Hongyi Zhao <hongyi.zhao@gmail.com>

On Sun, 11 Oct 2020 at 07.52, Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
I have no solution. I originally thought you were calling “column_stack” 6000 times in the loop, but that is not the case; I was mistaken. My apologies for that. The timing of your approach is highly dependent on the sizes of your “energy” and “DOS” arrays, not to mention calling trapz 6000 times in a loop. Maybe there’s a better way to do it with another approach, but at the moment I can’t think of one...

On Sun, Oct 11, 2020 at 2:02 PM Andrea Gavana <andrea.gavana@gmail.com> wrote:
The sizes of the “energy” and “DOS” arrays are problem-related and shouldn't be reduced arbitrarily.
not to mention calling trapz 6000 times in a loop.
I'm currently thinking about parallelizing the execution of the for loop, say, with joblib <https://github.com/joblib/joblib>, but I still haven't figured out the corresponding code. If you have some experience with this type of solution, could you please give me some more hints?
-- Hongyi Zhao <hongyi.zhao@gmail.com>
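No complete joblib version was posted in the thread; purely as an illustration, a parallelized version of the loop might look like the following sketch, with placeholder data. Whether it actually beats the serial NumPy version depends on the array sizes and the worker start-up overhead:

import numpy as np
from joblib import Parallel, delayed

def fermi(E, mu, kT):
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# Placeholder data; the real arrays come from mu.py.
energy = np.linspace(-11.0, 20.0, 20_001)
DOS = np.ones_like(energy)
kT = 0.025
mu_all = np.linspace(10.3, 10.9, 6000)

def particle_number(mu):
    # One trapezoid integration per chemical potential.
    return np.trapz(DOS * fermi(energy, mu, kT), energy)

# Spread the 6000 independent integrations over all available cores.
N = Parallel(n_jobs=-1)(delayed(particle_number)(mu) for mu in mu_all)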

The script seems to be computing the particle numbers for an array of chemical potentials. Two ways of speeding it up, both likely simpler than using dask:

First: use numpy.
1. Move constructing mu_all out of the loop (np.linspace).
2. Arrange the integrands into a 2D array.
3. np.trapz along the axis which corresponds to a single integrand array. (Or avoid the overhead of trapz by just implementing the trapezoid formula manually.)

Second: move the loop into Cython.
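A rough sketch of the “First: use numpy” steps above, with placeholder grids (the real energy/DOS arrays come from mu.py):

import numpy as np

def fermi(E, mu, kT):
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# Placeholder data; the real grids come from mu.py.
energy = np.linspace(-11.0, 20.0, 20_001)
DOS = np.ones_like(energy)
kT = 0.025

# 1. Construct all chemical potentials at once, outside any loop.
mu_all = np.linspace(10.3, 10.9, 601)

# 2. Broadcast to a 2D integrand array: one row per chemical potential.
integrand = DOS * fermi(energy[np.newaxis, :], mu_all[:, np.newaxis], kT)

# 3. One trapz call integrates every row along the energy axis.
N = np.trapz(integrand, energy, axis=1)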

On Sun, Oct 11, 2020 at 9:55 AM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
Roughly like this: https://gist.github.com/ev-br/0250e4eee461670cf489515ee427eb99
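The parenthetical suggestion above (replacing np.trapz with a hand-written trapezoid formula) might look roughly like this, using a random placeholder integrand array:

import numpy as np

# Placeholder 2D integrands: one row per chemical potential.
energy = np.linspace(-11.0, 20.0, 20_001)
integrand = np.random.rand(601, energy.size)

# Trapezoid rule along axis=1, written out manually to skip np.trapz overhead.
d = np.diff(energy)
N = 0.5 * np.sum((integrand[:, 1:] + integrand[:, :-1]) * d, axis=1)

# Should agree with np.trapz(integrand, energy, axis=1) to rounding error.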

On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
I can't find the Cython part you suggested, i.e., moving the loop into Cython. Furthermore, I have also learned that NumPy arrays are optimized and have performance close to C/C++.
-- Hongyi Zhao <hongyi.zhao@gmail.com>

On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
I tried to run this notebook, but found that none of the functions/variables/methods can be found if I invoke them in separate cells. See here for more details: https://github.com/hongyi-zhao/test/blob/master/fermi_integrate_np.ipynb

Any hints on this problem?

Regards,
HY
-- Hongyi Zhao <hongyi.zhao@gmail.com>

On Sun, Oct 11, 2020 at 3:42 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
I've compared the execution efficiency of your method above with the original method of the Python script, which directly uses fermi() without applying vectorize() to it. Very surprisingly, the latter is more efficient than the former; see the following for more info:

$ time python fermi_integrate_np.py
[[1.03000000e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    1m8.797s
user    0m47.204s
sys     0m27.105s

$ time python mu.py
[[1.03000000e+01 4.55561775e+17]
 [1.03001000e+01 4.55561780e+17]
 [1.03002000e+01 4.55561786e+17]
 ...
 [1.08997000e+01 1.33654085e+21]
 [1.08998000e+01 1.33818034e+21]
 [1.08999000e+01 1.33982054e+21]]

real    0m38.829s
user    0m41.541s
sys     0m3.399s

So I think the benchmark dataset you used for testing code efficiency is not so appropriate. What is your view on these test results?

Regards,
HY
-- Hongyi Zhao <hongyi.zhao@gmail.com>

Hi, On Mon, 12 Oct 2020 at 14:38, Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
Evgeni has provided an interesting example of how to speed up your code - granted, he used toy data, but the improvement is real. As far as I can see, you haven't specified how big your DOS etc. vectors are, so it's not that obvious how to draw any conclusions. I find it highly puzzling that his implementation appears to be slower than your original code. In any case, if performance is so paramount for you, then I would suggest you move in the direction Evgeni was proposing, i.e. shifting your implementation to C/Cython or Fortran/f2py. I had much better results myself using Fortran/f2py than pure NumPy or C/Cython, but this is mostly because my knowledge of Cython is quite limited. That said, your problem should be fairly easy to implement in a compiled language. Andrea.

Hi,

On Mon, 12 Oct 2020 at 16.22, Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
If so, I think that C/Fortran-based implementations should be more efficient than ones using Cython/f2py.

That is not what I meant. What I meant is: write the time-consuming part of your code in C or Fortran and then bridge it to Python using Cython or f2py. Andrea.

On Mon, Oct 12, 2020 at 10:50 AM Hongyi Zhao <hongyi.zhao@gmail.com> wrote:
If it's a small job, don't bother optimizing it further at all. We don't know how important this is for you. We can only make conditional recommendations ("if it's worth spending development effort to gain speed, here are some ways to do that"). Balancing the development effort with the utility of further performance gains is entirely up to you. -- Robert Kern

On Sun, Oct 11, 2020 at 2:56 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
The script seems to be computing the particle numbers for an array of chemical potentials.
Two ways of speeding it up, both are likely simpler than using dask:

What do you mean by *dask*?
Will this be more efficient than something like parallelization based on Python modules, say, joblib?
-- Hongyi Zhao <hongyi.zhao@gmail.com>
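For context: dask <https://dask.org> is a Python library for parallel and larger-than-memory computation. It was not elaborated on in the thread; purely as an illustration, a dask.delayed version of the same loop might look like this sketch, with placeholder data:

import numpy as np
import dask

def fermi(E, mu, kT):
    return 1.0 / (np.exp((E - mu) / kT) + 1.0)

# Placeholder data; the real arrays come from mu.py.
energy = np.linspace(-11.0, 20.0, 20_001)
DOS = np.ones_like(energy)
kT = 0.025
mu_all = np.linspace(10.3, 10.9, 601)

@dask.delayed
def particle_number(mu):
    # One independent integration per chemical potential.
    return np.trapz(DOS * fermi(energy, mu, kT), energy)

# Build the task graph lazily, then execute it (threaded scheduler by default).
N = dask.compute(*[particle_number(mu) for mu in mu_all])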

Participants (4):
- Andrea Gavana
- Evgeni Burovski
- Hongyi Zhao
- Robert Kern