Hey, I just translated the SVN trunk of PyPy with the JIT enabled and I was very pleased by the results: PyPy with JIT is now two times faster than CPython in Pystone (I know that's not saying too much). However, translating took a very long time because it could not make use of my other CPU cores. I did not find a switch to compile with multiple parallel jobs (as in make's -jN), so I was wondering whether such a feature was planned or even possible. At least for the compilation part, this should be very possible. -- Sven-Hendrik
2009/12/4 Sven-Hendrik Haase <sh@lutzhaase.com>:
Hey,
I just translated the SVN trunk of PyPy with the JIT enabled and I was very pleased by the results: PyPy with JIT is now two times faster than CPython in Pystone (I know that's not saying too much). However, translating took a very long time because it could not make use of my other CPU cores. I did not find a switch to compile with multiple parallel jobs (as in make's -jN), so I was wondering whether such a feature was planned or even possible. At least for the compilation part, this should be very possible.
No, that's not planned at the moment. The only place we could easily use multiple CPUs at the moment is compiling the final C sources. You can even do this manually with the makefile in the temp source directory. -- Regards, Benjamin
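For reference, a minimal sketch of that manual step. Note that the exact location of the generated sources is an assumption here: PyPy writes them to a per-user temporary "usession" directory, and both the path pattern and the subdirectory name below vary by system.

    import glob
    import subprocess

    # Assumed location of the generated C sources; adjust for your system.
    builddirs = glob.glob('/tmp/usession-*/testing_1')
    if builddirs:
        # Run the generated makefile with 4 parallel jobs.
        subprocess.check_call(['make', '-j4'], cwd=builddirs[0])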
Benjamin Peterson wrote:
No, that's not planned at the moment. The only place we could easily use multiple CPUs at the moment is compiling the final C sources. You can even do this manually with the makefile in the temp source directory.
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way. Anyone feel like implementing it? :-)
Hi, On Fri, Dec 04, 2009 at 06:18:13PM +0100, Antonio Cuni wrote:
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way.
Of course, that is full of open problems too. The main one is that each gcc process consumes potentially a lot of RAM, so just passing "-j" is not a great idea, as all gccs are started in parallel. It looks like some obscure tweak is needed, like setting -j to a number that depends not only on the number of CPUs (as is classically done) but also on the total RAM of the system... A bientot, Armin.
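As an illustration of the kind of tweak Armin describes, here is a minimal sketch that derives a job count from both the CPU count and the total RAM. It is Linux-only (POSIX sysconf), and the 1GB-per-gcc budget is an assumed figure, not a measured one:

    import multiprocessing
    import os

    def guess_make_jobs(mem_per_job_mb=1000):
        # Classic heuristic: one job per CPU...
        cpus = multiprocessing.cpu_count()
        # ...but capped by the total physical RAM of the system.
        total_mb = (os.sysconf('SC_PAGE_SIZE')
                    * os.sysconf('SC_PHYS_PAGES')) // (1024 * 1024)
        # Never start more gcc processes than the RAM budget allows.
        return max(1, min(cpus, total_mb // mem_per_job_mb))

    print(guess_make_jobs())  # e.g. 4 on a quad-core box with 8GB of RAM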
On Sat, Dec 5, 2009 at 4:44 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi,
On Fri, Dec 04, 2009 at 06:18:13PM +0100, Antonio Cuni wrote:
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way.
Of course, that is full of open problems too. The main one is that each gcc process consumes potentially a lot of RAM, so just passing "-j" is not a great idea, as all gccs are started in parallel. It looks like some obscure tweak is needed, like setting -j to a number that depends not only on the number of CPUs (as is classically done) but also on the total RAM of the system...
A bientot,
Armin.
I guess the original idea was to have a translation option that is passed as the -j flag to make, so one can specify how many jobs one wants instead of trying to guess it automatically. Cheers, fijal
On 5 Dec 2009, 04:49 pm, fijall@gmail.com wrote:
On Sat, Dec 5, 2009 at 4:44 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi,
On Fri, Dec 04, 2009 at 06:18:13PM +0100, Antonio Cuni wrote:
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way.
Of course, that is full of open problems too. The main one is that each gcc process consumes potentially a lot of RAM, so just passing "-j" is not a great idea, as all gccs are started in parallel. It looks like some obscure tweak is needed, like setting -j to a number that depends not only on the number of CPUs (as is classically done) but also on the total RAM of the system...
A bientot,
Armin.
I guess the original idea was to have a translation option that is passed as the -j flag to make, so one can specify how many jobs one wants instead of trying to guess it automatically.
I poked around on this front a bit. I couldn't find any code in PyPy which invokes make. I did find pypy.translator.platform.distutils_platform.DistutilsPlatform._build, though. This seems to be where lists of C files are sent for compilation. Is that right?

I thought about how to make this parallel. The cheesy solution, of course, would be to start a few threads and have them do the compilation (which should be sufficiently parallel, since it's another process that's doing the actual work). This is a bit complicated by the chdir calls in the code, though. Also, maybe distutils isn't threadsafe.

I dunno if I'll think about this any further, but I thought I'd summarize what little I did figure out. Jean-Paul
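Jean-Paul's thread idea can be sketched without touching distutils at all; this is not PyPy's actual code, just an illustration of the approach. Since the heavy lifting happens in child gcc processes (the GIL is released while waiting on them), plain worker threads suffice, and using absolute paths sidesteps the chdir problem:

    import os
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def compile_c_files(cfiles, cc='gcc', cflags=('-O2',), jobs=4):
        # Compile each .c file to a .o in a worker thread; the real
        # work happens in a gcc child process, so threads scale fine.
        def compile_one(cfile):
            ofile = cfile.rsplit('.', 1)[0] + '.o'
            subprocess.check_call([cc] + list(cflags)
                                  + ['-c', cfile, '-o', ofile])
            return ofile

        # Absolute paths instead of chdir, so workers cannot step on
        # each other's working directory.
        paths = [os.path.abspath(f) for f in cfiles]
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            return list(pool.map(compile_one, paths))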
Hello, 2010/1/9 <exarkun@twistedmatrix.com>:
On 5 Dec 2009, 04:49 pm, fijall@gmail.com wrote:
I guess the original idea was to have a translation option that is passed as the -j flag to make, so one can specify how many jobs one wants instead of trying to guess it automatically.
I poked around on this front a bit. I couldn't find any code in PyPy which invokes make. I did find pypy.translator.platform.distutils_platform.DistutilsPlatform._build, though. This seems to be where lists of C files are sent for compilation. Is that right?
PyPy does generate a makefile (gen_makefile() in http://codespeak.net/svn/pypy/trunk/pypy/translator/c/genc.py ), but it is not used in all configurations: in the same file, see the call to execute_makefile(). -Ojit does use the makefile, though. -- Amaury Forgeot d'Arc
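Once the makefile path is known, wiring a user-chosen job count through is a small change. Something along these lines, where both the function body and the --make-jobs option it assumes are hypothetical, not the real execute_makefile() in genc.py:

    import subprocess

    def execute_makefile(path, make='make', jobs=None):
        # 'jobs' would come from a translation option such as a
        # hypothetical --make-jobs flag.
        args = [make]
        if jobs:
            args.append('-j%d' % jobs)
        subprocess.check_call(args, cwd=path)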
On Sat, Dec 5, 2009 at 16:44, Armin Rigo <arigo@tunes.org> wrote:
Hi,
On Fri, Dec 04, 2009 at 06:18:13PM +0100, Antonio Cuni wrote:
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way.
Of course, that is full of open problems too. The main one is that each gcc process consumes potentially a lot of RAM, so just passing "-j" is not a great idea, as all gccs are started in parallel. It looks like some obscure tweak is needed, like setting -j to a number that depends not only on the number of CPUs (as is classically done) but also on the total RAM of the system...
My 2 cents: for C++ I would be really worried, but for C I'd guess that even with 4 CPUs (i.e. classically 5 jobs) one is not going to go over 1GB... or not? Dear Sven-Hendrik, would you verify this idea by launching make by hand, as suggested? (Now, I'm going back to just lurking). Regards -- Paolo Giarrusso
Armin Rigo, 05.12.2009 16:44:
On Fri, Dec 04, 2009 at 06:18:13PM +0100, Antonio Cuni wrote:
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way.
Of course, that is full of open problems too. The main one is that each gcc process consumes potentially a lot of RAM, so just passing "-j" is not a great idea, as all gccs are started in parallel. It looks like some obscure tweak is needed, like setting -j to a number that depends not only on the number of CPUs (as is classically done) but also on the total RAM of the system...
I just did a quick check with lxml.etree, for which Cython generates a 6.5MB C file with 150K lines (~96K non-empty/non-'#' lines in gcc -E). Running that through "gcc -O3 -march=core2 -pipe" keeps the peak virtual memory allocation in 'top' well below 350MB on my 32bit Linux system. Developer machines tend to be rather well equipped these days, so not much to worry about here, IMHO. Stefan
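For anyone who wants to repeat that measurement without watching 'top', here is a rough sketch using getrusage() on child processes. It reports peak resident set size rather than peak virtual size, so it is only a proxy for Stefan's number (on Linux ru_maxrss is in kilobytes; the input file name below is assumed):

    import resource
    import subprocess

    def peak_child_rss_mb(cmd):
        # Run the command, then report the peak RSS of all children
        # waited on so far, in MB.  The counter is cumulative, so use
        # a fresh process for a clean reading.
        subprocess.check_call(cmd)
        return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024.0

    # e.g., to reproduce Stefan's check (file name assumed):
    # print(peak_child_rss_mb(['gcc', '-O3', '-march=core2', '-pipe',
    #                          '-c', 'lxml.etree.c']))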
On 07.12.2009 10:48, Stefan Behnel wrote:
Armin Rigo, 05.12.2009 16:44:
On Fri, Dec 04, 2009 at 06:18:13PM +0100, Antonio Cuni wrote:
I agree that at this point in time we cannot or don't want to make annotation/rtyping/backend parallelizable, but it should definitely be possible to just pass the -j flag to 'make' in an automatic way.
Of course, that is full of open problems too. The main one is that each gcc process consumes potentially a lot of RAM, so just passing "-j" is not a great idea, as all gccs are started in parallel. It looks like some obscure tweak is needed, like setting -j to a number that depends not only on the number of CPUs (as is classically done) but also on the total RAM of the system...
I just did a quick check with lxml.etree, for which Cython generates a 6.5MB C file with 150K lines (~96K non-empty/non-'#' lines in gcc -E). Running that through "gcc -O3 -march=core2 -pipe" keeps the peak virtual memory allocation in 'top' well below 350MB on my 32bit Linux system. Developer machines tend to be rather well equipped these days, so not much to worry about here, IMHO.
Stefan
Indeed, my own tests support this. Especially with kernel 2.6.32, memory should no longer be an issue. -- Sven-Hendrik
participants (9)
- Amaury Forgeot d'Arc
- Antonio Cuni
- Armin Rigo
- Benjamin Peterson
- exarkun@twistedmatrix.com
- Maciej Fijalkowski
- Paolo Giarrusso
- Stefan Behnel
- Sven-Hendrik Haase