PyPy translation fails on MIPS (without JIT)
Hi everyone,

As some of you know, I'm working for a company (as an intern) on the MIPS port of PyPy. I started the JIT backend but haven't finished it, so for the moment I am trying to translate PyPy without the JIT. I am translating directly on the target, even though it takes a very long time. For now I am only trying to compile a small RPython snippet:

def main(args):
    print "Hello World"
    return 0

def target(*args):
    return main, None

with this command line:

python ../../rpython/bin/rpython -O1 test.py

Then I get this translation configuration:

[translation] translate.py configuration:
[translation] [translate]
[translation]     opt = 1
[translation]     targetspec = test
[translation] translation configuration:
[translation] [translation]
[translation]     [backendopt]
[translation]         inline_threshold = 16.2
[translation]     continuation = False
[translation]     gc = boehm
[translation]     gcremovetypeptr = False
[translation]     gcrootfinder = n/a
[translation]     gctransformer = boehm
[translation]     list_comprehension_operations = True

Annotation and simplification succeed, as do RTyping, backend optimisations, stack checking, database creation and C source generation. But during compilation, the first command fails:

[platform:execute] make -j 2 in /tmp/usession-unknown-12/testing_1
[platform:Error] /opt/cross-native-mipsel-linux-gnu/bin/../lib/gcc/mipsel-linux-gnu/4.4.6/../../../../mipsel-linux-gnu/bin/ld: test-c: local symbol `__data_start' in /opt/cross-native-mipsel-linux-gnu/bin/../mipsel-linux-gnu/sysroot/usr/lib/crt1.o is referenced by DSO
[platform:Error] /opt/cross-native-mipsel-linux-gnu/bin/../lib/gcc/mipsel-linux-gnu/4.4.6/../../../../mipsel-linux-gnu/bin/ld: final link failed: Bad value
[platform:Error] collect2: ld returned 1 exit status
[platform:Error] make: *** [test-c] Error 1

I searched on Google, and all I found suggests it can come from binutils 2.21. I tried to build a new toolchain with binutils 2.22, but I get the same error...
Does anyone have any idea where I can look to debug this?

Thanks,
Alexis
Hello to all, and thanks for the great work on PyPy.

There remains a very problematic case in PyPy, where PyPy (tested with nightly) is around 193 times slower than CPython. This happens when reading a file using the universal newline flag ("rU").

Are there any plans to fix this problematic case, or should we just avoid using it when running under PyPy?

Kind regards,
lefteris.
On Wed, Jul 3, 2013 at 5:47 PM, Eleytherios Stamatogiannakis <estama@gmail.com> wrote:
Hello to all, and thanks for the great work on PyPy.
There remains a very problematic case in PyPy, where PyPy (tested with nightly) is around 193 times slower than CPython.
This happens when reading a file using the universal newline flag ("rU").
Are there any plans to fix this problematic case, or should we just avoid using it when running under PyPy?
Kind regards,
lefteris. _______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev
Hey.

Well, obviously first someone has to report the issue, which you just did. Can you please file a bug report with a standalone program that shows the behavior and put it on bugs.pypy.org?

Thanks for your interest in PyPy!

Cheers,
fijal
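A standalone reproduction along the lines requested might look like the following sketch. The file name, line count, and the fallback for newer Pythons (where the "U" mode flag was removed) are my own additions, not from the original report:

```python
# Hypothetical standalone benchmark for universal-newline reads ("rU").
# File name and line count are made up for illustration.
import os
import tempfile
import time

def make_file(path, lines):
    # Write Windows-style line endings so "rU" has work to do.
    with open(path, "wb") as f:
        f.write(b"some sample text\r\n" * lines)

def count_lines(path, mode):
    try:
        f = open(path, mode)
    except ValueError:
        # The "U" flag was removed in Python 3.11+; fall back to text mode,
        # which does universal-newline translation by default there.
        f = open(path, "r")
    t0 = time.time()
    with f:
        n = sum(1 for _ in f)
    return n, time.time() - t0

path = os.path.join(tempfile.mkdtemp(), "rU_test.txt")
make_file(path, 100000)
n_plain, t_plain = count_lines(path, "r")
n_univ, t_univ = count_lines(path, "rU")
print("r : %d lines in %.3fs" % (n_plain, t_plain))
print("rU: %d lines in %.3fs" % (n_univ, t_univ))
```

Timing both modes on the same file isolates the cost of the universal-newline machinery from the raw read path.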
Hello,

We also found a case where PyPy is 2x slower than CPython. The following code:

<<<<
import cPickle

fileIter = open("pypytesting", "w+b")
mylist = ["qwerty"] * 100

for i in xrange(1000000):
    cPickle.dump(mylist, fileIter, 1)
>>>>
Runs at:

CPython 2.7.3: 13.114 sec
PyPy nightly: 29.239 sec

[Warning: it'll produce a file (pypytesting) that is 205 MB in size]

Kind regards,
lefteris.
2013/7/3 Eleytherios Stamatogiannakis <estama@gmail.com>
Hello,
We also found a case where PyPy is 2x slower than CPython. The following code:
This is because of I/O. If I replace the file with a custom class which has an empty write() method, PyPy is twice as fast as CPython. Note: with PyPy, io.open() is even slower :-(
<<<<
import cPickle
fileIter = open("pypytesting", "w+b")
mylist = ["qwerty"] * 100

for i in xrange(1000000):
    cPickle.dump(mylist, fileIter, 1)
Runs at:

CPython 2.7.3: 13.114 sec
PyPy nightly: 29.239 sec
[Warning: it'll produce a file (pypytesting) that is 205 MB in size]
Kind regards,
lefteris.
-- Amaury Forgeot d'Arc
On 03/07/13 19:16, Amaury Forgeot d'Arc wrote:
2013/7/3 Eleytherios Stamatogiannakis <estama@gmail.com <mailto:estama@gmail.com>>
Hello,
We also found a case where PyPy is 2x slower than CPython. The following code:
This is because of I/O. If I replace the file with a custom class which has an empty write() method, PyPy is twice as fast as CPython.
Ah, I guess the I/O isn't as heavily optimized on PyPy as the other things are?

Thanks for looking into this test case.

l.
Note: with pypy, io.open() is even slower :-(
<<<<
import cPickle
fileIter = open("pypytesting", "w+b")
mylist = ["qwerty"] * 100

for i in xrange(1000000):
    cPickle.dump(mylist, fileIter, 1)
>>>>
Runs at:

CPython 2.7.3: 13.114 sec
PyPy nightly: 29.239 sec
[Warning: it'll produce a file (pypytesting) that is 205 MB in size]
Kind regards,
lefteris.
-- Amaury Forgeot d'Arc
On 03/07/2013 18:17, "Amaury Forgeot d'Arc" <amauryfa@gmail.com> wrote:
This is because of I/O. If I replace the file with a custom class which has an empty write() method, PyPy is twice as fast as CPython.
A few days ago I discovered that there is an easy optimization for this. If you look at how str2charp & friends are implemented, you see that we do an RPython loop and copy char by char. By contrast, things like string concatenation are implemented using memcpy and are much faster (like 3-4 times, IIRC). Sorry I can't give a more precise pointer, but I'm on my mobile phone :-)
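The contrast Antonio describes can be illustrated at the plain-Python level (a sketch, not PyPy internals; the helper names are mine). The old str2charp behaves like the element-by-element loop; the memcpy-based path behaves like a single bulk slice assignment:

```python
# Illustrative only: copy a byte string into a preallocated raw buffer
# either one element at a time (like the RPython loop in str2charp) or
# with one bulk slice assignment (like a memcpy-based copy).
def copy_char_by_char(data, buf):
    for i in range(len(data)):
        buf[i] = data[i]          # one store per character

def copy_bulk(data, buf):
    buf[:len(data)] = data        # one memcpy-like bulk copy

data = b"hello world" * 100
a = bytearray(len(data))
b = bytearray(len(data))
copy_char_by_char(data, a)
copy_bulk(data, b)
assert bytes(a) == bytes(b) == data
```

The two produce identical buffers; the difference is only in how many individual stores the copy performs, which is exactly what a memcpy-style fast path avoids.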
Hello Eleytherios, On 07/04/2013 08:12 AM, Antonio Cuni wrote:
On 03/07/2013 18:17, "Amaury Forgeot d'Arc" <amauryfa@gmail.com <mailto:amauryfa@gmail.com>> wrote:
This is because of I/O. If I replace the file with a custom class which has an empty write() method, PyPy is twice as fast as CPython.
A few days ago I discovered that there is an easy optimization for this. If you look at how str2charp & friends are implemented, you see that we do an RPython loop and copy char by char. By contrast, things like string concatenation are implemented using memcpy and are much faster (like 3-4 times, IIRC). Sorry I can't give a more precise pointer, but I'm on my mobile phone :-)
could you please try to rerun your benchmark on the improve-str2charp branch? The benchmarks on speed.pypy.org show some important speedups, e.g. in twisted_tcp or raytrace_simple, which seem to contain a lot of write I/O, so it might help your case as well:

http://speed.pypy.org/comparison/?exe=1%2BL%2Bdefault%2C1%2BL%2Bimprove-str2charp&ben=1%2C34%2C27%2C2%2C25%2C3%2C46%2C4%2C5%2C41%2C42%2C22%2C44%2C6%2C39%2C7%2C8%2C45%2C23%2C24%2C9%2C10%2C47%2C48%2C49%2C50%2C51%2C11%2C12%2C13%2C40%2C14%2C15%2C35%2C36%2C37%2C38%2C16%2C52%2C54%2C55%2C53%2C56%2C28%2C30%2C32%2C29%2C33%2C17%2C18%2C19%2C20%2C43&env=1&hor=true&bas=1%2BL%2Bdefault&chart=normal+bars

ciao,
Anto
On 09/07/13 01:41, Antonio Cuni wrote:
Hello Eleytherios,
On 07/04/2013 08:12 AM, Antonio Cuni wrote:
On 03/07/2013 18:17, "Amaury Forgeot d'Arc" <amauryfa@gmail.com <mailto:amauryfa@gmail.com>> wrote:
This is because of I/O. If I replace the file with a custom class which has an empty write() method, PyPy is twice as fast as CPython.
A few days ago I discovered that there is an easy optimization for this. If you look at how str2charp & friends are implemented, you see that we do an RPython loop and copy char by char. By contrast, things like string concatenation are implemented using memcpy and are much faster (like 3-4 times, IIRC). Sorry I can't give a more precise pointer, but I'm on my mobile phone :-)
could you try to rerun your benchmark on the improve-str2charp branch please? The benchmarks on speed.pypy.org shows some important speedup in e.g. twisted_tcp or raytrace_simple, which seems to contain a lot of write I/O, so it might help your case as well:
The times that we got with improve-str2charp are a little worse than with the previous nightly build that I had tried in my previous email. I have rerun the benchmark and the times (best of 3 runs) are:

CPython 2.7.3: 14.173 sec
PyPy nightly 3/7/2013: 32.105 sec
PyPy improve-str2charp: 34.044 sec

Regards,
l.
We just found something strange going on with the test that we had posted. Due to Amaury's suspicion we started looking into PyPy's I/O speed. It turns out it is more or less the same speed as CPython's:

<<<<
f = open("pypytesting", "w+b")
mylist = str(["qwerty"] * 100)

for i in xrange(1000000):
    f.write(mylist)
>>>>
Running at:

CPython 2.7.3: 12.563 sec
PyPy nightly: 12.492 sec

The previous code that we had posted (you can see it in the previous email) does:

""
for i in xrange(1000000):
    cPickle.dump(mylist, f, 1)
""

and runs at:

CPython 2.7.3: 13.114 sec
PyPy nightly: 29.239 sec

If we change the previous code to write in another, equivalent way:

""
for i in xrange(1000000):
    f.write(cPickle.dumps(mylist, 1))
""

then the times are the same between CPython and PyPy:

CPython 2.7.3: 12.802 sec
PyPy nightly: 12.181 sec

Why is there such a huge speed difference between cPickle.dump(..., f) and f.write(cPickle.dumps(...))?

Kind regards,
l.

On 03/07/13 19:16, Amaury Forgeot d'Arc wrote:
2013/7/3 Eleytherios Stamatogiannakis <estama@gmail.com <mailto:estama@gmail.com>>
Hello,
We also found a case where PyPy is 2x slower than CPython. The following code:
This is because of I/O. If I replace the file with a custom class which has an empty write() method, PyPy is twice as fast as CPython.
Note: with pypy, io.open() is even slower :-(
<<<<
import cPickle
fileIter = open("pypytesting", "w+b")
mylist = ["qwerty"] * 100

for i in xrange(1000000):
    cPickle.dump(mylist, fileIter, 1)
>>>>
Runs at:

CPython 2.7.3: 13.114 sec
PyPy nightly: 29.239 sec
[Warning: it'll produce a file (pypytesting) that is 205 MB in size]
Kind regards,
lefteris.
-- Amaury Forgeot d'Arc
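The dump(...)-versus-write(dumps(...)) comparison above can be reproduced in isolation. This sketch is my own, not the original test: it uses the pickle module and in-memory buffers (the original used cPickle and a real file on Python 2), with a reduced iteration count so it runs quickly:

```python
# pickle.dump(obj, f) sends the pickle bytes through f.write() as it
# goes, while f.write(pickle.dumps(obj)) builds the whole byte string
# in memory first, so the file sees a single write() per object.
import io
import pickle
import time

mylist = ["qwerty"] * 100
N = 10000  # reduced from the original 1000000

def dump_direct(f):
    for _ in range(N):
        pickle.dump(mylist, f, 1)

def dump_via_dumps(f):
    for _ in range(N):
        f.write(pickle.dumps(mylist, 1))

buf1, buf2 = io.BytesIO(), io.BytesIO()
t0 = time.time(); dump_direct(buf1); t1 = time.time()
dump_via_dumps(buf2); t2 = time.time()
print("dump():          %.3fs" % (t1 - t0))
print("write(dumps()):  %.3fs" % (t2 - t1))

# Both variants produce byte-identical pickles; only the write pattern differs.
assert buf1.getvalue() == buf2.getvalue()
```

Since the output is identical, any timing gap between the two must come from how the file object handles many small writes versus fewer large ones.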
2013/7/18 Eleytherios Stamatogiannakis <estama@gmail.com>
Why is there such a huge speed difference between cPickle.dump( ... f) and f.write(cPickle.dumps(...)) ?
Did you count the number of calls to f.write? pickle calls write() once per pickled object.

Now, PyPy's implementation of buffered file uses an (RPython) list of strings, and does a final ''.join. This is probably much less efficient than the RStringIO implementation.

--
Amaury Forgeot d'Arc
Yes, you are right, cPickle.dump most probably does a lot more small writes.

On the other hand, shouldn't this also affect CPython? Or is CPython so much faster at "".join-ing strings? Is "".join-ing something that we should generally avoid doing in PyPy?

l.

On 18/07/13 13:53, Amaury Forgeot d'Arc wrote:
2013/7/18 Eleytherios Stamatogiannakis <estama@gmail.com <mailto:estama@gmail.com>>
Why is there such a huge speed difference between cPickle.dump( ... f) and f.write(cPickle.dumps(...)) ?
Did you count the number of calls to f.write? pickle calls write() once per pickled object.
Now, PyPy's implementation of buffered file uses an (RPython) list of strings, and does a final ''.join. This is probably much less efficient than the RStringIO implementation.
-- Amaury Forgeot d'Arc
2013/7/18 Eleytherios Stamatogiannakis <estama@gmail.com>
Yes you are right, cPickle.dump most probably does a lot more small writes.
On the other hand, shouldn't this also affect CPython? Or is CPython so much faster at "".join-ing strings?
CPython is not affected, because files are implemented with the C fopen, fwrite... and use a completely different buffer.
Is "".join-ing something that we should generally avoid doing in PyPy?
This is in RPython code, not in the PyPy interpreter. But yes, a buffering method that is optimized for append() + flush() is better than an all-purpose object.

--
Amaury Forgeot d'Arc
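A minimal sketch of the kind of buffer Amaury describes, optimized for append() plus a join-on-flush: chunks accumulate in a list (cheap appends, no per-write concatenation) and are joined once when the buffer drains. The class name and threshold are illustrative, not PyPy's actual RPython code:

```python
import io

class AppendBuffer(object):
    """Collect written chunks in a list; join them only when flushing."""

    def __init__(self, raw, threshold=8192):
        self.raw = raw              # underlying file-like object
        self.threshold = threshold  # flush once this many bytes queue up
        self.chunks = []
        self.buffered = 0

    def write(self, data):
        # list.append is O(1); no string copying happens per write()
        self.chunks.append(data)
        self.buffered += len(data)
        if self.buffered >= self.threshold:
            self.flush()

    def flush(self):
        if self.chunks:
            # one join and one raw write for the whole batch
            self.raw.write(b"".join(self.chunks))
            self.chunks = []
            self.buffered = 0

raw = io.BytesIO()
buf = AppendBuffer(raw, threshold=16)
for _ in range(5):
    buf.write(b"qwerty")   # 6 bytes each; flushes as the threshold is crossed
buf.flush()
assert raw.getvalue() == b"qwerty" * 5
```

Many small writes (the cPickle.dump pattern) then cost one list append each, with the join amortized across the whole batch instead of repeated per call.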
Hi Alexis,

On Wed, Jul 3, 2013 at 4:04 PM, Alexis BRENON <abrenon@wyplay.com> wrote:
Does anyone have any idea where I can look to debug this?
Check first that compiling programs works at all. Then, still with a 5-line example .c file, add options to gcc one at a time until you reach a command line very similar to the one in the generated Makefile. I bet it crashes at some point.

A bientôt,

Armin.
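Armin's procedure can even be scripted. In this sketch the compiler name and flag list are placeholders you would copy from the generated Makefile; the search logic takes a build callable so it can also be exercised without a cross toolchain installed:

```python
# Add gcc options one at a time to a minimal C program and report the
# first option that makes the build (or the resulting binary) fail.
import os
import subprocess
import tempfile

C_SOURCE = '#include <stdio.h>\nint main(void) { printf("Hello World\\n"); return 0; }\n'

def try_build(cc, flags):
    """Compile and run the minimal program with the given flags."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "t.c")
    exe = os.path.join(workdir, "t")
    with open(src, "w") as f:
        f.write(C_SOURCE)
    if subprocess.call([cc] + flags + ["-o", exe, src]) != 0:
        return False   # compile/link failure (e.g. the __data_start error)
    return subprocess.call([exe]) == 0   # False if the binary segfaults

def first_bad_flag(build, flags):
    """Grow the flag list one entry at a time; return the first flag
    whose addition makes `build` fail, or None if all combinations pass."""
    for i in range(1, len(flags) + 1):
        if not build(flags[:i]):
            return flags[i - 1]
    return None

# Placeholder flags, copied by hand from the generated Makefile:
FLAGS = ["-O2", "-pthread", "-lgc", "-Wl,--version-script=dynamic-symbols-1"]
# Real use (hypothetical cross compiler name):
# first_bad_flag(lambda fs: try_build("mipsel-linux-gnu-gcc", fs), FLAGS)
```

Separating the search logic from the compiler invocation means the bisection itself can be sanity-checked with a stub build function before pointing it at the real toolchain.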
Hi Alexis,
On Wed, Jul 3, 2013 at 4:04 PM, Alexis BRENON <abrenon@wyplay.com> wrote:
Does anyone have any idea where I can look to debug this?

Check first that compiling programs works at all. Then, still with a 5-line example .c file, add options to gcc one at a time until you reach a command line very similar to the one in the generated Makefile. I bet it crashes at some point.
A bientôt,
Armin.

Thanks Armin for your advice; I found where it crashes. When linking, there is this option:

--version-script=../dynamic-symbols-1

I read this file and remembered that I had already seen a similar one while searching on Google. At first the file looks like this:

{
global:
rpython_startup_code;
get_errno;
set_errno;
local: *;
};

Adding the failing symbol to the "global" section makes it link:

{
global:
rpython_startup_code;
get_errno;
set_errno;
__data_start;
local: *;
};

Nevertheless, the resulting executable, which should display "Hello World" (as usual), segfaults when I launch it... From GDB I get this when running it with a breakpoint on main() (so it fails before entering main):

Program received signal SIGSEGV, Segmentation fault.
0x77fc86a4 in dl_main (phdr=<value optimized out>, phnum=<value optimized out>, user_entry=<value optimized out>, auxv=0x7fff6e44) at rtld.c:1652
1652    rtld.c: No such file or directory.
        in rtld.c
(gdb) bt
#0  0x77fc86a4 in dl_main (phdr=<value optimized out>, phnum=<value optimized out>, user_entry=<value optimized out>, auxv=0x7fff6e44) at rtld.c:1652
#1  0x77fdd560 in _dl_sysdep_start (start_argptr=<value optimized out>, dl_main=0x77fc7a84 <dl_main>) at ../elf/dl-sysdep.c:244
#2  0x77fca5c4 in _dl_start_final (arg=0x7fff6e00, info=<value optimized out>) at rtld.c:336
#3  0x77fca860 in _dl_start (arg=0x7fff6e00) at rtld.c:564
#4  0x77fc6894 in __start () from /lib/ld.so.1
Backtrace stopped: frame did not save the PC

This bug is the same as when I translate my simple RPython file with the -O2 option: the translation succeeds, but I get exactly the same error from the resulting executable... I also hit this error when I translate targetpypystandalone.py (with the -O2 option), when it tries to execute platcheck_0 during translation...

Maybe there is a deeper reason behind all these failures, but I can't point out which or where... Any idea?

Thanks,
Alexis
Hi Alexis,

On Thu, Jul 4, 2013 at 9:48 AM, Alexis BRENON <abrenon@wyplay.com> wrote:
Maybe there is a deeper reason behind all these failures, but I can't point out which or where... Any idea?
The same as I already said in my previous reply. Try from the other side: write a 5-line example C program, try to compile it, and then progressively make it "more like" what the translation produces, notably in terms of options from the Makefile.

A bientôt,

Armin.
On 04/07/2013 18:14, Armin Rigo wrote:
Hi Alexis,
On Thu, Jul 4, 2013 at 9:48 AM, Alexis BRENON <abrenon@wyplay.com> wrote:

Maybe there is a deeper reason behind all these failures, but I can't point out which or where... Any idea?

The same as I already said in my previous reply. Try from the other side: write a 5-line example C program, try to compile it, and then progressively make it "more like" what the translation produces, notably in terms of options from the Makefile.
A bientôt,
Armin.

Hi,
Well, I did some tests on a very simple C file that just prints "Hello World !". I tried many option combinations and noticed that:

- with the -Wl,--version-script=dynamic-symbols-1 LDFLAGS alone (with no other option), compilation works but execution segfaults;
- this LDFLAGS seems to be OK combined with many other options (same result as above);
- as soon as I add the -lgc option to LIBS, compilation fails with the __data_start error.

You can find the tests and results in this pastebin: http://pastebin.com/F8NvC0Ex

Thanks for your help.

Alexis.
Hi Alexis,

On Fri, Jul 5, 2013 at 10:00 AM, Alexis BRENON <abrenon@wyplay.com> wrote:
Well, I did some tests on a very simple C file that just prints "Hello World !". I tried many option combinations and noticed that:

- with the -Wl,--version-script=dynamic-symbols-1 LDFLAGS alone (with no other option), compilation works but execution segfaults
I cannot help here. You need someone who knows about gcc and ld on MIPS. If you can't find any help, try asking on stackoverflow.com.
- as soon as I add -lgc option to LIBS, compilation fails with the __data_start error
Same, but this one is specifically about Boehm's libgc on MIPS.

A bientôt,

Armin.
participants (6)
- Alexis BRENON
- Amaury Forgeot d'Arc
- Antonio Cuni
- Armin Rigo
- Eleytherios Stamatogiannakis
- Maciej Fijalkowski