Re: [pypy-dev] PyPy JVM Backend
Hi Niko, hi Paul! Here are some remarks/suggestions about your discussion. I'm also CC-ing pypy-dev, so we can also catch someone else's suggestions, if any :-).
First, these are the things that are absolutely necessary to get PyPy to translate:
1. weak refs
2. I don't know what else goes here --- try and see I guess
Apart from weak refs, I think the other big missing feature is support for external functions (mostly I/O functions). Unfortunately this part is a bit of a mess: since in the old days pypy's only targets were low-level platforms, the I/O model of RPython is modeled after Unix file descriptors. This means that at the moment high-level backends need to emulate the fd interface and forward the real work to the native I/O functions: to see how gencli does it, look at the ll_os class in translator/cli/src/ll_os.cs. I guess this would be the easiest way for genjvm as well, and for now we should probably pick this solution.
I know, it's both ugly and hackish; the good news is that Carl and I did some work towards a better solution: at the moment all the I/O inside the standard interpreter is done using the (interp-level) rlib/streamio.py library, which in turn uses the low-level file descriptor interface. The long-term solution would be to provide an alternative implementation of streamio.py that uses .NET/Java streams, and to let the backends use that instead of the current one.
Finally, let me add another task that I think is very important:
3. make most of the existing tests pass with genjvm
Look at the cli/test directory: you will find a lot of files that import tests from somewhere else (mostly from rpython/test). genjvm does the same, but a lot of tests are still missing; once you get all those tests passing you can be quite confident (but still not sure, of course) that pypy will translate. Some of the tests in cli/test are cli-specific or very old and do not conform to the new test framework; here is the list of tests you should port to jvm:
- test_range.py
- test_constant.py
- test_tuple.py
- test_float.py
- test_builtin.py
- test_int.py
- test_exception.py
Moreover, there is test_runtest, which tests a bit of the gencli test framework itself; if the genjvm test framework is similar to gencli's, it could make sense to port this one as well.
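The "import tests from somewhere else" pattern boils down to shared test classes that only go through an abstract helper, plus a tiny backend-specific subclass per test file. A minimal sketch follows; BaseTestInt, FakeJvmTest and interpret() are hypothetical names rather than PyPy's real test classes, and the fake backend simply calls the function instead of translating it for the JVM.

```python
# Hypothetical sketch of the "import shared tests, plug in a backend" pattern.
# Class and method names are illustrative, not PyPy's real test classes.


class BaseTestInt:
    """Backend-independent tests: they only go through self.interpret()."""

    def interpret(self, fn, args):
        raise NotImplementedError("each backend provides its own interpret()")

    def test_add(self):
        def fn(x, y):
            return x + y
        assert self.interpret(fn, [40, 2]) == 42

    def test_compare(self):
        def fn(x, y):
            return x < y
        assert self.interpret(fn, [1, 2])
        assert not self.interpret(fn, [2, 1])


class FakeJvmTest:
    """Stand-in for a backend mixin. A real one would translate fn,
    compile it for the JVM and run it; here fn is simply called."""

    def interpret(self, fn, args):
        return fn(*args)


# The backend's test module then needs only a one-line subclass per file.
class TestJvmInt(FakeJvmTest, BaseTestInt):
    pass
```

Because the backend mixin comes first in the base list, its interpret() overrides the abstract one, so every shared test runs unchanged against the backend.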
Niko, what do you think?
New features:
1. allow Java code to be invoked (need to look at the CLI work here)
2. JSR 223
I have to admit that I don't know very much about JSR-223: Paul, could you send me the specification, please? I don't feel like registering on Sun's website :-). If I understand correctly, the level of integration provided by JSR-223 is simpler/less powerful than the one offered by Jython, right? If this is the case, it could make sense to implement JSR-223 first (also because it's part of Paul's SoC proposal), then try to provide the same level of integration as Jython (or IronPython on the .NET side); for the latter task, I think that some code could be shared between gencli and genjvm.
Internal Changes That "Would Be Nice":
1. Estimate the maximum size of the stack, rather than giving some arbitrary "high" number as we do now.
this would also be useful for gencli :-)
2. Deal with JVM size limits for very large methods and classes. (never hit those yet but probably will)
probably this falls into the "try to translate pypy-jvm and see" task :-).
3. Find a better way to handle exceptions. One suggestion was not to map rpython's "Object" to the class Object but rather to an interface.
Indeed. Also useful for gencli.
4. Use protected access modifiers rather than public. Pedronis seems to think this would be important.
I don't see how it could be so important at this point. Am I missing something?
5. Speed and performance optimizations, similar to those being done for C#
Let me add another point:
6. once we get pypy-jvm, run Python's regression tests and try to pass as many of them as possible.
I personally think that point 6 is more important than point 5 at the beginning.
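For point 1 in the list above (estimating the maximum stack size instead of emitting an arbitrary high number), one common approach is to walk the control flow of the generated code and track the stack depth on entry to each instruction. The sketch below does this for a tiny made-up instruction set whose net stack effects are listed in a table; it is not tied to genjvm's or gencli's actual code generators, and it ignores the fact that JVM long/double values take two stack slots.

```python
# Minimal sketch: estimate the maximum operand-stack depth of a method.
# The opcodes and their net stack effects are made up for illustration;
# a real backend would use per-instruction effects (and count JVM
# long/double values as two slots, which this sketch ignores).
STACK_EFFECT = {
    "push": +1,    # load a constant or a local variable
    "dup": +1,     # duplicate the top of the stack
    "add": -1,     # pop two operands, push the result
    "pop": -1,
    "ifeq": -1,    # pop the condition; may jump to the target
    "goto": 0,     # unconditional jump
    "return": -1,  # pop the return value and leave the method
}


def max_stack_depth(code):
    """code is a list of (opcode, jump_target_or_None) pairs."""
    depth_at = {}              # instruction index -> depth on entry
    worklist = [(0, 0)]        # (instruction index, depth on entry)
    max_depth = 0
    while worklist:
        pc, depth = worklist.pop()
        if pc >= len(code):
            raise ValueError("control flow falls off the end of the code")
        if pc in depth_at:
            # the JVM verifier requires the same depth on every path
            if depth_at[pc] != depth:
                raise ValueError("inconsistent stack depth at %d" % pc)
            continue
        depth_at[pc] = depth
        op, target = code[pc]
        new_depth = depth + STACK_EFFECT[op]
        if new_depth < 0:
            raise ValueError("stack underflow at %d" % pc)
        # none of these ops needs extra room mid-instruction, so the peak
        # is the larger of the depths before and after the instruction
        max_depth = max(max_depth, depth, new_depth)
        if op == "return":
            continue
        if op == "goto":
            worklist.append((target, new_depth))
            continue
        if target is not None:            # conditional branch
            worklist.append((target, new_depth))
        worklist.append((pc + 1, new_depth))
    return max_depth
```

For the conditional sequence `push; ifeq 6; push; push; add; goto 7; push; return`, the function reports 2, which is what a backend would then emit as the stack limit instead of a guessed constant.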
Of these ideas, I am not sure which are best for a "first project". I would be happy to handle weakrefs, since it seems like they may be quite easy for me but time consuming for you, but if you wanted to look into them, that's fine with me too.
Adding support for weakrefs to gencli was a very easy task: just wrap/unwrap the objects into the System.WeakReference when requested.
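The wrap/unwrap scheme can be sketched in Python, with weakref.ref standing in for System.WeakReference (whose Target becomes null once the object is collected) or java.lang.ref.WeakReference (whose get() returns null). PlatformWeakRef, ll_weakref_create and ll_weakref_deref are hypothetical placeholders for whatever operations the backend actually maps.

```python
import weakref


class PlatformWeakRef:
    """Stand-in for System.WeakReference / java.lang.ref.WeakReference."""

    def __init__(self, target):
        self._ref = weakref.ref(target)

    def get(self):
        # Returns None once the target has been collected,
        # like WeakReference.get() returning null on the JVM.
        return self._ref()


def ll_weakref_create(obj):
    # "wrap": the backend would emit a constructor call around the object
    return PlatformWeakRef(obj)


def ll_weakref_deref(wref):
    # "unwrap": the backend would emit a call to get() (or .Target on .NET)
    return wref.get()
```

The point of the design is that the backend does not need its own weak-reference machinery: both operations are a single call into the platform's existing class.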
Antonio, you said you tried to translate PyPy as a whole with the JVM backend? It seems that is the best way to find the list of things that need doing, but I haven't done it myself yet.
Yes, I tried: after a bit of hacking on the interpreter code I could get rid of weakrefs, but the translation failed on a KeyError inside jvm/database.py, and then I gave up. To sum up, I think that, priority-wise, the most important thing to do is to port the remaining tests to genjvm and make them pass: weakrefs and external functions are subtasks of this; then try-fail-fix-try until we get a running pypy-jvm.
ciao Anto
Also a note from my side: it would be cool to have a config option to turn off weakrefs (and some functionality provided by them, like subclasses). Like Antonio, I trimmed weakrefs from the interpreter; it shouldn't be hard, and maybe it will allow translating the whole of pypy with the js backend (well, not really sure, but let's see).
Cheers, fijal
Antonio Cuni wrote:
Hi Niko, hi Paul! Here are some remarks/suggestions about your discussion. I'm also CC-ing pypy-dev, so we can also catch someone else's suggestions, if any :-).
I have one :-).
First, these are the things that are absolutely necessary to get PyPy to translate:
1. weak refs
2. I don't know what else goes here --- try and see I guess
Apart from weak refs, I think the other big missing feature is support for external functions (mostly I/O functions). Unfortunately this part is a bit of a mess: since in the old days pypy's only targets were low-level platforms, the I/O model of RPython is modeled after Unix file descriptors. This means that at the moment high-level backends need to emulate the fd interface and forward the real work to the native I/O functions: to see how gencli does it, look at the ll_os class in translator/cli/src/ll_os.cs. I guess this would be the easiest way for genjvm as well, and for now we should probably pick this solution.
I know, it's both ugly and hackish; the good news is that Carl and I did some work towards a better solution: at the moment all the I/O inside the standard interpreter is done using the (interp-level) rlib/streamio.py library, which in turn uses the low-level file descriptor interface. The long-term solution would be to provide an alternative implementation of streamio.py that uses .NET/Java streams, and to let the backends use that instead of the current one.
I think it is time now to do away with the file descriptor simulation; it was useful at one point but is very silly now. Instead, a subclass of pypy.rlib.streamio.Stream should be created that only delegates to the Java/.NET Stream classes, probably making use of the buffering facilities that the platforms offer. I think it is perfectly reasonable to not have os.open and friends on pypy.net as long as file works. If another place in pypy still uses os.open I am strongly for fixing that.
Cheers, Carl Friedrich
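A Stream subclass that "only delegates" could look roughly like the sketch below. The Stream base class here is a simplified stand-in written for this example, not the real pypy.rlib.streamio.Stream interface; io.BytesIO plays the role of the Java/.NET stream object, and PlatformStream is a hypothetical name.

```python
import io


class Stream:
    """Simplified stand-in for pypy.rlib.streamio.Stream (not its real API)."""

    def read(self, n):
        raise NotImplementedError

    def write(self, data):
        raise NotImplementedError

    def seek(self, offset, whence=0):
        raise NotImplementedError

    def tell(self):
        raise NotImplementedError

    def close(self):
        pass


class PlatformStream(Stream):
    """Hypothetical lowest-level stream: every call is forwarded to a
    platform stream object (io.BytesIO here, a java.io or System.IO
    stream in a real backend)."""

    def __init__(self, native):
        self.native = native

    def read(self, n):
        return self.native.read(n)

    def write(self, data):
        self.native.write(data)

    def seek(self, offset, whence=0):
        self.native.seek(offset, whence)

    def tell(self):
        return self.native.tell()

    def close(self):
        self.native.close()


# Usage with the stand-in platform stream:
s = PlatformStream(io.BytesIO())
s.write(b"hello world")
s.seek(0)
first_word = s.read(5)
```

Everything above this layer (buffering, line endings) would then be the existing streamio code, so the per-backend work stays a thin forwarding class.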
Hi Carl,
On Sat, Apr 28, 2007 at 07:56:11PM +0200, Carl Friedrich Bolz wrote:
If another place in pypy still uses os.open I am strongly for fixing that.
There is the app-level os.open(), which of course uses the interp-level os.open(). It means that if a backend only supports streamio, we can't easily provide the low-level os functions at app-level. This is probably fine, though: I think that Jython doesn't have them at all, for example. (Also, I guess that someone in a reverse-hacking mood could rewrite the interp-level code implementing os.open() to use streamio if necessary...) A bientot, Armin.
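The "reverse-hacking" idea of implementing os-style functions on top of streams boils down to a table from small integers to stream objects. A rough sketch follows; FdTable is a hypothetical name, and this is neither gencli's ll_os class nor PyPy's os module. os.dup is deliberately left out: letting two descriptors share one stream would need reference counting on close.

```python
import errno


class FdTable:
    """Hypothetical emulation of os.open/os.read/os.write/os.close on top of
    stream objects. `open_stream` is any callable (path, mode) -> stream,
    where a stream has read(n), write(data) and close()."""

    def __init__(self, open_stream):
        self.open_stream = open_stream
        self.streams = {}                 # fd -> stream object

    def _alloc(self, stream):
        fd = 3                            # 0-2 are left for stdin/stdout/stderr
        while fd in self.streams:         # POSIX hands out the lowest free fd
            fd += 1
        self.streams[fd] = stream
        return fd

    def _get(self, fd):
        try:
            return self.streams[fd]
        except KeyError:
            raise OSError(errno.EBADF, "Bad file descriptor") from None

    def open(self, path, mode):
        return self._alloc(self.open_stream(path, mode))

    def read(self, fd, n):
        return self._get(fd).read(n)

    def write(self, fd, data):
        return self._get(fd).write(data)

    def close(self, fd):
        self._get(fd).close()
        del self.streams[fd]
```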
Hi Armin! Armin Rigo wrote:
On Sat, Apr 28, 2007 at 07:56:11PM +0200, Carl Friedrich Bolz wrote:
If another place in pypy still uses os.open I am strongly for fixing that.
There is the app-level os.open(), which of course uses the interp-level os.open(). It means that if a backend only supports streamio, we can't easily provide the low-level os functions at app-level. This is probably fine, though: I think that Jython doesn't have them at all, for example. (Also, I guess that someone in a reverse-hacking mood could rewrite the interp-level code implementing os.open() to use streamio if necessary...)
That's more or less exactly what I meant by the following: "I think it is perfectly reasonable to not have os.open and friends on pypy.net as long as file works. If another place in pypy still uses os.open I am strongly for fixing that." But it was probably confusing, because it mixed the applevel os.open (first occurrence) with the interplevel os.open (second occurrence). Thanks for clarifying.
Cheers, Carl Friedrich
Armin Rigo wrote:
Hi Carl,
On Sat, Apr 28, 2007 at 07:56:11PM +0200, Carl Friedrich Bolz wrote:
If another place in pypy still uses os.open I am strongly for fixing that.
There is the app-level os.open(), which of course uses the interp-level os.open(). It means that if a backend only supports streamio, we can't easily provide the low-level os functions at app-level. This is probably fine, though: I think that Jython doesn't have them at all, for example. (Also, I guess that someone in a reverse-hacking mood could rewrite the interp-level code implementing os.open() to use streamio if necessary...)
That includes os.dup and a few others (I can imagine implementing os.dup using streamio, but that would take an insane reverse-hacking mood).
Probably one good step would be to make our tools (mostly py.test) work without applevel os.dup and friends (py.test uses them in a few places, including for capturing, but that's quite shallow and capturing can even be tuned with options).
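As an illustration of capturing without app-level os.dup, output can be captured at the sys level by temporarily replacing sys.stdout and sys.stderr. The trade-off is that this misses anything written straight to file descriptors 1 and 2 (C code, child processes). SysCapture is a hypothetical name, and this is not how py.test itself implements capturing.

```python
import io
import sys


class SysCapture:
    """Capture output by swapping sys.stdout/sys.stderr (no os.dup needed).
    Unlike fd-level capturing it does not see output written directly to
    file descriptors 1 and 2, e.g. by C code or child processes."""

    def __enter__(self):
        self._saved = sys.stdout, sys.stderr
        self.out, self.err = io.StringIO(), io.StringIO()
        sys.stdout, sys.stderr = self.out, self.err
        return self

    def __exit__(self, *exc_info):
        sys.stdout, sys.stderr = self._saved
        return False   # do not swallow exceptions raised in the block
```

Restoring the saved objects in `__exit__` guarantees the real streams come back even if the captured code raises.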
Maciek Fijalkowski wrote:
Probably one good step would be to make our tools (mostly py.test) work without applevel os.dup and friends (py.test uses them in a few places, including for capturing, but that's quite shallow and capturing can even be tuned with options).
+1 (and maybe add a new --noposix option that turns off all those features when running on a platform != posix)
Carl Friedrich Bolz wrote:
I think it is time now to do away with the file descriptor simulation; it was useful at one point but is very silly now. Instead, a subclass of pypy.rlib.streamio.Stream should be created that only delegates to the Java/.NET Stream classes, probably making use of the buffering facilities that the platforms offer. I think it is perfectly reasonable to not have os.open and friends on pypy.net as long as file works. If another place in pypy still uses os.open I am strongly for fixing that.
I agree, and this is why I mentioned the problem :-). I think there are two ways to make it work:
1) write a dummy CliFile (or JvmFile) subclass of Stream, then special-case that class in the backend to map directly to System.IO.FileStream (or the Java equivalent)
2) make CliFile or JvmFile real classes, using the interpreter-level bindings to forward the methods; then, we should modify open_file_as_stream and construct_stream_tower to instantiate these classes instead of the standard ones.
In both cases I also think it's not trivial to get all the combinations of modes/options working, because .NET uses a slightly different set of options than posix to determine how to open a file (I don't know about Java).
I think that solution (2) is easier to implement and more readable, but so far it's possible only for gencli, because genjvm doesn't provide interp-level bindings to Java libraries. By contrast, solution (1) is not trivial to implement if the interfaces of our Stream class and Java's are very different. Maybe a better solution would be to map the dummy streamio.JvmFile to a class written in Java that does the necessary conversions and forwards to the native stream class.
About the app-level os.* functions: I also think that for now we could simply omit them, but in the long term we should write an alternative implementation based on streamio (IronPython does it in a similar way).
ciao Anto
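To illustrate why the mode/options combinations need care, here is a hedged sketch translating Python open() mode strings into .NET's System.IO.FileMode and FileAccess members (given as plain strings, since this is Python rather than .NET code). The function name is hypothetical, and Java would need a different mapping onto its own file classes.

```python
def to_dotnet_mode(mode):
    """Map a Python open() mode string to (FileMode, FileAccess, seek_to_end).

    The names are members of .NET's System.IO.FileMode and FileAccess
    enums. 'b' is dropped because binary vs text is handled by a separate
    line-ending layer, not when the file is opened.
    """
    base = mode.replace("b", "")
    plus = "+" in base
    base = base.replace("+", "")
    if base == "r":
        # FileMode.Open fails if the file does not exist, like "r"
        return ("Open", "ReadWrite" if plus else "Read", False)
    if base == "w":
        # FileMode.Create truncates an existing file, like "w"
        return ("Create", "ReadWrite" if plus else "Write", False)
    if base == "a":
        if plus:
            # FileMode.Append only allows write access, so "a+" is
            # approximated by OpenOrCreate plus a seek to the end; Python's
            # "a+" also forces every later write to the end, which a real
            # implementation would still have to emulate.
            return ("OpenOrCreate", "ReadWrite", True)
        return ("Append", "Write", False)
    raise ValueError("invalid mode: %r" % mode)
```

The "a+" case shows the kind of mismatch mentioned above: the platform has no single option with exactly the posix meaning, so the wrapper has to add behaviour of its own.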
Hi Anto! Antonio Cuni wrote:
I agree, and this is why I mentioned the problem :-).
I think there are two ways to make it work:
1) write a dummy CliFile (or JvmFile) subclass of Stream, then special-case that class in the backend to map directly to System.IO.FileStream (or the Java equivalent)
2) make CliFile or JvmFile real classes, using the interpreter-level bindings to forward the methods;
This is indeed the solution that I had in mind. Of course I forgot that there are no interpreter-level bindings for Java yet, how hard would it be to get them? We will need them anyway later. And writing them might still be less work than writing the file descriptor implementation.
then, we should modify open_file_as_stream and construct_stream_tower to instantiate these classes instead of the standard ones.
The result will be very cool because you can then use rlib.streamio and get good cross-platform behavior.
In both cases I also think it's not trivial to get all the combinations of modes/options working, because .NET uses a slightly different set of options than posix to determine how to open a file (I don't know about Java).
I don't think it is so bad. If you don't want to map the Python semantics to the .NET ones, you can use the .NET stream only as the lowest one in the stream tower and leave buffering and line endings to the existing code. Or you can find an intermediate solution.
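Keeping the platform stream as the lowest layer and putting buffering on top might look like the sketch below. BufferedReader is a hypothetical layer written for this example (not streamio's own buffering class); it only assumes that the layer below has read(n) returning an empty byte string at EOF, and io.BytesIO stands in for the .NET stream.

```python
class BufferedReader:
    """Hypothetical buffering layer stacked on a raw stream (anything with
    read(n) returning b"" at EOF); it adds readline() without the raw
    layer needing to know about lines."""

    def __init__(self, raw, bufsize=8192):
        self.raw = raw
        self.bufsize = bufsize
        self.buf = b""

    def _fill(self):
        # pull one more chunk from the layer below; False means EOF
        chunk = self.raw.read(self.bufsize)
        self.buf += chunk        # simple concatenation, fine for a sketch
        return bool(chunk)

    def read(self, n):
        while len(self.buf) < n and self._fill():
            pass
        data, self.buf = self.buf[:n], self.buf[n:]
        return data

    def readline(self):
        while b"\n" not in self.buf and self._fill():
            pass
        i = self.buf.find(b"\n")
        end = len(self.buf) if i < 0 else i + 1
        line, self.buf = self.buf[:end], self.buf[end:]
        return line
```

Since the raw layer only has to provide read(n), the same buffering code can sit on top of a .NET stream, a Java stream, or a posix file descriptor.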
About the app-level os.* functions; I also think that for now we could simply omit them, but in the long term we should write an alternative implementation based on streamio (IronPython does it in a similar way).
Yes, but this can probably even be done on applevel. Cheers, Carl Friedrich
Carl Friedrich Bolz wrote:
Hi Anto!
Antonio Cuni wrote:
I agree, and this is why I mentioned the problem :-).
I think there are two ways to make it work:
1) write a dummy CliFile (or JvmFile) subclass of Stream, then special-case that class in the backend to map directly to System.IO.FileStream (or the Java equivalent)
2) make CliFile or JvmFile real classes, using the interpreter-level bindings to forward the methods;
This is indeed the solution that I had in mind. Of course I forgot that there are no interpreter-level bindings for Java yet, how hard would it be to get them? We will need them anyway later. And writing them might still be less work than writing the file descriptor implementation.
Part of it is a SoC anyway, so I don't think we care in what order the SoC work is done (and personally I think it makes sense to provide Java bindings first and then to care about the translation).
Maciek Fijalkowski wrote:
Part of it is a SoC anyway, so I don't think we care in what order the SoC work is done (and personally I think it makes sense to provide Java bindings first and then to care about the translation).
Well, strictly speaking, Java bindings for RPython are not part of Paul's SoC proposal, but they are probably the most effective way to implement other features that he promised. Btw, I'm not sure it's a good idea to develop them as the first task in PyPy, because it's not straightforward if you have no experience with the rtyper. My suggestion is to start by porting tests to genjvm and fixing the discovered bugs, because it should be a more newcomer-friendly task; then Java bindings for RPython and the I/O layer; and finally, translation of pypy-jvm. Paul, what do you think of this plan? ciao Anto
Anto, I think this is a good strategy for me. It'll allow me to put into practice what I learned from your thesis and gain knowledge that will help me in the long run with PyPy. The tasks for me will be:
1.) Port tests, fix uncovered bugs
2.) Java bindings
3.) I/O layer as discussed on this list
4.) pypy-jvm translation
5.) JSR 223 (The Scripting API)
For more information on JSR 223, here's a little tutorial/article: http://www.onjava.com/pub/a/onjava/2006/04/26/mustang-meets-rhino-java-se-6-...
And here are the languages with bindings for it now: https://scripting.dev.java.net/
Regards, Paul
On 4/30/07, Antonio Cuni <anto.cuni@gmail.com> wrote:
Maciek Fijalkowski wrote:
Part of it is a SoC anyway, so I don't think we care in what order the SoC work is done (and personally I think it makes sense to provide Java bindings first and then to care about the translation).
Well, strictly speaking java bindings for rpython are not part of Paul's SoC proposal but indeed they are probably the most effective way to implement other features that he promised.
Btw, I'm not sure it's a good idea to develop them as the first task in PyPy, because it's not straightforward if you have no experience with the rtyper.
My suggestion is to start by porting tests to genjvm and fixing the discovered bugs, because it should be a more newcomer-friendly task; then Java bindings for RPython and the I/O layer; and finally, translation of pypy-jvm.
Paul, what do you think of this plan?
ciao Anto
Carl Friedrich Bolz wrote:
2) make CliFile or JvmFile real classes, using the interpreter-level bindings to forward the methods;
This is indeed the solution that I had in mind. Of course I forgot that there are no interpreter-level bindings for Java yet, how hard would it be to get them? We will need them anyway later. And writing them might still be less work than writing the file descriptor implementation.
Implementing the file descriptor emulation by itself is quite easy, but I agree that in this case the correct solution is to develop Java bindings for RPython, which will probably also be useful for later tasks such as JSR-223. Also, the Java bindings could share (as usual :-)) most of the rtyper code used by the .NET bindings.
then, we should modify open_file_as_stream and construct_stream_tower to instantiate these classes instead of the standard ones.
The result will be very cool because you can then use rlib.streamio and get good cross-platform behavior.
This would be a big bonus. Going further, how hard would it be to rtype the builtin "file" to use streamio?
I don't think it is so bad. If you don't want to map the Python semantics to the .NET ones, you can use the .NET stream only as the lowest one in the stream tower and leave buffering and line endings to the existing code. Or you can find an intermediate solution.
Yes, it makes sense. .NET streams are always buffered when reading; I don't know if/how another layer of buffering could affect performance. Moreover, I think there is no easy way to open a stream in "text" mode: on Windows, if you open a text file with the low-level stream interface you get "\r\n" at the end of lines, so it might be necessary to insert another layer that does the conversion (the same as I'm doing now in the C# implementation -- see the CRLFTextFile class in cli/src/ll_os.cs).
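Such a conversion layer has to cope with a CRLF pair that is split across two reads. A minimal sketch, assuming the raw stream is consumed as a sequence of byte chunks; crlf_to_lf is a hypothetical helper, not the CRLFTextFile class from ll_os.cs.

```python
def crlf_to_lf(chunks):
    """Translate CRLF pairs to LF in a sequence of byte chunks, even when a
    pair is split across two chunks. Lone CR bytes are kept unchanged."""
    pending_cr = False
    for chunk in chunks:
        if not chunk:
            continue
        if pending_cr:
            chunk = b"\r" + chunk
            pending_cr = False
        if chunk.endswith(b"\r"):
            # could be the first half of a CRLF pair: hold it back
            chunk = chunk[:-1]
            pending_cr = True
        if chunk:
            yield chunk.replace(b"\r\n", b"\n")
    if pending_cr:
        yield b"\r"   # a CR at the very end of the input is kept
```

To drive it from a raw stream, `iter(lambda: raw.read(4096), b"")` produces the chunk sequence, stopping at the first empty read.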
About the app-level os.* functions; I also think that for now we could simply omit them, but in the long term we should write an alternative implementation based on streamio (IronPython does it in a similar way).
Yes, but this can probably even be done on applevel.
Sure, and it would probably be reusable for both cli and jvm! ciao Anto
participants (5)
- Antonio Cuni
- Armin Rigo
- Carl Friedrich Bolz
- Maciek Fijalkowski
- Paul deGrandis