How to embed PyPy when there's no filesystem?

I'm evaluating PyPy for use in an application where it will be running in an RTOS (Greenhills Integrity) which is configured without a file system at runtime. The rest of the application is C/C++. Is there a way to build PyPy for this environment? The issue I see is that pypy_setup_home() requires a file system path to an executable / .so library. Is it possible to statically link PyPy into the application and then give an equivalent to pypy_setup_home() a pointer to the linked code? Some other approach? Thanks, Tom

PyPy generally needs to find a bunch of files for its standard library. I would suggest trying something à la the sandboxed version, where all the external calls go via a special proxy that you can write in C++. It's a bit of effort, though. What are you trying to achieve if you have no filesystem? (E.g. the whole module system potentially can't work.)

Maciej, Thanks for the idea. I played with the sandboxed version and it looks like it has potential. I searched the web for a C/C++ version of the controller but with no luck. I saw questions about it and interest expressed but couldn't find anyone who had actually built one. Do you (or does anyone) know of an example? Ideal would probably be one implementing SimpleIOSandboxedProc since that would allow streaming of Python source to stdin. I can start from the Python controller if necessary but I'm a C/C++ programmer by trade with very little Python experience. A C example would make it much faster to spin up. Thanks, -Tom
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev

Hi Tom, On 25 April 2015 at 01:32, Maciej Fijalkowski <fijall@gmail.com> wrote:
It's not necessarily the only option. A sandboxed process comes with a lot of other constraints apart from "no filesystem access". There are alternatives: you could play in ways similar to how you would solve this with CPython, namely trying to embed the parts of the standard library and main program that you need. Just like sandboxing, we don't have much experience and tools to do that ourselves, so you still need to come up with all the details (and we can help, of course). Maybe something like: we can tweak pypy_setup_home() to accept NULL as a path. Then it would not try to automatically set up "sys.path" or import "site". You're left with what is a broken PyPy, in the sense that you cannot import anything, but then you can do calls like pypy_execute_source() to run 5-line scripts --- or even, as a hack, to declare and install complete modules whose source code you have previously copied into static strings in your binary. A bientôt, Armin.
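The "declare and install complete modules" hack can be pictured with a short Python sketch of the kind of script one might hand to pypy_execute_source(). This is only an illustration under stated assumptions: the module name `config_helpers`, its contents, and the `install_module` helper are made up here, and nothing below has been tried against an embedded PyPy built without a file system.

```python
import sys
import types

# Hypothetical module sources. In the scenario above these strings would
# come from the binary or from an external source, never from disk.
EMBEDDED_SOURCES = {
    "config_helpers": "def scale(x, factor=2):\n    return x * factor\n",
}


def install_module(name, source):
    """Compile `source` and register the result as module `name`."""
    module = types.ModuleType(name)
    code = compile(source, "<embedded %s>" % name, "exec")
    exec(code, module.__dict__)
    # A later `import name` is satisfied from sys.modules, so no file
    # lookup is needed for this module.
    sys.modules[name] = module
    return module


for _name, _source in EMBEDDED_SOURCES.items():
    install_module(_name, _source)

import config_helpers

print(config_helpers.scale(21))  # prints 42
```

The sketch relies only on the fact that the import statement consults sys.modules before searching sys.path; modules that import each other would have to be installed in dependency order.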

Armin, A good thought. Sandboxing may actually be an advantage from a security standpoint. We'll be developing all of the scripts to be run, but there's always the chance of hacking. We can't hard-code the scripts into the binary because their purpose is to adapt behavior to new configurations. Because of this, the scripts will be read from an external source and then executed. This is what makes the stdin/stdout streaming version attractive. Thanks, Tom

Hi Tom, On 27 April 2015 at 18:10, <tom@twhanson.com> wrote:
I just said "statically into the binary" as an example. Of course you can get the string from anywhere, like from reading an external source. Once you have it in a "char *", you can pass it to pypy_execute_source(). A bientôt, Armin.

I'm confused about the relationship between SimpleIOSandboxedProc and VirtualizedSandboxedProc. Looking at pypy_interact.py I see that it derives from both SimpleIOSandboxedProc and VirtualizedSandboxedProc (multiple inheritance). I expected that I'd be able to drop VirtualizedSandboxedProc and tweak the code in pypy_interact to get a controller that just did stdin/stdout. But when I try that I get "out of memory" errors. It appears that SimpleIOSandboxedProc is not an independent, stand-alone class but is actually non-functional without the child class VirtualizedSandboxedProc. Is that the intent? Am I missing something? -Tom

Correction: "non-functional without the *peer* class VirtualizedSandboxedProc"

Hi Tom, On 28 April 2015 at 19:56, <tom@twhanson.com> wrote:
Correction: " non-functional without the *peer* class VirtualizedSandboxedProc"
Modern PyPy versions try to read some environment variables, at least as documented in rpython/doc/logging.rst. This makes the do_ll_os__ll_os_getenv() method necessary (undefined methods cause the subprocess to be aborted). Moreover, I'm sure that a PyPy in the default configuration will afterward try to access the file system for all its stdlib, which means it will call at least some of the other methods too, starting from do_ll_os__ll_os_stat(). All these methods happen to be in the VirtualizedSandboxedProc class. A bientôt, Armin.
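A toy model may help show why dropping VirtualizedSandboxedProc breaks the controller. The sketch below is not the real sandlib code: `MiniController`, `StdioOnly` and `WithEnvAndStat` are hypothetical classes, and the write handler name is assumed to follow the same pattern as the getenv and stat handlers named above. It only illustrates name-based dispatch, where a request with no matching do_... method cannot be answered.

```python
class MiniController(object):
    """Hypothetical controller: routes a request to a do_<name> method."""

    def handle(self, fnname, *args):
        handler = getattr(self, "do_" + fnname, None)
        if handler is None:
            # Stand-in for the behaviour described above: an undefined
            # method means the sandboxed subprocess gets aborted.
            raise RuntimeError("unhandled external call: %s" % fnname)
        return handler(*args)


class StdioOnly(MiniController):
    """Only knows how to answer writes (assumed handler name)."""

    def do_ll_os__ll_os_write(self, fd, data):
        return len(data)  # pretend every byte was written


class WithEnvAndStat(StdioOnly):
    """Adds the two calls mentioned in the message above."""

    def do_ll_os__ll_os_getenv(self, name):
        return None  # report every environment variable as unset

    def do_ll_os__ll_os_stat(self, path):
        raise OSError(2, "No such file or directory", path)


if __name__ == "__main__":
    try:
        # "PYPYLOG" is used purely as an example variable name.
        StdioOnly().handle("ll_os__ll_os_getenv", "PYPYLOG")
    except RuntimeError as exc:
        print(exc)
    print(WithEnvAndStat().handle("ll_os__ll_os_getenv", "PYPYLOG"))
```

In this model a stdin/stdout-only controller fails on the very first getenv request, which matches the description that the extra methods live in the peer class rather than in the simple I/O one.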

When I kick off the interactive sandboxed version of PyPy I'm seeing it open a significant number of .py files. 1) Am I correct in assuming that these are imports? 2) Can these be eliminated? These opens are problematic in the absence of a file system. Thanks, Tom

Hi Tom, On 30 April 2015 at 00:21, <tom@twhanson.com> wrote:
1) Am I correct in assuming that these are imports?
Yes.
2) Can these be eliminated? These opens are problematic in the absence of a file system.
Try running pypy with the -s option. Likely, it doesn't remove them all; you have to provide the remaining modules manually, e.g. by embedding either the .py or the .pyc in the controller process. A bientôt, Armin.
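One way to picture "embedding the .py in the controller process" is a small in-memory file table that answers stat, open and read requests without touching a disk. This is a hedged sketch only: the `InMemoryFiles` class, its method names, and the path `/lib/hello.py` are assumptions made for illustration and do not correspond to the actual sandlib interface.

```python
import errno


class InMemoryFiles(object):
    """Hypothetical table of file contents held in the controller."""

    def __init__(self, files):
        self.files = dict(files)  # path -> bytes
        self.open_fds = {}        # fd -> [path, read offset]
        self.next_fd = 3          # 0, 1 and 2 are left for stdio

    def stat_size(self, path):
        """Return the file size, or fail the way a missing file would."""
        if path not in self.files:
            raise OSError(errno.ENOENT, "No such file or directory")
        return len(self.files[path])

    def open(self, path):
        self.stat_size(path)  # raises OSError if the path is unknown
        fd = self.next_fd
        self.next_fd += 1
        self.open_fds[fd] = [path, 0]
        return fd

    def read(self, fd, size):
        path, offset = self.open_fds[fd]
        chunk = self.files[path][offset:offset + size]
        self.open_fds[fd][1] = offset + len(chunk)
        return chunk

    def close(self, fd):
        del self.open_fds[fd]


# Example contents; a real table would hold the stdlib modules still needed.
vfs = InMemoryFiles({"/lib/hello.py": b"print('hi')\n"})
fd = vfs.open("/lib/hello.py")
first = vfs.read(fd, 5)    # b"print"
rest = vfs.read(fd, 100)   # b"('hi')\n"
vfs.close(fd)
```

A controller built along these lines would forward the sandboxed interpreter's file requests to such a table, so the set of modules that must be embedded is exactly the set of paths the interpreter still tries to open.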

participants (3): Armin Rigo, Maciej Fijalkowski, tom@twhanson.com