I have a question for those of you who have embedded Python into a large application concerning how you handle the module search path and related functions.

Currently the module search path (and the related directory names set as a side effect) is determined by looking at the environment that the executable calling Py_Initialize() is running in. Hence if I've embedded Python 2.3 and also have Python 2.3 installed in (say /usr/local) it is going to use the Python paths in /usr/local/ over those in my customized embedded version.

As far as I can tell, the only way I can control this behavior is to rewrite Py_GetPath and friends in my custom build.

In my case the user of my application has a configuration file which specifies the pathnames for platform (in-)dependent files, both Python and other. But I cannot pass this information on to Py_Initialize() and on into Py_GetPath.

Is it worth providing an alternative initialization API that allows these values to be specified explicitly instead of having them computed? Or is there a reason not to do this?

I appreciate the insight.

-tree

--
Tom Emerson Basis Technology Corp.
Software Architect http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
On Friday, Aug 15, 2003, at 18:37 Europe/Amsterdam, Tom Emerson wrote:
I have a question for those of you who have embedded Python into a large application concerning how you handle the module search path and related functions.
Currently the module search path (and the related directory names set as a side effect) is determined by looking at the environment that the executable calling Py_Initialize() is running in. Hence if I've embedded Python 2.3 and also have Python 2.3 installed in (say /usr/local) it is going to use the Python paths in /usr/local/ over those in my customized embedded version.
As far as I can tell, the only way I can control this behavior is to rewrite Py_GetPath and friends in my custom build.
In my case the user of my application has a configuration file which specifies the pathnames for platform (in-)dependent files, both Python and other. But I cannot pass this information on to Py_Initialize() and on into Py_GetPath.
Is it worth providing an alternative initialization API that allows these values to be specified explicitly instead of having them computed? Or is there a reason not to do this?
+1.
There is a hack you can use nowadays, but a hack it truly is: fiddle
_environ before calling Py_Initialize().
--
- Jack Jansen
Jack Jansen writes:
There is a hack you can use nowadays, but a hack it truly is: fiddle _environ before calling Py_Initialize().
Yes, but this is pretty ugly, and to get full coverage involves messing with PATH as well as PYTHONHOME et al. In the case of embedding I would expect that one usually does *not* want the standard search path to be constructed, or minimally only part of it.

-tree
-- Tom Emerson
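The environment dependence discussed here can be observed without writing any C. What follows is a minimal sketch, assuming a current Python 3 interpreter rather than the 2.3 of this thread: it starts a child interpreter with PYTHONPATH set and reads back the sys.path that startup computed. The helper name child_sys_path is made up for illustration; an embedding application using the hack would instead set the variable in its own process before calling Py_Initialize().

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_env):
    """Start a fresh interpreter with extra environment variables
    and return the sys.path it computed during startup."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as custom_dir:
        # The directory shows up in the child's search path only because
        # the environment said so; nothing was passed to the interpreter.
        for entry in child_sys_path({"PYTHONPATH": custom_dir}):
            print(entry)
```

The same mechanism is what makes the hack ugly: every variable the startup code consults has to be set correctly, and the application's own environment is modified as a side effect.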
Jack Jansen
On Friday, Aug 15, 2003, at 18:37 Europe/Amsterdam, Tom Emerson wrote:
Currently the module search path (and the related directory names set as a side effect) is determined by looking at the environment that the executable calling Py_Initialize() is running in. Hence if I've embedded Python 2.3 and also have Python 2.3 installed in (say /usr/local) it is going to use the Python paths in /usr/local/ over those in my customized embedded version.
As far as I can tell, the only way I can control this behavior is to rewrite Py_GetPath and friends in my custom build.
In my case the user of my application has a configuration file which specifies the pathnames for platform (in-)dependent files, both Python and other. But I cannot pass this information on to Py_Initialize() and on into Py_GetPath.
Is it worth providing an alternative initialization API that allows these values to be specified explicitly instead of having them computed? Or is there a reason not to do this?
+1.
Wouldn't a Py_SetPath function do the trick, which would initially set module_search_path (if it's not already set)?

Thomas
Thomas Heller writes: [...]
Is it worth providing an alternative initialization API that allows these values to be specified explicitly instead of having them computed? Or is there a reason not to do this?
Wouldn't a Py_SetPath function do the trick, which would initially set module_search_path (if it's not already set)?
Yes, that is essentially what I propose adding:

void Py_SetPaths(modulepath, prefix, execprefix, fullpath);

It needs to do more than just set module_search_path though, since as a side effect of determining module_search_path, calculate_path() also sets the values for prefix, exec_prefix, and prog_path. These are returned by Py_GetPrefix, Py_GetExecPrefix, and Py_GetProgramFullPath respectively, and are used for sys.prefix, sys.exec_prefix, and sys.executable. Py_GetExecPrefix is also used by the _PyPopen function in posixmodule.c.

In any event, the idea is that an embedding application may know the values for each of these paths and can force them by calling Py_SetPaths prior to Py_Initialize, with the appropriate caveat that the caller had better know what they are doing or strange things might happen.

-tree
-- Tom Emerson
On Monday, Aug 18, 2003, at 18:40 Europe/Amsterdam, Tom Emerson wrote:
Thomas Heller writes: [...]
Is it worth providing an alternative initialization API that allows these values to be specified explicitly instead of having them computed? Or is there a reason not to do this?
Wouldn't a Py_SetPath function do the trick, which would initially set module_search_path (if it's not already set)?
Yes, that is essentially what I propose adding:
void Py_SetPaths(modulepath, prefix, execprefix, fullpath);
Note that if we're going to tackle this I think we should also have a way to disable the other code that looks at the environment to set the various flags.
--
- Jack Jansen
Jack Jansen writes:
void Py_SetPaths(modulepath, prefix, execprefix, fullpath);
Note that if we're going to tackle this I think we should also have a way to disable the other code that looks at the environment to set the various flags.
I think you get this for free because calculate_path is never called when module_search_path is set.

-tree
-- Tom Emerson
Tom Emerson
Jack Jansen writes:
void Py_SetPaths(modulepath, prefix, execprefix, fullpath);
Note that if we're going to tackle this I think we should also have a way to disable the other code that looks at the environment to set the various flags.
I think you get this for free because calculate_path is never called when module_search_path is set.
IMO Jack meant variables like Py_OptimizeFlag and Py_DebugFlag which are set at the beginning of Py_Initialize from some env vars.

Thomas
On Tuesday, August 19, 2003, at 05:00 PM, Tom Emerson wrote:
Jack Jansen writes:
void Py_SetPaths(modulepath, prefix, execprefix, fullpath);
Note that if we're going to tackle this I think we should also have a way to disable the other code that looks at the environment to set the various flags.
I think you get this for free because calculate_path is never called when module_search_path is set.
Nope... Look at the top of Py_Initialize() in pythonrun.c. It's
looking at all the environment variables itself.
--
Jack Jansen,
Thomas Heller writes:
IMO Jack meant variables like Py_OptimizeFlag and Py_DebugFlag which are set at the beginning of Py_Initialize from some env vars.
Indeed, and I agree. To maintain compatibility perhaps a new initialization function should be created that takes these values as arguments, with Py_Initialize() calling it.

-tree
-- Tom Emerson
As I look at Py_Initialize() further, I see some other 'features' that could be problematic when embedding: particularly the calls to Py_FatalError. An embedding application may be able to continue even if the Python interpreter cannot be initialized... certainly it should be up to the embedding application on how to handle the error, instead of having abort() called for it.

It would also be nice if there were no calls to fprintf and friends within the initialization path when doing embedded initialization.

-tree
-- Tom Emerson
As I look at Py_Initialize() further, I see some other 'features' that could be problematic when embedding: particularly the calls to Py_FatalError. An embedding application may be able to continue even if the Python interpreter cannot be initialized... certainly it should be up to the embedding application on how to handle the error, instead of having abort() called for it.
Unclear. You cannot completely avoid the possibility of Py_FatalError() being called in Python. The Py_FatalError() calls in Py_Initialize() are no different than the ones elsewhere in Python: they are only expected when you run out of memory in this stage. What might be useful would be a way for an embedding app to "hook" Py_FatalError() though, so that the embedding app can direct the error message to its own logging stream.
It would also be nice if there were no calls to fprintf and friends within the initialization path when doing embedded initialization.
Right. Apart from a few inside debug #ifdefs, I see one or two that might realistically come up, and that should be fixed. In particular the line

fprintf(stderr, "python: Can't reopen .pyc file\n");

should probably be replaced by a call to PySys_WriteStderr(). I'm not sure what to do about the line

fprintf(stderr, "lost sys.stderr\n");

because if sys.stderr can't be found, there's no other place to send an error.

Maybe in addition to a hookable Py_FatalError() we should also make PySys_WriteStderr() hookable. Of course it should always first try sys.stderr, but if that fails it should fall back to the hook rather than to stdio's stderr.

--Guido van Rossum (home page: http://www.python.org/~guido/)
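The fallback order proposed above (sys.stderr first, then an application hook, never stdio's stderr) can be modelled in a few lines. This is only a Python sketch of the control flow, not the C API: set_stderr_fallback and write_stderr are hypothetical names, and the real change would live inside PySys_WriteStderr() in C.

```python
import sys

# Hypothetical hook slot; assumed for illustration, not an existing API.
_fallback_hook = None


def set_stderr_fallback(hook):
    """Register a callable that receives messages when sys.stderr is unusable."""
    global _fallback_hook
    _fallback_hook = hook


def write_stderr(message):
    """Try sys.stderr first; fall back to the hook; report which path was taken."""
    stream = getattr(sys, "stderr", None)
    if stream is not None:
        try:
            stream.write(message)
            return "sys.stderr"
        except Exception:
            pass  # a broken stream is treated like a missing one
    if _fallback_hook is not None:
        _fallback_hook(message)
        return "hook"
    return "dropped"
```

An embedding application would register its own logger once, for example set_stderr_fallback(app_log.append) with app_log being a list or any callable sink, and the "lost sys.stderr" case would then end up in the application's log instead of on the process's stderr.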
Hi, my name is Martin Zarate, and I'm working on a 3D game engine for educational and urban visualization purposes. Our engine handles scripting with an embedded Python interpreter (we designed our own customized class structure, threading system, etc). As of yet, we've never had to actually modify the Python interpreter itself, so I'm loath to start.

Our chief concern is this: our engine is designed with extensibility in mind - it detects plugins of new objects and new code entering the system. This code may or may not be trusted, and rexec is dead. That's a problem.

I realize rexec will not be coming back. I don't need full rexec, I have a much simpler requirement - I don't want the Python interpreter to have access to the system. The embedding app (Daedalus) handles feeding in of modules and content through Py_CompileString and PyImport_ExecCodeModule, as well as building local namespaces in which the code is run. Any access to the embedding system is through custom data types and extension modules.

My point is that none of the system builtins or major modules are used - and those builtins and modules are what allow the user to access and corrupt the system. While many of the builtins are still needed (basic data types, etc) most of the built-in functions such as filesystem and system calls are liabilities. They could play with the file system, manipulate the system, and do other things. So, my question is this: is there any way to compile Python as a true standalone? That is, the only access to the system is through extension modules? I can't find any documentation on how to control what builtin modules and functions are compiled in with Python.

Is there any interest in such a project? Or, if I develop this myself (although I have no idea how secure it could be - I don't know the builtins very well) would there be any interest in making a patch/PEP of it? This sort of thing would be a boon to anyone embedding Python. I believe many embedded apps could use this sort of feature (at the very least to keep the bloat down).

Sincerely,
Martin
I realize rexec will not be coming back. I don't need full rexec, I have a much simpler requirement - I don't want the python interpreter to have access to the system. The embedding app (Daedalus) handles feeding in of modules and content through Py_CompileString and PyImport_ExecCodeModule, as well as building local namespaces in which the code is run. Any access to the embedding system is through custom data types and extension modules.
My point is that none of the system builtins or major modules are used - and those builtins and modules are what allow the user to access and corrupt the system. While many of the builtins are still needed (basic data types, etc) most of the built-in functions such as filesystem and system calls are liabilities. They could play with the file system, manipulate the system, and do other things. So, my question is this: is there any way to compile Python as a true standalone? That is, the only access to the system is through extension modules? I can't find any documentation on how to control what builtin modules and functions are compiled in with Python.
Is there any interest in such a project? Or, if I develop this myself (although I have no idea how secure it could be - I don't know the builtins very well) would there be any interest in making a patch/PEP of it? This sort of thing would be a boon to anyone embedding Python. I believe many embedded apps could use this sort of feature (at the very least to keep the bloat down).
Well, in standard Python, the only access to the system is *also* through extension modules -- if you count __builtin__ as an extension module. The other extension module you want to avoid is the posix module (under Windows, the nt module). It should be a simple matter to remove this from your module search path.

If you are right that you don't need access to the few builtins that can do system calls at all (I think it's just open and file, but you may want to check), you can simply delete them from the __builtin__ module at the start. I would delete reload as well, since reload(__builtin__) brings deleted builtins back to life. And you'd have to provide an __import__ replacement that restricts what you can import; but again you can do that at the start, before running any untrusted code.

Is this clear, or do you need more explanation?

(PS: sorry for the empty email I sent you before. My fingers slipped.)

--Guido van Rossum
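The two steps described in this message, dropping the system-facing builtins and replacing __import__ with a restricting version, might look roughly like the sketch below. It assumes Python 3, where the module is called builtins rather than __builtin__ and where file and reload no longer exist as builtins. ALLOWED_MODULES, restricted_import and run_untrusted are illustrative names, and the whitelist content is arbitrary. As other replies in this thread point out, this alone is not a secure sandbox.

```python
import builtins

# Hypothetical whitelist; a real application would choose its own.
ALLOWED_MODULES = frozenset({"math"})

# Names mentioned in the thread; only "open" still exists in Python 3.
REMOVED_BUILTINS = ("open", "file", "reload")


def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Stand-in for __import__ that only admits whitelisted top-level modules."""
    if level != 0 or name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


def make_restricted_builtins():
    """Copy the builtins, drop the unwanted names, install the import hook."""
    safe = dict(vars(builtins))
    for name in REMOVED_BUILTINS:
        safe.pop(name, None)
    safe["__import__"] = restricted_import
    return safe


def run_untrusted(source):
    """Execute source in a fresh namespace that sees only the restricted builtins."""
    namespace = {"__builtins__": make_restricted_builtins()}
    exec(compile(source, "<untrusted>", "exec"), namespace)
    return namespace
```

Working on a copy of the builtins, rather than deleting names from the real module, keeps the host application's own code unaffected; only code executed through run_untrusted sees the reduced set.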
On Monday, August 25, 2003, at 06:36 AM, Guido van Rossum wrote:
Well, in standard Python, the only access to the system is *also* through extension modules -- if you count __builtin__ as an extension module. The other extension module you want to avoid is the posix module (under Windows, the nt module). It should be a simple matter to remove this from your module search path.
No, it isn't: simply doing "open = type(sys.stdout)" will revive open for you. So you'd really have to make sure no file objects are accessible either. And there's lots more loopholes like this.
With the current type system I think the only real solution would be
to block this at a very low level, i.e. removing file objects from your
build, or at least completely disabling their side-effects.
--
Jack Jansen,
Jack Jansen wrote:
On Monday, August 25, 2003, at 06:36 AM, Guido van Rossum wrote:
Well, in standard Python, the only access to the system is *also* through extension modules -- if you count __builtin__ as an extension module. The other extension module you want to avoid is the posix module (under Windows, the nt module). It should be a simple matter to remove this from your module search path.
No, it isn't: simply doing "open = type(sys.stdout)" will revive open for you. So you'd really have to make sure no file objects are accessible either. And there's lots more loopholes like this.
With the current type system I think the only real solution would be to block this at a very low level, i.e. removing file objects from your build, or at least completely disabling their side-effects.
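The loophole is easy to reproduce. The sketch below is a Python 3 analogue, with io.FileIO standing in for the 2.3 file type (an assumption of this example, since type(sys.stdout) is no longer directly usable as open). Code that has no open builtin at all, but can reach one file object and type(), rebuilds a file constructor and writes to a path of its choosing. demo_revived_open and the file names are made up for the demonstration.

```python
import io
import os
import tempfile

UNTRUSTED_SOURCE = (
    "f = type(handle)(target, 'w')\n"   # type(handle) is the file class itself
    "f.write(b'escaped')\n"
    "f.close()\n"
)


def demo_revived_open(directory):
    """Run code without an open() builtin and show that it can still create a file."""
    target = os.path.join(directory, "written_by_untrusted.txt")
    exposed = io.FileIO(os.path.join(directory, "exposed.txt"), "w")
    try:
        namespace = {
            "__builtins__": {"type": type},  # no open, no __import__
            "handle": exposed,               # one innocent-looking file object
            "target": target,
        }
        exec(UNTRUSTED_SOURCE, namespace)
    finally:
        exposed.close()
    with open(target, "rb") as check:
        return check.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(demo_revived_open(scratch))
```

Removing type from the namespace does not close the hole either, because handle.__class__ gives the same result, which is why the suggestion above is to block this at a lower level.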
FWIW, Zope takes an approach to restricted Python code that's worth considering. We once thought rexec and Bastion would eventually supersede Zope's "RestrictedPython" package, so not a lot of effort went into non-Zope-specific documentation. However, RestrictedPython has outlived both rexec and Bastion, so maybe detailed documentation would now be valuable.

Here is a general overview of the approach RestrictedPython takes:

- All builtins and modules are guilty until proven innocent. Restricted modules have a special __builtins__ and an __import__ hook.

- We use a modified compiler, based on the now-standard compiler module, to prevent exec statements and hook print statements. The compiler also adds hooks for getattr, setattr, delattr, getitem, setitem, and delitem operations. Augmented assignment is disallowed (too complicated to support.)

- The type() builtin is considered unsafe. It opens a big unknown. However, a same_type() builtin is provided, which is close enough for most purposes. There are safe equivalents for other builtins as well.

- Here's the hard one for some people to swallow: the compiler prevents restricted scripts from using names that start with an underscore. Being able to define a name like "__import__" could get around the hooks.

This might be considered draconian, but no one has spotted any holes yet in the safety net, and the benefit of being able to script in Python outweighs the losses.

It doesn't implement resource limitations, like preventing scripts from eating up all available RAM or simply never terminating. True resource limitations would require running scripts in a separate process. RestrictedPython is also a boring name. However, RestrictedPython is safer than anything else we know of in the Python world.

Shane
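The underscore rule can be illustrated with the standard ast module. This is a sketch of the idea only, not RestrictedPython's actual implementation (which at the time was built on the compiler package): find_underscore_names is a hypothetical helper, and it checks only plain names and attribute accesses, so a complete checker would also have to cover function and class definitions, arguments, imports and so on.

```python
import ast


def find_underscore_names(source):
    """Return every identifier or attribute in source that starts with '_'."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            violations.append(node.id)
        elif isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            violations.append(node.attr)
    return violations


def check_restricted(source):
    """Raise SyntaxError if the script uses any underscore-prefixed name."""
    bad = find_underscore_names(source)
    if bad:
        raise SyntaxError("names starting with '_' are not allowed: "
                          + ", ".join(sorted(set(bad))))
```

Rejecting scripts at compile time means an expression such as handle.__class__ never runs, which is how this rule closes the route that a removed type() builtin would otherwise leave open.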