[Distutils] setup script sandboxing (was Re: EasyInstall: verbosity)
Phillip J. Eby
pje at telecommunity.com
Mon May 30 07:40:40 CEST 2005
At 12:09 AM 5/30/2005 -0500, Ian Bicking wrote:
>At least, that's my initial thought. Potentially you could have two
>progress indicators: one for overall progress, and another for the current job.
I was thinking that for download progress, I'd use whatever urllib already
does. :) However, for other things, you'd probably be looking at log
messages to get some idea of the progress.
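For reference, a minimal sketch of what "whatever urllib already does" can look like in modern Python 3: urllib.request.urlretrieve accepts a reporthook callback, which it calls with a block count, a block size, and the total size (-1 when the server sends no length). The make_progress_hook and download helpers are hypothetical names for illustration, not part of EasyInstall.

```python
import sys
import urllib.request


def make_progress_hook(stream=sys.stderr):
    """Hypothetical helper: build a reporthook that prints download progress."""
    def hook(block_count, block_size, total_size):
        # urlretrieve calls this once on connect (block_count=0) and after each block.
        downloaded = block_count * block_size
        if total_size > 0:
            # The last block may overshoot the real size, so cap at 100%.
            percent = min(100, downloaded * 100 // total_size)
            stream.write("\rdownloaded %3d%%" % percent)
        else:
            # No Content-Length from the server: total_size is -1.
            stream.write("\rdownloaded %d bytes" % downloaded)
        stream.flush()
    return hook


def download(url, filename):
    """Hypothetical wrapper: fetch url into filename, showing progress on stderr."""
    return urllib.request.urlretrieve(url, filename, reporthook=make_progress_hook())
```

The hook is a plain function, so it can be exercised without network access by calling it directly with fake block counts.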
>>By the way, the sandboxing feature is where I plan to replace a few key
>>'os' module functions and builtins with ones that "notice" if the setup
>>script is trying to modify files outside the installer's temporary
>>directory. Initially this will just be so I can analyze those scripts'
>>behavior, but it might eventually grow into a facility to make them think
>>they're modifying files in the real filesystem, but they would actually
>>be getting redirected to the right temporary subdirectory for stuff to be
>>added into the egg. I don't know if this feature is really achievable,
>>but almost all file access in Python boils down to either 'open()' or a
>>call to an 'os' function imported from the platform-specific extension
>>('nt', 'mac', 'posix', etc.).
>>Anyway, arguably people should "fix" their packages, but in practice some
>>folks won't, so a working sandbox could be helpful. But I'll probably
>>want the sandbox tools to produce a lot of output at various levels of detail,
>>which is why I'd probably use the logging package to do this.
>
>Yikes, that sounds complicated. Though sandboxing would be kind of neat
>in general, e.g., for mock filesystems in unit tests. Are there
>particular packages you have in mind that need this?
Yes. I've been finding that a surprising number of packages manage to install
files in, e.g., site-packages "behind my back". Some are mentioned on the EasyInstall
experiences page, but once I actually implemented a prototype sandbox, I
discovered that some of the packages I thought were clean were in fact
installing stuff behind my back!
Anyway, the virtualization is fairly simple if all you want to do is
note when a package is being naughty; I've already implemented that in
my working copy. Redirecting the paths to a sane location is more complex;
I may have to move the virtualizer into bdist_egg and apply it there.
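A minimal sketch of the "just note when a package is being naughty" mode, assuming Python 3 and covering only the open() builtin (Python 3 has no 'file' builtin, and the os-level functions would need wrappers of their own). The helper name note_writes_outside is hypothetical, and this is not the EasyInstall implementation: writes still go through, and offending paths are simply recorded.

```python
import builtins
import contextlib
import os

WRITE_MODE_CHARS = "wax+"  # any of these in the mode string means the file may be modified


def _is_inside(root, path):
    try:
        return os.path.commonpath([root, path]) == root
    except ValueError:  # e.g. paths on different Windows drives
        return False


@contextlib.contextmanager
def note_writes_outside(allowed_dir, violations):
    """Hypothetical helper: while active, append to `violations` every
    write-mode open() of a path outside allowed_dir. The open still happens."""
    allowed = os.path.realpath(allowed_dir)
    real_open = builtins.open

    def watching_open(file, mode="r", *args, **kwargs):
        # Integer file descriptors are passed through unchecked.
        if isinstance(file, (str, bytes, os.PathLike)) and any(
                c in mode for c in WRITE_MODE_CHARS):
            path = os.path.realpath(os.fsdecode(file))
            if not _is_inside(allowed, path):
                violations.append(path)
        return real_open(file, mode, *args, **kwargs)

    builtins.open = watching_open
    try:
        yield violations
    finally:
        builtins.open = real_open  # always restore the real builtin
```

Because most code looks up open() through the builtins module at call time, a setup script executed inside the with block picks up the wrapper automatically; code that calls io.open directly, or that saved a reference to open before the block was entered, bypasses it.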
Essentially, while the main setup script runs, we don't want to allow it
to write anything outside the temp directory; if it tries, we fail the
script as an "unsafe package" (a nice bit of FUD to help shame
package authors into cleaning up that sort of uncontrolled installation
activity). For writes occurring during bdist_egg's invocation of
install_data and install_lib, writes to site-packages should be redirected
to bdist_egg's staging area, and all other writes outside the setup script
directory should trigger a fatal "unsafe package" abort.
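The two-phase policy described above can be expressed as a single path-resolution function. This is a sketch under assumptions: resolve_write, UnsafePackage, and the parameter names are hypothetical, paths are compared after os.path.abspath only (no symlink resolution), and the caller is assumed to know which phase it is in.

```python
import os


class UnsafePackage(Exception):
    """Hypothetical: raised when a setup script writes where it shouldn't."""


def _inside(root, path):
    try:
        return os.path.commonpath([root, path]) == root
    except ValueError:  # e.g. different drives on Windows
        return False


def resolve_write(path, allowed_dirs, site_packages=None, staging_dir=None):
    """Return the path a write should actually go to, or raise UnsafePackage.

    - Inside any of allowed_dirs: allowed unchanged.
    - Inside site_packages (bdist_egg phase only, i.e. when both
      site_packages and staging_dir are given): redirected into staging_dir.
    - Anywhere else: UnsafePackage.
    """
    path = os.path.abspath(path)
    for root in allowed_dirs:
        if _inside(os.path.abspath(root), path):
            return path
    if site_packages is not None and staging_dir is not None:
        sp = os.path.abspath(site_packages)
        if _inside(sp, path):
            # Keep the layout relative to site-packages inside the staging area.
            rel = os.path.relpath(path, sp)
            return os.path.join(os.path.abspath(staging_dir), rel)
    raise UnsafePackage("unsafe package: attempted write to %r" % path)
```

During the main setup script a sandbox would call resolve_write(path, [temp_dir]); during bdist_egg's install_data/install_lib step it would call resolve_write(path, [setup_dir], site_packages, staging_dir) and perform the write at whatever path comes back.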
Anyway, virtualization is surprisingly easy to implement; all of the
Python-supplied filesystem APIs are grounded in the 'os' module and the
'open'/'file' builtins. If you replace these two, you've pretty much got it
made. Even os.path functions like 'isdir()' work by calling os.stat(), so
replacing os.stat() in the os module covers them too. And there are
fewer than 30 os primitives in all to replace, with only four distinct
input/output signatures. A little metaprogramming goes a long way here,
such that AbstractSandbox is only about 100 lines, and a simple
LoggingSandbox class is about 20 more lines.
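To illustrate how a little metaprogramming keeps this short, here is a hedged sketch of a sandbox base class that generates wrappers from lists of os function names grouped by signature, plus a logging subclass that overrides a single hook. PathHookSandbox and LoggingPathSandbox are hypothetical names, and this is not the actual AbstractSandbox: it is written for Python 3, covers only two signatures (one path argument, and a source/destination pair), ignores functions that return paths, and leaves the open() builtin to be wrapped separately, since Python 3's builtin open does not go through os.open.

```python
import os


class PathHookSandbox:
    """Hypothetical sketch: temporarily replace selected os functions with
    generated wrappers that route every path argument through _remap_path()."""

    # Signature 1: the first argument is a single path.
    SINGLE_PATH = ("stat", "lstat", "mkdir", "rmdir", "remove", "unlink",
                   "chmod", "open")
    # Signature 2: the first two arguments are source and destination paths.
    PATH_PAIR = ("rename", "replace", "link")

    def __init__(self):
        self._saved = {}

    def _remap_path(self, operation, path):
        """Hook for subclasses: log, redirect, or refuse. Default: pass through."""
        return path

    def _wrap_single(self, name, func):
        def wrapper(path, *args, **kwargs):
            return func(self._remap_path(name, path), *args, **kwargs)
        wrapper.__name__ = name
        return wrapper

    def _wrap_pair(self, name, func):
        def wrapper(src, dst, *args, **kwargs):
            return func(self._remap_path(name, src),
                        self._remap_path(name, dst), *args, **kwargs)
        wrapper.__name__ = name
        return wrapper

    def __enter__(self):
        # Generate and install one wrapper per function name, per signature group.
        for names, make_wrapper in ((self.SINGLE_PATH, self._wrap_single),
                                    (self.PATH_PAIR, self._wrap_pair)):
            for name in names:
                func = getattr(os, name, None)
                if func is None:  # not every platform provides every function
                    continue
                self._saved[name] = func
                setattr(os, name, make_wrapper(name, func))
        return self

    def __exit__(self, exc_type, exc, tb):
        # Put the original functions back, even if the body raised.
        for name, func in self._saved.items():
            setattr(os, name, func)
        self._saved.clear()
        return False


class LoggingPathSandbox(PathHookSandbox):
    """Records (operation, path) pairs without changing behavior."""

    def __init__(self):
        super().__init__()
        self.log = []

    def _remap_path(self, operation, path):
        self.log.append((operation, path))
        return path
```

Subclasses only decide what to do with each path in _remap_path: log it, as here, redirect it, or raise. Whether os.path helpers such as isdir() show up in the log depends on the platform and Python version, since some builds implement them with native helpers rather than os.stat().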
Using these sandboxes for unit tests would probably be a pain, though; I'm
focused only on either disallowing actions entirely, or changing what
directory they happen in, sort of like a Python-only pseudo-chroot(). A
unit test system would probably want to virtualize all the operations,
which means more operations to virtualize, and with more signatures.