[Distutils] install w/o build, spaces in directory names

Greg Ward gward@python.net
Thu, 10 Feb 2000 21:19:11 -0500

On 10 February 2000, David Ascher said:
> 1) While building Numerical on NT last night w/ the latest Distutils CVS
> snapshot, I found that it refused to build because the directory names had
> spaces in them which were not quoted. In other words, the code being
> executed was something like
> cl .... -IC:\Program Files\Python...
> which broke.  To correct it, I patched ccompiler.py and msvccompiler.py to
> turn those kinds of things into
> cl .... -I"C:\Program Files\Python..." ...
> but Greg says that that will break on Unix -- I'm surprised by that claim,
> since IIRC quoting directory names is OK, but it's been a while since I
> really dealt with Unix seriously.  FWIW, so far the problem occurs for -I in
> ccompiler.py and /LIBPATH in msvccompiler.py.  The patch is listed at the
> end of this message.

Here's the explanation that I didn't give David this morning in private
email (I was at work, where I'm not paid to hack on Distutils; and I
figured this deserved a public airing if it's a portability problem): in
UnixCCompiler, I do stuff like this several times (i.e. any time I ask
the compiler or linker or archiver to compile, link, or archive
something):
            self.spawn ([self.cc] + cc_args +
                        [source, '-o', object] +

MSVCCompiler has similar code:

            cc_args = compile_options + \
                      base_pp_opts + \
                      [outputOpt, inputOpt]
            self.spawn ([self.cc] + cc_args)

(The two could be a lot more similar if I went crusading through the
code to enforce uniformity, which I'll do as soon as that next shipment
of round tuits arrives.)

'self.spawn()' in both cases is a convenience method provided by the
base class, CCompiler; it's just a wrapper around
'distutils.spawn.spawn()', which in turn is a wrapper around either
'_spawn_posix()' or '_spawn_nt()'.  Both of these have one required
argument, which is a *list of command-line arguments*.  '_spawn_posix()'
forks and, in the child, 'exec()'s that list of command-line arguments.
Key point: no shell is ever involved, so no quoting need be done;
indeed, no quoting *can* be done, as the command-line must be split up
into a (Python) list of arguments before you can do anything with it.
This is a feature, as anyone who has ever done shell or Tcl programming
can attest.  (For those who have not, both of those languages have very
fuzzy notions of the difference between strings and lists, and rather
weird quoting rules.  Compared to quoting strings and lists in shell or
Tcl, Perl is a model of clarity.  Python is a smidge clearer still than
Perl, but not quite as flexible.  But I digress.)
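
To make the "no shell, no quoting" point concrete, here is a minimal
sketch in present-day Python.  It uses the standard 'subprocess' module
(which postdates this message) rather than 'distutils.spawn', and the
helper name 'echo_args' is made up for illustration.  Each list element
should reach the child as exactly one argument, spaces and wildcards
included:

```python
import subprocess
import sys

def echo_args(args):
    """Run a child Python process that prints its arguments, one per line.

    The command is passed as a list and no shell is involved, so each
    element of 'args' arrives in the child as exactly one argument.
    """
    child_code = "import sys\nfor a in sys.argv[1:]: print(a)"
    result = subprocess.run(
        [sys.executable, "-c", child_code] + list(args),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

if __name__ == "__main__":
    # The directory name contains a space, and '*.c' is not expanded.
    print(echo_args(["-I/opt/Program Files/Python/include", "*.c"]))
```

Had the same words been joined into one string and handed to a shell,
the space would have split the first argument in two and the wildcard
would have been expanded.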

Bottom line: on Unix, the above 'spawn()' call resolves to:

    pid = fork ()
    if pid == 0:
        exec (self.cc,
              [self.cc] + cc_args +
              [source, '-o', object] +

No shell, no quoting.  (The duplication of 'self.cc' is deliberate --
it's an obscure feature of 'exec()' that Python faithfully duplicates
that allows parents to lie about the name of their children.  My
'spawn()' interface doesn't expose this feature.)
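
For readers who want to try the fork-and-exec pattern, the following is
a self-contained sketch of it, not the actual '_spawn_posix()' code.
The function name 'spawn_posix' and the choice of exit status 127 for a
failed exec are assumptions made for this example.  It runs on POSIX
systems only.

```python
import os
import sys

def spawn_posix(cmd):
    """Fork, exec 'cmd' (a list of strings) in the child, and wait for it.

    Returns the child's exit status, or minus the signal number if the
    child was killed by a signal.  No shell is involved at any point.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the requested program.
        # cmd[0] is used twice: as the program to look up on PATH and
        # as argv[0] of the new process.
        try:
            os.execvp(cmd[0], cmd)
        finally:
            # Only reached if the exec itself failed.
            os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)

if __name__ == "__main__":
    print(spawn_posix([sys.executable, "-c", "print('hello from the child')"]))
```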

The picture on Windows is a bit different, since of course there's no
such thing as fork-and-exec.  '_spawn_nt()' just uses 'spawnv()', so the 
above code from MSVCCompiler resolves to:

    executable = search_path (cmd[0])        # search_path not shown here
    spawnv (os.P_WAIT, executable, cmd)

I have long suspected that there are some command-line quoting rules in
the version of DOS underlying current versions of Windows, but I have
never been able to discern them.  If someone would care to offer up an
explanation to this confused Unix hacker (or a pointer to an
explanation), I'd be glad to hear it.

Anyways, on reflection it looks like the place to make the fix is in
spawn.py, specifically in '_spawn_nt()'.  In particular, somewhere
before the call to 'os.spawnv()' should be a loop like this:

    for i in range (len (cmd)):
        if needs_quoting (cmd[i]):
            cmd[i] = quote (cmd[i])

I'm sure the implementation will need to be a bit more elaborate than
this.  For instance, how do you quote a string that already has quotes
in it?  I can't hazard a guess, since I don't know the quoting rules on
Windows.  I hope someone can contribute a patch.  For inspiration, here
is the equivalent for Unix in Perl.  No doubt similar logic, but
different details, will apply.

=item shellquote (WORDLIST)

Performs the opposite of the F<Text::ParseWords> module, namely it joins
an array of words together, with some sub-strings quoted in order to
escape shell meta-characters.  WORDLIST should just be a list of
substrings, not a list reference.  This is useful for turning a list of
arguments (such as C<@ARGV>, or something you're about to pass to Perl's
C<system>) into a string that looks like what you might type to the
shell.

The exact rules are as follows: if a word contains no metacharacters and
is not empty, it is untouched.  If it contains both single and double
quotes (C<'> and C<">), all meta-characters are escaped with a
backslash, and no quotes are added.  If it contains just single quotes,
it is encased in double quotes.  Otherwise---that is, if it is empty or
contains meta-characters other than C<'>---it is encased in single
quotes.

The list of shell meta-characters is taken from the Perl source code
(C<do_exec()>, in doio.c), and thus is specific to the Bourne shell:

   $ & * ( ) { } [ ] ' " ; \ | ? < > ~ ` \n

(plus whitespace).

For example, if C<@ARGV> is C<("foo", "*.bla")>, then
C<shellquote (@ARGV)> will return C<"foo '*.bla'">---thus turning a
simple list of arguments into a string that could be given to the shell
to re-generate that list of arguments.


sub shellquote
{
   my (@words) = @_;

   local $_;
   for (@words)
   {
      # This list of shell metacharacters was taken from the Perl source
      # (do_exec(), in doio.c).  It is, in slightly more readable form:
      #    $ & * ( ) { } [ ] ' " ; \ | ? < > ~ ` \n
      # (plus whitespace).  This totally screws up cperl-mode's idea of
      # the syntax, unfortunately, so don't expect indenting to work
      # at all in the rest of this function.

      if ($_ eq "" || /[\s\$\&\*\(\)\{\}\[\]\'\";\\\|\?<>~`\n]/)
      {
         # If the word has both " and ' in it, then just backslash all 
         #   metacharacters;
         # if it has just ' then encase it in "";
         # otherwise encase it in ''

         SUBST:
         {
            (s/([\s\$\&\*\(\)\{\}\[\]\'\";\\\|\?<>~`\n])/\\$1/g, last SUBST)
               if (/\"/) && (/\'/);
            ($_ = qq/"$_"/, last SUBST) if (/\'/);
            $_ = qq/'$_'/;
         }
      }
   }

   join (" ", @words);
}
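
For those who don't read Perl, the same rules can be sketched in Python
roughly as follows.  This is an illustrative port, not tested against
the Perl original beyond the cases shown; the name '_META' is made up,
and like the Perl version it joins the words into a single string.

```python
import re

# Bourne-shell metacharacters from the list above; \s covers whitespace
# and the newline.
_META = re.compile(r"""[\s$&*(){}\[\]'";\\|?<>~`]""")

def shellquote(words):
    """Join 'words' into one string, quoting those that need it."""
    quoted = []
    for word in words:
        if word != "" and not _META.search(word):
            # Nothing special in it: leave untouched.
            quoted.append(word)
        elif '"' in word and "'" in word:
            # Both kinds of quote: backslash every metacharacter.
            quoted.append(_META.sub(lambda m: "\\" + m.group(0), word))
        elif "'" in word:
            # Just single quotes: encase in double quotes.
            quoted.append('"%s"' % word)
        else:
            # Empty, or other metacharacters: encase in single quotes.
            quoted.append("'%s'" % word)
    return " ".join(quoted)

if __name__ == "__main__":
    print(shellquote(["foo", "*.bla"]))
```

With the example from the documentation above, 'shellquote (["foo",
"*.bla"])' gives the string "foo '*.bla'".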

...of course, the version in Python for 'spawn()' would return the
modified list of strings, rather than joining them together into one big
string.  (The purpose of my Perl function was to print out a script's
command-line arguments in a way that could be replicated in future: if
you just print out join (" ", @ARGV) -- or join (sys.argv, ' '), choose
yer poison -- then you lose vital information.)
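
As a starting point for discussion, a list-returning version for
'_spawn_nt()' might look something like the sketch below.  Everything
in it is an assumption: the name 'quote_args_nt' is hypothetical, it
assumes that wrapping a whole argument in double quotes is acceptable
to the Windows tools, and it deliberately leaves alone any argument that
already contains a double quote, since the rules for that case are not
known here.

```python
def quote_args_nt(args):
    """Return a copy of 'args' with whitespace-containing items wrapped
    in double quotes.

    Deliberately naive: arguments that already contain a double quote
    are passed through unchanged.
    """
    quoted = []
    for arg in args:
        if '"' not in arg and any(c.isspace() for c in arg):
            quoted.append('"%s"' % arg)
        else:
            quoted.append(arg)
    return quoted

if __name__ == "__main__":
    print(quote_args_nt(["cl", "-IC:\\Program Files\\Python", "foo.c"]))
```

Note that this quotes the whole argument, '-I' included, which differs
from the form shown in David's patch, where only the directory name is
quoted; whether the compiler treats the two alike would need checking.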

> 2) Again on Windows, if I do
> 	python setup.py build
> 	python setup.py install
> the second line causes the code to be recompiled, even though the
> compilation is not needed.  Greg assures me that that is not the case on
> Unix.  Anyone have the time to tackle this?

Aughh!  Again David, I apologize for brushing you off this morning
without looking at the code; as I do so now, I see that there is *no*
provision in MSVCCompiler for doing the kind of timestamp checking that
is done in UnixCCompiler to avoid redundant compilation.  So, to correct
my arrogant assertion: that feature was not there in the beginning, it
isn't there now, and it won't be until someone writes the 10 or 15 lines
of needed code.  This shouldn't be too hard, but I'm off for a weekend
skiing in West Virginia tomorrow (woo-hoo!) so I won't be able to do it
until Monday.
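
The timestamp check itself is simple.  A minimal sketch of the idea
follows; it is not the UnixCCompiler code, and the names 'needs_rebuild'
and 'stale_pairs' are made up for this example.

```python
import os

def needs_rebuild(source, target):
    """Return True if 'target' is missing or older than 'source'."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

def stale_pairs(pairs):
    """Given (source, object) filename pairs, return those that need
    compiling; up-to-date pairs are skipped."""
    return [(src, obj) for (src, obj) in pairs if needs_rebuild(src, obj)]
```

A compiler class would call something like 'stale_pairs()' before
spawning the compiler and only compile the pairs it returns.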

Anyways, it looks like the priorities for Windows portability are:

   * fix '_spawn_nt()' to quote command-line arguments that need quoting
     (bad bug, breaks builds)

   * finish the job of incorporating Robin and Thomas' registry-
     grovelling patch (this comes first only because we're sort of
     in the middle of it)
     (bad omission, totally prevents builds in certain cases)

   * fix MSVCCompiler so it doesn't do redundant compilation/linking
     (minor bug, just wastes time) (but potentially lots of time
     -- NumPy is a lot of code to compile!)

And again, I'm going to be away for three days, and I do not have the
required knowledge to do the quoting fix anyways.  As usual, Patches
happily accepted!

Thanks for the bug reports --

Greg Ward - just another /P(erl|ython)/ hacker          gward@python.net
BE ALERT!!!!  (The world needs more lerts ...)