escaping characters in filenames

J Kenneth King james at agentultra.com
Wed Jul 29 16:33:11 EDT 2009


Nobody <nobody at nowhere.com> writes:

> On Wed, 29 Jul 2009 09:29:55 -0400, J Kenneth King wrote:
>
>> I wrote a script to process some files using another program.  One thing
>> I noticed was that both os.listdir() and os.path.walk() will return
>> unescaped file names (i.e. "My File With Spaces & Stuff" instead of "My\
>> File\ With\ Spaces\ \&\ Stuff").  I haven't had much success finding a
>> module or recipe that escapes file names and was wondering if anyone
>> could point me in the right direction.
>> 
>> As an aside, the script is using subprocess.call() with the "shell=True"
>> parameter.  There isn't really a reason for doing it this way (was just
>> the fastest way to write it and get a prototype working).  I was
>> wondering if Popen objects were sensitive to unescaped names like the
>> shell.  I intend to refactor the function to use Popen objects at some
>> point and thought perhaps escaping file names may not be entirely
>> necessary.
>
> Note that subprocess.call() is nothing more than:
>
> 	def call(*popenargs, **kwargs):
> 	    return Popen(*popenargs, **kwargs).wait()
>
> plus a docstring. It accepts exactly the same arguments as Popen(), with
> the same semantics.
>
> If you want to run a command given a program and arguments, you
> should pass the command and arguments as a list, rather than trying to
> construct a string.
>
> On Windows the value of shell= is unrelated to whether the command is
> a list or a string; a list is always converted to string using the
> list2cmdline() function. Using shell=True simply prepends "cmd.exe /c " to
> the string (this allows you to omit the .exe/.bat/etc extension for
> extensions which are in %PATHEXT%).
>
> On Unix, a string is first converted to a single-element list, so if you
> use a string with shell=False, it will be treated as the name of an
> executable to be run without arguments, even if it contains spaces, shell
> metacharacters etc.
>
> The most portable approach seems to be to always pass the command as a
> list, and to set shell=True on Windows and shell=False on Unix.
>
> The only reason to pass a command as a string is if you're getting a
> string from the user and you want it to be interpreted using the
> platform's standard shell (i.e. cmd.exe or /bin/sh). If you want it to be
> interpreted the same way regardless of platform, parse it into a
> list using shlex.split().
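
To make the shlex.split() suggestion concrete, here is a minimal sketch.
The command string and file names are made up for illustration; the point
is only that POSIX-style escapes and quotes are resolved into a plain
argument list that can be handed to Popen() with shell=False.

```python
import shlex

# Hypothetical command line, written the way a user would type it at a
# POSIX shell: backslash-escaped spaces and a double-quoted argument.
command = r'pdftotext My\ File\ With\ Spaces\ \&\ Stuff.pdf "out file.txt"'

# shlex.split() applies POSIX quoting rules by default and returns the
# individual arguments with the escapes and quotes removed.
args = shlex.split(command)
print(args)
# ['pdftotext', 'My File With Spaces & Stuff.pdf', 'out file.txt']
```

Each element of the resulting list is one argument, so nothing further
needs to be escaped before it goes to Popen().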

I understand; I think I was headed towards subprocess.Popen() either
way.  It seems to handle the problem I posted about.  And I got to learn
a little something on the way.  Thanks!
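
For the archives, a small sketch of why the list form makes escaping
unnecessary. It assumes nothing about the real script: the child process
here is just the Python interpreter echoing its first argument, standing
in for whatever external program is being called.

```python
import subprocess
import sys

# A name exactly as os.listdir() would return it: unescaped.
filename = "My File With Spaces & Stuff"

# Stand-in child process (an assumption for this sketch): the current
# Python interpreter, printing the first argument it receives.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", filename]

# With a list and the default shell=False, no shell parses the command,
# so the spaces and the '&' reach the child untouched.
proc = subprocess.Popen(child, stdout=subprocess.PIPE)
out, _ = proc.communicate()
received = out.decode().strip()
print(received)
```

The child sees the file name as a single argument, identical to the
string that was passed in.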

The only new problem is that the program's output differs when I run it
through Popen rather than from the command line.  The program in
question is 'pdftotext'.  More investigation to ensue.

Thanks again for the helpful post.


