Security implications of using open() on untrusted strings.
r0g aioe.org at technicalbloke.com
Tue Nov 25 08:26:32 CET 2008
Jorgen Grahn wrote:
> On Mon, 24 Nov 2008 00:44:45 -0500, r0g <aioe.org at technicalbloke.com> wrote:
>> Hi there,
>> I'm trying to validate some user input, which is for the most part simple
>> regexery. However, I would also like to check filenames, and I would like
>> this code to be multiplatform.
>> I had hoped the os module would have a function that would tell me if a
>> proposed filename would be valid on the host system but it seems not. I
>> have considered whitelisting but it seems a bit unfair to make the rest
>> of the world suffer the naming restrictions of windows. Moreover it
>> seems both inelegant and hard work to research the valid file/directory
>> naming conventions of every platform that this app could conceivably run
>> on and write regexes for all of them so...
>> I'm tempted to go the witch dunking route, stick it in an open() between
>> a try: and an except: and see if it floats. However...
>> Although it's a desktop (not internet facing) app, I'm a little squeamish
>> about piping raw user input into a filesystem function like that, and this app
>> will be dealing with some particularly sensitive data so I want to be
>> careful and minimize exposure where practical.
> Take the Unix 'ls' command (or MS-DOS 'dir'). That's two programs
> which let users pipe raw input into the filesystem functions, and they
> certainly have handled some very sensitive data over the years.
>> Has programming PHP and Web stuff for years made me overly paranoid
>> about this [...]
> Yes. ;-)
> Please explain one thing: what are you looking for? It's not
> "accesses a file outside the user's home directory", "accesses an
> infinite file like /dev/zero" or something like that, or you would
> have said so. Nor does the "user" input seem to come from some user other
> than the one your program is running as, nor from some input source
> which the user cannot be held responsible for.
> Seems to me you simply want to know beforehand that the reading will
> work. But you can never check that! You can stat(2) the file, or
> open-and-close it -- and then a microsecond later, someone deletes the
> file, or replaces it with another one, or write-protects it, or mounts
> a file system on top of its directory, or drops a nuke over the city,
> or ...
> Two more notes:
> - os.open is not like os.system. If os.open ends up doing
> anything other than trying to open the file corresponding to the
> string you feed it, it's Python's fault, not yours.
> Compare with a language (does Perl allow this?) where if the string
> is "rm -rf /|", open will run "rm -rf /" and start reading its output.
> *That* interface would have been
> - if the OS ends up doing something different when calling open(2) or
> creat(2) or whatever using that string, it's the OS's fault, not yours.
> Or am I missing something?
No Jorgen, that's exactly what I needed to know, i.e. that sending
unfiltered text to open() is not negligent and is not likely to allow
any badness to occur.
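For the archive, here is roughly what the "see if it floats" approach could look like. This is only a minimal sketch in current Python 3, not code from the app; the helper name try_open and its return convention are made up for illustration. It lets the OS decide whether the name is acceptable instead of checking beforehand.

```python
def try_open(filename, mode="r"):
    """Hypothetical helper: return (file_object, None) on success,
    or (None, exception) if the name cannot be opened."""
    try:
        return open(filename, mode), None
    except (OSError, ValueError) as exc:
        # OSError covers names the platform rejects, missing directories
        # and permission problems. ValueError is what Python 3 raises for
        # a name containing an embedded NUL character.
        return None, exc
```

Note that opening with mode "w" creates (or truncates) the file as a side effect, so the successful result is the file you actually use, not a separate check made ahead of time.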
As for what I was looking for: nothing in particular, as I couldn't
think of any specific case where this could be a problem. However, my
background is websites (where input sanitization is rule number one),
and some of the web exploits I've learned to mitigate over the years
aren't ones I would necessarily have figured out for myself, e.g. CSRF.
So I thought I'd ask you guys in case there's anything I haven't
considered that I should consider! Thankfully it seems I don't have too
much to worry about :-)
The only situation where I can foresee potential for mischief is if the
program, or part thereof, is running as a more privileged user than the
user it is accepting input from. Thankfully I don't think that will be
necessary in the prog I'm working on right now as I don't need packet
capture / low numbered ports etc.
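If that privileged case ever did come up, one common mitigation is to confine user-supplied paths to a base directory before opening them. The following is a rough sketch under that assumption, in current Python 3; the function name is_inside and the idea of a single base directory are hypothetical, not part of this app.

```python
import os

def is_inside(base_dir, user_path):
    """Hypothetical check: True if user_path, resolved relative to
    base_dir, ends up at or below base_dir."""
    base = os.path.realpath(base_dir)
    # os.path.join discards base when user_path is absolute, so an
    # absolute path is judged by where it really points.
    target = os.path.realpath(os.path.join(base, user_path))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

As Jorgen points out, any check made before the open() can be invalidated a moment later (for instance by a symlink being swapped in), so this narrows the exposure rather than removing it.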
Thanks for your answer and thanks to everybody else for all their
More information about the Python-list