suggestions please "what should i watch for/guard against" in a file upload situation?

Diez B. Roggisch deets at web.de
Wed Oct 6 18:51:44 EDT 2010


Seebs <usenet-nospam at seebs.net> writes:

> On 2010-10-06, Diez B. Roggisch <deets at web.de> wrote:
>> Seebs <usenet-nospam at seebs.net> writes:
>>> On 2010-10-06, geekbuntu <gmilby at gmail.com> wrote:
>>>> in general, what are things i would want to 'watch for/guard against'
>>>> in a file upload situation?
>
>>> This question has virtually nothing to do with Python, which means you
>>> may not get very good answers.
>
>> In contrast to "comp.super.web.experts"? There are quite a few people
>> with web-experience here I'd say. 
>
> Oh, certainly.  But in general, I try to ask questions in a group focused
> on their domain, rather than merely a group likely to contain people who
> would for other reasons have the relevant experience.  I'm sure that a great
> number of Python programmers have experience with sex, that doesn't make
> this a great newsgroup for sex tips.  (Well, maybe it does.)

As the OP asked about a Python web framework (self written or not), I
think all advice that can be given is certainly more related to Python
than to airy references to general web programming such as
"oh, make sure your server-side application environment doesn't have any
security issues."

Or, to be more concrete: which NG would you suggest for asking this
question about frameworks or webapps written in Python?

>> Given that most people are not computer savvy (always remember, the
>> default for windows is to hide extensions..), using it client-side can
>> be valuable to prevent long uploads that eventually need to be rejected
>> otherwise (no mom, you can't upload word-docs as profile pictures).
>
> That's a good point.  On the other hand, there's a corollary; you may want
> to look at the contents of the file in case they're not really what they're
> supposed to be.

For sure. But the focus of you and others seems to be the file-name,
as if that was anything especially dangerous. As a matter of fact, it's a
parameter in a part definition of a multipart/form-data encoded request
body, and as such is rather locked down in terms of null bytes and such.
So you are pretty safe as long as you

 - use standard library request parsing modules such as cgi. If
   one insists on reading streams bytewise and using ctypes to poke the
   results into memory, one can of course provoke unimaginable havoc..

 - don't use the filename for anything but meta-info. And usually, it
   is simply regarded as "nice that you've provided us with it, we'll try
   our best to fill an <img alt> attribute with the basename".
   But not more. Worth pointing out to the OP to do that. But this is
   *not* a matter of mapping HTTP-request paths to directories, I'd
   wager.
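
A minimal sketch of that second point, assuming Python 3 and an upload
directory you control. The helper names display_name and storage_path are
made up for illustration, not part of any framework: the client's filename
is reduced to a display label, and the on-disk name is generated by the
server.

```python
import os
import re
import uuid


def display_name(client_filename, max_len=100):
    """Reduce a client-supplied filename to a label used for display only."""
    # Browsers may send Windows or POSIX paths; keep only the last component.
    name = client_filename.replace("\\", "/").rsplit("/", 1)[-1]
    # Drop control characters (including NUL) and surrounding whitespace.
    name = re.sub(r"[\x00-\x1f\x7f]", "", name).strip()
    return name[:max_len] or "unnamed"


def storage_path(upload_dir):
    """Choose the on-disk name ourselves; the client's name never reaches the path."""
    return os.path.join(upload_dir, uuid.uuid4().hex)
```

The label still needs HTML escaping (for example with html.escape) before
it goes into an alt attribute.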

Something that is of much more importance (I should have mentioned it
earlier, shame on me) is of course file size. Denying requests that come
with a CONTENT_LENGTH over a specified limit, respecting CONTENT_LENGTH
and not reading beyond it, and possibly dealing with chunked encodings
in similarly safe ways (I have to admit I haven't yet dealt with one of
those myself on a visceral level, but as they are part of the HTTP
spec...) is important, as otherwise DoS attacks are possible.
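
A rough sketch of the size check, assuming a WSGI-style environ dict in
Python 3. MAX_UPLOAD_BYTES, UploadRejected and read_body are hypothetical
names, and the 5 MiB limit is arbitrary. Requests without a usable
CONTENT_LENGTH (which includes chunked bodies) are simply refused here
rather than handled.

```python
import io

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical limit: 5 MiB


class UploadRejected(Exception):
    """Carries the HTTP status line the application should answer with."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


def read_body(environ, limit=MAX_UPLOAD_BYTES):
    """Read at most CONTENT_LENGTH bytes, refusing oversized or unsized requests."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or "")
    except ValueError:
        # Missing or malformed length; this sketch does not accept such requests.
        raise UploadRejected("411 Length Required")
    if length < 0:
        raise UploadRejected("400 Bad Request")
    if length > limit:
        raise UploadRejected("413 Request Entity Too Large")

    stream = environ["wsgi.input"]
    chunks = []
    remaining = length
    while remaining > 0:
        # Never ask the stream for more than the declared length.
        chunk = stream.read(min(remaining, 64 * 1024))
        if not chunk:
            break  # client sent less than it announced
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    demo = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello-and-more")}
    print(read_body(demo))
```

Many servers and frameworks offer their own request-size limit, which is
usually preferable to doing this by hand.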

>> Your strange focus on file-names that are pure meta information is a
>> little bit concerning... 
>
> If you're uploading files "into a directory", then it is quite likely that
> you're getting file names from somewhere.  Untrusted file names are a much
> more effective attack vector, in most cases, than EXIF information.

The "into a directory" quote coming from where? And given that EXIF
information is probably read by some C-lib, I'd say it is much more
dangerous. This is a gut feeling only, but fed by problems with libpng a
year or two ago.

>> Certainly advice. But that's less focussed on filenames or file-uploads, but
>> on the whole subject of processing HTTP requests. Which would make a
>> point for *not* using a home-grown framework.
>
> Well, yeah.  I was assuming that the home-grown framework was mandatory for
> some reason.  Possibly a very important reason, such as "otherwise we won't
> have written it ourselves".

In Python, it's usually more along the lines of "well, we kinda started,
and now we have it, and are reluctant to switch."

But of course one never knows...

Diez


