check if file is MS Word or PDF file

Chris Rebert clp at
Sun Sep 28 01:01:14 CEST 2008

On Sat, Sep 27, 2008 at 3:42 PM, Michael Crute <mcrute at> wrote:
> On Sat, Sep 27, 2008 at 5:43 PM, A. Joseph <joefazee at> wrote:
>> What should I look for in a file to determine whether or not it is an
>> MS Word file, an Excel file, a PDF file, etc., including ZIP
>> files? I don't want to check the file extension.
>> os.path.splitext('Filename.jpg') will produce a tuple of filename and
>> extension, but some files don't even have an extension and can still be
>> read by MS Word or Notepad. I want to be 100% sure of the file's type.
> You could use the mimetypes module...
> >>> import mimetypes
> >>> mimetypes.guess_type("LegalNotices.pdf")
> ('application/pdf', None)
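The quoted call is easy to probe. A minimal sketch (the path strings are arbitrary examples): `mimetypes.guess_type()` works only on the name string it is given, so it answers for a path that does not exist and has nothing to say about a name without an extension.

```python
import mimetypes

# The file does not need to exist: only the ".pdf" suffix is consulted.
print(mimetypes.guess_type("does/not/exist/LegalNotices.pdf"))  # ('application/pdf', None)

# Without an extension there is nothing to map, whatever the file holds.
print(mimetypes.guess_type("LegalNotices"))  # (None, None)
```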

According to the docs for the mimetypes module, it only guesses based on
the filename (i.e. the extension), not the actual contents of the file,
so it doesn't really help the OP, who wants to make sure their program
isn't misled by an inaccurate extension.
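Checking the contents instead usually means reading the first few "magic" bytes. A minimal sketch, assuming a header check is good enough: PDF files begin with `%PDF-`, ZIP archives with `PK\x03\x04` (or `PK\x05\x06` when empty), and legacy Word/Excel files share the OLE2 compound-document header `D0 CF 11 E0 A1 B1 1A E1`. The names `SIGNATURES`, `sniff_bytes` and `sniff_file` and the short labels are hypothetical, not part of any library.

```python
# Well-known leading "magic" bytes; the labels are illustrative only.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    # OLE2 compound document: legacy .doc, .xls and .ppt all share this header.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
    # ZIP local file header; .docx/.xlsx/.odt are ZIP containers too.
    (b"PK\x03\x04", "zip"),
    # ZIP end-of-central-directory record, all an empty archive contains.
    (b"PK\x05\x06", "zip"),
]

# Length of the longest signature, i.e. how many bytes we need to read.
HEADER_SIZE = max(len(magic) for magic, _ in SIGNATURES)


def sniff_bytes(head):
    """Return a format label for the given leading bytes, or None if unknown."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def sniff_file(path):
    """Read only the first HEADER_SIZE bytes of path and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(HEADER_SIZE))


if __name__ == "__main__":
    import sys

    # Classify each file named on the command line.
    for name in sys.argv[1:]:
        print(name, sniff_file(name) or "unknown")
```

A header match is strong evidence, not a 100% guarantee: telling a .doc from an .xls, or a .docx from an ordinary ZIP, requires looking inside the container, and plain text (what Notepad reads) has no signature at all, so it comes back as `None`.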

Follow the path of the Iguana...

> -mike
> --
> ________________________________
> Michael E. Crute
> God put me on this earth to accomplish a certain number of things.
> Right now I am so far behind that I will never die. --Bill Watterson
> --

More information about the Python-list mailing list