check if file is MS Word or PDF file

Chris Rebert clp at rebertia.com
Sat Sep 27 19:01:14 EDT 2008


On Sat, Sep 27, 2008 at 3:42 PM, Michael Crute <mcrute at gmail.com> wrote:
> On Sat, Sep 27, 2008 at 5:43 PM, A. Joseph <joefazee at gmail.com> wrote:
>> What should I look for in a file to determine whether or not it is a
>> MS Word file or an Excel file or a PDF file, etc., including Zip
>> files?
>>
>> I don't want to check the file extension.
>> os.path.splitext('Filename.jpg') will produce a tuple of filename and
>> extension, but some files don't even have an extension and can still be
>> read by MS Word or Notepad. I want to be 100% sure of the file type.
>
> You could use the mimetypes module...
>
> >>> import mimetypes
> >>> mimetypes.guess_type("LegalNotices.pdf")
> ('application/pdf', None)

Looking at the docs for the mimetypes module, it just guesses based on
the filename (and extension), not the actual contents of the file, so
it doesn't really help the OP, who wants to make sure their program
isn't misled by an inaccurate extension.
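
As a rough sketch of what checking the contents instead might look
like: many formats begin with a fixed signature ("magic number"), so
one option is to read the first few bytes and compare. The function
names and labels below are made up for illustration, and the table
only covers PDF, Zip, and the OLE2 container used by legacy
Word/Excel files; treat it as a starting point, not a complete
detector.

```python
# Leading-byte signatures for a few common formats.
# The labels on the right are arbitrary names chosen for this sketch.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    # OLE2 compound file: the container used by legacy .doc/.xls/.ppt
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
    # Zip local file header; .docx/.xlsx/.odt are Zip archives too
    (b"PK\x03\x04", "zip"),
]


def sniff_bytes(header):
    """Return a coarse label for the leading bytes, or None if unknown."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def sniff_file(path):
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(8))
```

Note that this only identifies the container. An "ole2" result does
not say whether the file is Word or Excel, and a "zip" result could
be a plain archive or a newer Office document, so telling those
apart would need a look inside the container. A signature match
also doesn't prove the rest of the file is valid, so "100% sure" is
probably out of reach this way.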

Regards,
Chris
-- 
Follow the path of the Iguana...
http://rebertia.com

>
> -mike
>
> --
> ________________________________
> Michael E. Crute
> http://mike.crute.org
>
> God put me on this earth to accomplish a certain number of things.
> Right now I am so far behind that I will never die. --Bill Watterson
> --
> http://mail.python.org/mailman/listinfo/python-list
>


