Determining when a file is an Open Office Document

Ross Ridge rridge at
Sat Jan 20 00:12:08 EST 2007

Ross Ridge wrote:
> So identifying PDF files is pretty easy.

Steven D'Aprano wrote:
> Sure. MIS-identifying PDF files is pretty easy. Identifying them is not.
> Consider this example:

Your contrived example doesn't show how a PDF file would be
misidentified; it only shows how a file deliberately made to look like
a PDF file would be "misidentified".  Since that was the intent of
crafting such a file, I don't see the problem.
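For illustration, here is a minimal sketch of the magic-byte check under discussion. It assumes only that a PDF begins with a "%PDF-" header (e.g. "%PDF-1.4"). The function names are hypothetical, not from any library. The sketch also shows the point above: a file crafted to start with those bytes passes the check, which is exactly what its author intended.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Guess whether data is a PDF from its leading magic bytes.

    PDF files start with a header such as b"%PDF-1.4". Anyone can write
    those bytes at the start of a file, so this is a guess, not proof.
    """
    return data.startswith(b"%PDF-")


def file_looks_like_pdf(path: str) -> bool:
    # Only the first five bytes are needed for the comparison.
    with open(path, "rb") as f:
        return looks_like_pdf(f.read(5))


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.4\n%rest of a real PDF"))  # genuine header
    print(looks_like_pdf(b"PK\x03\x04"))                     # ZIP archive header
    print(looks_like_pdf(b"%PDF-not really a pdf"))          # crafted file
```

The third case is "misidentified" only in the sense that the file was built to be identified that way.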

> Is there a security vulnerability buried in the detection of file types by
> magic bytes? I don't know, but I wouldn't be surprised if there were.

There's only a security vulnerability if you choose to trust a file
based on its assumed file type.  Since PDF files generally aren't
trusted, it's not likely to be an issue for whatever application tubby
has in mind.

> Any file system that doesn't have file type metadata is reduced to
> guessing the type of the file, and guesses can be wrong.

File type metadata can also be wrong.  You can give any file a .PDF
extension and Windows will believe it's a PDF file.  On Mac OS, if a file
has the creator/type signature "CARO"/"PDF ", it will believe it's a PDF
file regardless of its contents.  Metadata doesn't make programs any less
vulnerable to deliberate attempts to fool them.
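As a rough sketch of that point, the following compares a name-based guess with a content-based guess. It assumes Python's standard mimetypes module as a stand-in for extension metadata (it does not model Mac OS creator/type codes). The helper names are hypothetical. Either guess can be fooled independently, and the two can disagree.

```python
import mimetypes


def type_from_extension(filename: str):
    # Trusts only the file name, much as Windows trusts a .pdf extension.
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def type_from_content(data: bytes):
    # Trusts only the leading bytes; recognises PDF and nothing else here.
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    return None


def classify(filename: str, data: bytes):
    """Return (guess from name, guess from content) for comparison."""
    return type_from_extension(filename), type_from_content(data)


if __name__ == "__main__":
    # Renamed text file: the extension says PDF, the content does not.
    print(classify("notes.pdf", b"hello"))
    # Crafted file: a harmless-looking name but a PDF header.
    print(classify("notes.txt", b"%PDF-1.4 junk"))
```

Neither source of truth is safe against someone who controls both the name and the bytes.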

                                         Ross Ridge

More information about the Python-list mailing list