Determining when a file is an Open Office Document
rridge at csclub.uwaterloo.ca
Sat Jan 20 00:12:08 EST 2007
Ross Ridge wrote:
> So identifying PDF files is pretty easy.
Steven D'Aprano wrote:
> Sure. MIS-identifying PDF files is pretty easy. Identifying them is not.
> Consider this example:
Your contrived example doesn't show how a PDF file would be
misidentified; it only shows how a file deliberately made to look like
a PDF file would be "misidentified". Since that was the intent of
crafting such a file, I don't see the problem.
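To make the point concrete, here is a minimal sketch of a magic-byte
check, assuming the test under discussion is the usual one for the
"%PDF-" marker at the start of the file. The function name is
hypothetical, and a real reader may be more lenient about where the
marker appears.

```python
PDF_MAGIC = b"%PDF-"  # header marker that PDF files begin with


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


# A file deliberately crafted to start with the marker passes the check,
# whatever follows it. Ordinary text does not.
crafted = b"%PDF-1.4\nthis is not really a PDF document\n"
plain = b"Just some text\n"

print(looks_like_pdf(crafted))
print(looks_like_pdf(plain))
```

The crafted input is accepted only because someone built it to be
accepted, which is the case described above.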
> Is there a security vulnerability buried in the detection of file types by
> magic bytes? I don't know, but I wouldn't be surprised if there were.
There's only a security vulnerability if you choose to trust a file
based on its assumed file type. Since PDF files generally aren't
trusted, it's not likely to be an issue for whatever application tubby
has in mind.
> Any file system that doesn't have file type metadata is reduced to
> guessing the type of the file, and guesses can be wrong.
File type metadata can also be wrong. You can give any file a .PDF
extension and Windows will believe it's a PDF file. On Mac OS, if a file
has the signature "CARO"/"PDF ", the system will believe it's a PDF file
regardless of its contents. Metadata doesn't make programs any less
vulnerable to deliberate attempts to fool them.
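A short sketch of the same problem from the metadata side, using
Python's standard mimetypes module to guess a type from the file name
alone. The helper names and the sample file name are made up for
illustration.

```python
import mimetypes


def type_from_name(filename: str):
    """Guess a MIME type from the file name only (metadata)."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def type_from_content(data: bytes):
    """Guess a MIME type from the leading bytes only (content)."""
    return "application/pdf" if data.startswith(b"%PDF-") else None


# A plain text file that someone renamed to end in .pdf.
name = "notes.pdf"
data = b"plain text, renamed\n"

print(type_from_name(name))     # decided by the extension
print(type_from_content(data))  # decided by the bytes
```

The name-based guess reports a PDF even though the bytes say otherwise,
so neither source of information can be trusted on its own when someone
is trying to fool you.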