understand program used to create file

Jon Clements joncle at googlemail.com
Tue Nov 1 16:58:42 EDT 2011


On Nov 1, 7:27 pm, pacopyc <paco... at gmail.com> wrote:
> Hi, I have about 10000 .doc files and I want to know the program used to
> create them: Writer? Word? AbiWord? Something else? I'd like to develop a
> Python script to do this. Is there a module for it? Can you help me?
>
> Thanks

My suggestion would be the same as DaveA's.

This gives you the format it was *written* in.
(Saved a blank OO document as 95/97/XP Word DOC under Linux)

jon at forseti:~/filetest$ file *
saved-by-OO.doc: CDF V2 Document, Little Endian, Os: Windows, Version
1.0, Code page: -535, Author: jon , Revision Number: 0, Create Time/
Date: Mon Oct 31 20:47:30 2011
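
With 10000 files you would probably want to script that call rather than run it by hand. A rough sketch, assuming a `file` executable is on the PATH (the helper name `describe` is made up for illustration, and it simply returns None when `file` is not installed):

```python
import shutil
import subprocess


def describe(path):
    """Return the one-line description printed by `file`, or None if `file` is missing."""
    exe = shutil.which("file")  # locate the executable on PATH
    if exe is None:
        return None
    # --brief drops the leading "filename:" prefix from the output.
    result = subprocess.run(
        [exe, "--brief", path],
        capture_output=True,
        text=True,
        errors="replace",  # author fields may contain odd bytes
        check=False,
    )
    return result.stdout.strip()
```

Calling `describe("saved-by-OO.doc")` should return roughly the text shown above, minus the filename prefix.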

I'd be impressed if you could discover the program that actually *wrote*
it; I'd imagine you'd need something that understood some meta-data in
the format (if the format has a kind of 'created_by' field, for
instance), or depend on nuances which give away that a certain
program wrote data in another's native format.

Assuming the former, what might be possible:

1) Grab a "magic number" lookup list
2) Grab 8 bytes (I think that should be all that's needed, but hey, ummm...)
from the start of each file
3) Look it up in the "magic number" list
4) If you got something, great; if not, compare 7, 6, 5, 4 bytes,
etc., until you get a hit or bail out
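
A minimal sketch of those four steps in Python. The lookup table here is a tiny hand-picked sample (the labels and the names `MAGIC_NUMBERS` and `identify` are assumptions for illustration); a real list would be much longer. Note that this only tells you the container format, not the program that wrote the file.

```python
# Tiny illustrative table: signature bytes -> format label.
# The choice of entries and the label wording are assumptions for this sketch.
MAGIC_NUMBERS = {
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (e.g. Word 97-2003 .doc)",
    b"{\\rtf": "Rich Text Format",
    b"PK\x03\x04": "ZIP container (e.g. .docx, .odt)",
    b"%PDF": "PDF",
}

LONGEST = max(len(sig) for sig in MAGIC_NUMBERS)
SHORTEST = min(len(sig) for sig in MAGIC_NUMBERS)


def identify(path):
    """Return a format label for the file at path, or None if nothing matches."""
    with open(path, "rb") as fh:
        head = fh.read(LONGEST)  # step 2: grab the leading bytes
    # Steps 3 and 4: try the longest prefix first, then shorten one byte
    # at a time until there is a hit or no signature could still match.
    for n in range(len(head), SHORTEST - 1, -1):
        label = MAGIC_NUMBERS.get(head[:n])
        if label is not None:
            return label
    return None
```

A .doc saved by OpenOffice and one saved by Word would both come back as the OLE2 entry, which is why the metadata inside the file matters for the original question.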

(Or just find a Windows port of 'file')

HTH

Jon.


