How to know if a file is a text file
philip at semanchuk.com
Sat Nov 14 18:51:30 CET 2009
On Nov 14, 2009, at 11:02 AM, Luca Fabbri wrote:
> Hi all.
> I'm looking for a way to be able to load a generic file from the
> system and understand if it is plain text.
> The mimetypes module has some nice methods, but for example it's not
> working for files without an extension.
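That limitation is expected. mimetypes.guess_type() works from the file name (its extension) and never opens the file, so a name with no extension gives it nothing to go on. A quick illustration, using made-up file names:

```python
import mimetypes

# guess_type() inspects only the name's extension; it never reads the file.
print(mimetypes.guess_type("notes.txt"))  # ('text/plain', None)
print(mimetypes.guess_type("README"))     # (None, None): no extension, no guess
```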
You have to define what you mean by "text" file. It might seem
obvious, but it's not.
Do you mean just ASCII text? Or will you accept Unicode too? Unicode
text can be more difficult to detect because you have to guess the
file's encoding (unless it has a BOM; most don't).
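To make the ASCII-vs-Unicode distinction concrete, here's a minimal Python 3 sketch. The helper names (sniff_bom, decodes_as, is_ascii) are hypothetical, made up for this example. Note that a successful decode is only weak evidence: Latin-1 accepts every possible byte sequence, so "it decodes" proves nothing on its own, and the ASCII check happily accepts control bytes such as NUL.

```python
import codecs

# Known byte-order marks, mapped to a codec that strips the BOM when decoding.
# UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE) because it
# starts with the same two bytes; even so, UTF-16 LE text that begins with
# U+0000 would be misread as UTF-32, which shows how fragile BOM sniffing is.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data decodes cleanly with the given codec (weak evidence only)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit range 0-127 (control bytes included)."""
    return decodes_as(data, "ascii")
```

Most files have no BOM, so sniff_bom() usually returns None and you're back to guessing, which is exactly the problem.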
And do you need to verify that every single byte in the file is
"text"? What if the file is 1GB; do you still want to examine every byte?
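If "looks like text at the start" is good enough, a common compromise is to read only the first few KB and apply a heuristic. Below is a rough sketch of one such heuristic, not a definitive test: the function names, the 8192-byte sample, and the 30% threshold are all arbitrary assumptions. It treats any NUL byte as a sign of binary data, which also means it rejects UTF-16 and UTF-32 text (where ASCII characters carry NUL bytes).

```python
# Bytes treated as "text": BEL, BS, TAB, LF, FF, CR, ESC, printable ASCII, and
# everything >= 0x80 (so UTF-8 or Latin-1 text isn't rejected). Assumption.
_TEXT_BYTES = bytes(
    {7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x7F)) | set(range(0x80, 0x100))
)


def sample_is_text(sample: bytes, max_nontext_ratio: float = 0.30) -> bool:
    """Heuristic: no NUL bytes, and few bytes outside _TEXT_BYTES."""
    if not sample:
        return True  # an empty sample counts as text here (arbitrary choice)
    if b"\x00" in sample:
        return False  # NUL is a strong hint of binary data
    # translate(None, delete) removes the text bytes, leaving only the others.
    nontext = sample.translate(None, _TEXT_BYTES)
    return len(nontext) / len(sample) <= max_nontext_ratio


def looks_like_text(path: str, sample_size: int = 8192) -> bool:
    """Examine only the first sample_size bytes rather than the whole file."""
    with open(path, "rb") as f:
        return sample_is_text(f.read(sample_size))
```

The trade-off is deliberate: anything past the sample is never examined, so a 1GB file costs no more than a small one, at the price of missing binary data that appears later.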
If you give us your own (specific!) definition of what "text" means,
or perhaps a description of the problem you're trying to solve, then
maybe we can help you better.