How to know if a file is a text file
Philip Semanchuk
philip at semanchuk.com
Sat Nov 14 12:51:30 EST 2009
On Nov 14, 2009, at 11:02 AM, Luca Fabbri wrote:
> Hi all.
>
> I'm looking for a way to load a generic file from the
> system and determine whether it is plain text.
> The mimetypes module has some nice methods, but, for example, it
> doesn't work for files without an extension.
Hi Luca,
You have to define what you mean by a "text" file. It might seem
obvious, but it's not.
Do you mean just ASCII text? Or will you accept Unicode too? Unicode
text can be more difficult to detect because you have to guess the
file's encoding (unless it has a BOM; most don't).
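To make the encoding question concrete, here is a minimal sketch of
the guessing involved. It assumes Python 3 and that the only encodings
you care about are ASCII, UTF-8, and the BOM-marked UTF-16/UTF-32
variants; the function name guess_encoding is made up for
illustration, and a real detector would need to consider many more
encodings.

```python
import codecs

# Longer BOMs are tested before their prefixes: the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data):
    """Return a codec name for the bytes in data, or None if no guess fits.

    Hypothetical helper: it only knows about BOMs, ASCII, and UTF-8.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM: try the strictest encoding first, then UTF-8.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass
    return None
```

A return value of None only means "not one of the encodings this
sketch knows about"; the file could still be text in Latin-1, Shift
JIS, and so on.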
And do you need to verify that every single byte in the file is
"text"? What if the file is 1GB? Do you still want to examine every
single byte?
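If examining only the start of the file is acceptable, one possible
heuristic looks like the sketch below. It assumes Python 3 and one
particular definition of "text": no NUL bytes, and valid ASCII or
UTF-8. The name looks_like_text and the 8192-byte sample size are
arbitrary choices for illustration, not a standard.

```python
import codecs


def looks_like_text(path, sample_size=8192):
    """Heuristic: True if the first sample_size bytes look like UTF-8 text.

    Only a sample is read, so a large file is never loaded in full.
    An empty file counts as text under this definition.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # NUL bytes are rare in ASCII/UTF-8 text and common in binary formats.
    if b"\x00" in sample:
        return False
    # An incremental decoder with final=False tolerates a multi-byte
    # character that was cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Because it checks only a sample, this can be fooled by a file that
starts with text and continues with binary data, and the NUL test
rejects UTF-16 and UTF-32 text. Whether those trade-offs are
acceptable depends on your definition of "text".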
If you give us your own (specific!) definition of what "text" means,
or perhaps a description of the problem you're trying to solve, then
maybe we can help you better.
Cheers
Philip