Changing strings in files
Chris Angelico
rosuav at gmail.com
Tue Nov 10 13:40:56 EST 2020
On Wed, Nov 11, 2020 at 5:36 AM Eli the Bearded <*@eli.users.panix.com> wrote:
> Read first N lines of a file. If all parse as valid UTF-8, consider it text.
> That's probably the rough method file(1) and Perl's -T use. (In
> particular allow no nulls. Maybe allow ISO-8859-1.)
>
ISO-8859-1 is basically "allow any byte values" (every byte decodes to
some character), so all you'd be doing is checking for a lack of NUL
bytes. I'd definitely recommend mandating UTF-8, as that's a very good
way of recognizing valid text, but if you can't do that then the simple
NUL check is all you really need.
And let's be honest here, there aren't THAT many binary files that
manage to contain a total of zero NULs, so you won't get many false
hits :)
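
A minimal sketch of that check, for illustration only. The function
names, the 8192-byte sample size, and the require_utf8 flag are all
made up here; they are not taken from file(1) or Perl's -T. It samples
a fixed number of bytes rather than N lines, and uses an incremental
decoder so that a multi-byte character cut off at the end of the sample
isn't mistaken for invalid data.

```python
import codecs


def looks_like_text(sample: bytes, require_utf8: bool = True) -> bool:
    """Heuristic guess at whether a chunk of bytes is text (hypothetical helper)."""
    # NUL is technically valid UTF-8 (U+0000), so it has to be rejected separately.
    if b"\x00" in sample:
        return False
    if not require_utf8:
        # The "ISO-8859-1" case: any byte is acceptable, so the NUL check is all there is.
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates an incomplete multi-byte sequence at the very end.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_text(path, sample_size: int = 8192, require_utf8: bool = True) -> bool:
    """Apply the heuristic to the first sample_size bytes of a file (size is an assumption)."""
    with open(path, "rb") as f:
        return looks_like_text(f.read(sample_size), require_utf8)
```

Calling file_looks_like_text(path) before rewriting a file would let you
skip anything that looks binary; pass require_utf8=False to fall back to
the plain NUL check.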
ChrisA