Aug. 12, 2013
2:32 p.m.
On 08/12/2013 03:52 PM, Masklinn wrote:
On 2013-08-12, at 15:42 , Philipp A. wrote:
Well, the only remotely valid thing to do is to test whether the input data is decodable with any of the encodings Python knows.
Most iso-8859 parts can decode any byte (and thus any byte sequence).
Are you sure about the null byte ('\0')? But yes, just checking whether there is a '\0' in the file isn't a good heuristic either.
Parts 3, 6, 7, 8 and 11 are the only ones not defined across the whole [128, 255] range. (They're all ASCII extensions, so the [0, 127] range is identical to ASCII in every iso-8859 part.)
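For anyone who wants to check this against their own interpreter, here is a rough sketch (Python 3 only, since it relies on bytes([value])). It assumes the stdlib codecs are named iso8859_<n>; the helper names undecodable_bytes and partial_parts are made up for this illustration, and the result reflects Python's mapping tables rather than the standards documents themselves.

```python
# Parts 1-16; there is no part 12 among Python's codecs.
PARTS = [p for p in range(1, 17) if p != 12]


def undecodable_bytes(part):
    """Return the byte values that Python's iso8859_<part> codec rejects."""
    codec = "iso8859_%d" % part
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


def partial_parts():
    """Return the parts that leave at least one byte value undefined."""
    return [p for p in PARTS if undecodable_bytes(p)]


if __name__ == "__main__":
    for part in PARTS:
        bad = undecodable_bytes(part)
        print("iso8859_%d: %d undefined byte value(s)" % (part, len(bad)))
```

Since every part decodes the [0, 127] range, the null byte included, a successful decode with one of the fully defined parts says nothing about whether the data is actually text.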