On Mon, Aug 12, 2013, at 10:32, Mathias Panzenböck wrote:
On 08/12/2013 03:52 PM, Masklinn wrote:
On 2013-08-12, at 15:42 , Philipp A. wrote:
well, the only remotely valid thing to do is to test if the input data is decodable with any of the encodings Python knows.
Most iso-8859 parts can decode any byte (and thus any byte sequence).
Are you sure about the null byte ('\0')? But yes, just checking whether there is a '\0' in the file isn't a good heuristic either.
It depends on precisely what is meant by "iso-8859 parts", and the same goes for any other byte in 0-31 or 127-159 (there is nothing special about the null byte in this regard). ISO/IEC 8859 itself assigns no characters to those positions. But it's typical to think of "iso-8859" encodings as being more like IANA ISO-8859-1, which combines ISO/IEC 8859-1 with the control character definitions from ISO 6429.
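
For what it's worth, this is easy to check against the codecs Python ships. The following is only a rough sketch, assuming Python 3 and the standard codec names iso8859_1 through iso8859_16 (there is no part 12); the helper names are made up for illustration. It reports which codecs decode all 256 byte values, and which ones map the control-range bytes (0-31 and 127-159) to the Unicode code points with the same numbers, the way IANA ISO-8859-1 does.

```python
# Parts of ISO 8859 for which Python ships a codec (part 12 does not exist).
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]
CODECS = ["iso8859_%d" % n for n in PARTS]

ALL_BYTES = bytes(range(256))
# Byte values 0-31 and 127-159: the control ranges that ISO/IEC 8859
# itself leaves unassigned.
CONTROL_BYTES = bytes(list(range(0, 32)) + list(range(127, 160)))


def decodes_every_byte(codec):
    """Return True if the codec accepts all 256 single-byte values."""
    try:
        ALL_BYTES.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


def controls_map_to_same_code_points(codec):
    """Return True if each control-range byte decodes to the code point
    with the same numeric value (e.g. b'\\x00' -> U+0000)."""
    try:
        text = CONTROL_BYTES.decode(codec)
    except UnicodeDecodeError:
        return False
    return [ord(ch) for ch in text] == list(CONTROL_BYTES)


# Codecs for which "try to decode it" can never fail, so a successful
# decode tells you nothing about whether the data is text.
total = [c for c in CODECS if decodes_every_byte(c)]
# Codecs that treat the null byte and the other control bytes as valid.
with_controls = [c for c in CODECS if controls_map_to_same_code_points(c)]

if __name__ == "__main__":
    print("decode every byte:   ", ", ".join(total))
    print("accept control bytes:", ", ".join(with_controls))
```

Any codec that appears in the first list will accept arbitrary binary data, '\0' included, which is why "is it decodable?" is not a usable test for those encodings. The exact lists depend on the codec tables shipped with the interpreter, so I have not written them out here.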