Chris Angelico writes:
Can anyone give an example of a current in-use system encoding that would have [ASCII bytes in non-ASCII text]?
Shift JIS and Big5. (Both can have bytes < 128 inside multibyte characters.) I don't know if Big5 is still in use as the default encoding anywhere, but Shift JIS is, although its use is decreasing. For both of those, once you encounter a non-ASCII byte you can just switch over, and none of the previous text was mis-decoded. But that works only if you *know* the language was Japanese (respectively Chinese). Remember, no encoding can be distinguished from ISO 8859-1 (and several other Latin encodings) simply based on the bytes found, since ISO 8859-1 uses all 256 byte values, so any byte sequence decodes in it without error.
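To make the byte-level point concrete, here is a minimal sketch using Python's standard codecs. The helper name ascii_range_bytes and the sample characters are my own choices for illustration, not anything from the thread. It checks which bytes below 0x80 show up inside a single encoded non-ASCII character, and that the latin-1 codec accepts every possible byte value.

```python
# Hypothetical helper (not from any library): return the bytes below 0x80
# that appear in the encoded form of one non-ASCII character.
def ascii_range_bytes(char, encoding):
    return [b for b in char.encode(encoding) if b < 0x80]

# "表" in Shift JIS and "功" in Big5 both have 0x5C (ASCII backslash)
# as their second byte.
sjis_hits = ascii_range_bytes("表", "shift_jis")
big5_hits = ascii_range_bytes("功", "big5")

# UTF-8, by contrast, never uses bytes below 0x80 inside a multibyte sequence.
utf8_hits = ascii_range_bytes("表", "utf-8")

# Python's latin-1 codec maps each of the 256 byte values to a character,
# so decoding arbitrary bytes as latin-1 cannot raise an error.
every_byte = bytes(range(256))
latin1_text = every_byte.decode("latin-1")
latin1_round_trip = latin1_text.encode("latin-1") == every_byte
```

The last two lines are why a successful ISO 8859-1 decode tells you nothing: the same bytes might equally well have been Shift JIS or Big5.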
How likely is it that you'd get even one line of text that purports to be ASCII?
Program source code where the higher-level functions (likely to contain literal strings) come late in the file is frequently misdetected based on the earlier bytes.

Steve