Paul Moore writes:
Add to this the fact that I *know* I've seen supposed text files with mixed encoding content,
Heck, I've seen *file names* with mixed encoding content.
and no-one has *ever* explained how to handle that (it's basically a damaged file, and so all the "right way to deal with Unicode" discussions ignore it)
The right way to handle such a file is ad hoc: operate on the features you can identify, and treat runs of bytes of unknown encoding as atomic blobs. In practice, one such feature is generic enough to support many applications: runs of ASCII text. That is the intuition all the pragmatists start with -- and it's correct.
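To make that concrete, here is a minimal sketch in Python of the "ASCII runs plus opaque blobs" approach. The names split_ascii_runs and join_segments are hypothetical, made up for this illustration, and the sketch assumes the unknown encodings are ASCII-compatible (it would misfire on something like UTF-16, where bytes in the ASCII range are not ASCII characters).

```python
import re

# Runs of bytes in the 7-bit ASCII range. Anything outside this range is
# treated as being in an unknown encoding and is never decoded.
_ASCII_RUN = re.compile(rb"[\x00-\x7f]+")


def split_ascii_runs(data: bytes) -> list:
    """Split data into str segments (ASCII runs) and bytes segments (blobs).

    Hypothetical helper for illustration only.
    """
    segments = []
    pos = 0
    for match in _ASCII_RUN.finditer(data):
        if match.start() > pos:
            # Non-ASCII bytes between two ASCII runs: keep as an atomic blob.
            segments.append(data[pos:match.start()])
        segments.append(match.group().decode("ascii"))
        pos = match.end()
    if pos < len(data):
        # Trailing non-ASCII bytes after the last ASCII run.
        segments.append(data[pos:])
    return segments


def join_segments(segments: list) -> bytes:
    """Reassemble the original bytes; blobs pass through untouched."""
    return b"".join(
        seg.encode("ascii") if isinstance(seg, str) else seg
        for seg in segments
    )


# One "text" line containing both a Latin-1 and a UTF-8 encoded word.
mixed = "naïve".encode("latin-1") + b" and " + "naïve".encode("utf-8")
parts = split_ascii_runs(mixed)
print(parts)  # ['na', b'\xef', 've and na', b'\xc3\xaf', 've']
assert join_segments(parts) == mixed
```

An application can then search, split, or edit the str segments and write the blobs back byte for byte, so whatever was damaged stays exactly as damaged as it was and nothing is silently re-encoded.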
OK, so maybe I do feel somewhat insulted...
I'm sorry you feel that way. (I've sided with the pragmatists in this thread, but on this issue I'm a purist at heart.)