On 2/16/2012 7:59 AM, Paul Moore wrote:
Add to this the fact that I *know* I've seen supposed text files with mixed encoding content, and no-one has *ever* explained how to handle that (it's basically a damaged file,
Before Unicode, mixed encodings were the only way to have multilingual digital text (with multiple symbol systems) in one file. I presume such texts used some sort of language markup like <English>, <Hindi> (or <Sanskrit>), and <Tibetan>, along with software that understood the markup. Such files were not broken; they simply reflected the pre-Unicode system of a different code for each language or nation. To handle such a file, the program, whatever the language, has to understand the custom markup, segment the bytes, and handle each segment appropriately. Crazy text that switches among unknown encodings without notice is a possibly unsolvable decryption problem. Such problems have no guaranteed algorithms, only heuristics.

-- Terry Jan Reedy
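
As a rough illustration of the "understand the markup, segment the bytes, handle each segment" approach for the well-behaved case, here is a minimal sketch. Everything specific in it is an assumption made for the example: the tag syntax (non-nested ASCII tags of the form <Name>...</Name>), the tag names, the TAG_ENCODINGS table, and the decode_mixed function are all hypothetical, since a real pre-Unicode file would use whatever markup its own software defined.

```python
import re

# Hypothetical table: which legacy encoding each markup tag implies.
# A real file format would define its own tags and encodings.
TAG_ENCODINGS = {
    b"English": "ascii",
    b"Russian": "koi8_r",
    b"Greek": "iso8859_7",
}

# Assumes non-nested ASCII tags of the form <Name>...</Name>.
SEGMENT = re.compile(rb"<(\w+)>(.*?)</\1>", re.DOTALL)


def decode_mixed(data: bytes) -> str:
    """Decode each tagged byte segment with the encoding its tag implies."""
    parts = []
    for match in SEGMENT.finditer(data):
        tag, payload = match.group(1), match.group(2)
        # An unknown tag raises KeyError: without knowing the encoding,
        # only guessing (heuristics) would be left.
        parts.append(payload.decode(TAG_ENCODINGS[tag]))
    # Bytes outside any tagged segment are ignored in this sketch.
    return "".join(parts)


sample = (
    b"<English>Hello, </English>"
    b"<Russian>" + "мир".encode("koi8_r") + b"</Russian>"
)
print(decode_mixed(sample))  # Hello, мир
```

The sketch only works because the markup says where each encoding starts and stops. Without such markers, the program would have to guess the segment boundaries and the encodings, which is the heuristic-only case.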