Terry Reedy writes:
Before Unicode, mixed encodings were the only way to have multi-lingual digital text (with multiple symbol systems) in one file.
There is a long-accepted standard for doing this, ISO 2022. IIRC it's available online from ISO now, and if not, ECMA 35 is the same. The X Compound Text standard (I think this is documented in the ICCCM) and the Motif Compound String are profiles of ISO 2022. If that is what Paul is seeing, then the iso-2022-jp codec might be good enough to decode the files he has, depending on which version of ISO-2022-JP is implemented. If not, iconv -f ISO-2022-JP-2 (or ISO-2022-JP-3) should work (at least for GNU's iconv implementation).
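For the Python route, here is a minimal sketch that tries the ISO-2022-JP variants from narrowest to widest. The helper name decode_iso2022_jp and the sample string are mine, purely for illustration; the codec names are the ones in the standard library's codec list. I have not tried it on Paul's files, so treat it as a starting point only.

```python
# Standard-library codec names, narrowest variant first.
CANDIDATES = (
    "iso2022_jp",
    "iso2022_jp_1",
    "iso2022_jp_2",
    "iso2022_jp_3",
    "iso2022_jp_ext",
)

def decode_iso2022_jp(data):
    """Return (codec_name, text) for the first variant that decodes data."""
    for name in CANDIDATES:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # this variant does not know some designation; try the next
    raise ValueError("no ISO-2022-JP variant decoded the data")

# Illustrative input: a string encoded with the base variant.
sample = "日本語 and English".encode("iso2022_jp")
codec, text = decode_iso2022_jp(sample)
print(codec, text)
```

If every variant fails, the data is probably not one of the ISO-2022-JP profiles at all, and iconv will most likely not do better.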
I presume such texts used some sort of language markup like <English>, <Hindi> (or <Sanskrit>), and <Tibetan>, along with software that understood the markup.
They would use encoding "markup" (specifically escape sequences). Language is not enough, as all languages have had multiple encodings since the invention of ASCII (or EBCDIC, whichever came second ;-), and in many cases multilingual standards have evolved (Japanese, for example, includes Greek and Cyrillic alphabets in its JIS standard coded character set). More recently, many languages have several ISO 2022-based encodings (the ISO 8859 family is a conformant profile of ISO 2022, as are the EUC encodings for Asian languages; the Windows 125x code pages are non-conformant extensions of ASCII based on ISO 8859).
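To make the escape-sequence "markup" concrete, here is a rough sketch that lists the designations found in a byte string. The regular expression follows the ISO 2022 shape (ESC, optional intermediate bytes 0x20-0x2F, one final byte 0x30-0x7E); the small KNOWN table covers only the four designations used by basic ISO-2022-JP and is in no way complete. The names ESCAPE, KNOWN, and designations are made up for this example.

```python
import re

# ESC, then zero or more intermediate bytes, then one final byte.
ESCAPE = re.compile(rb"\x1b[\x20-\x2f]*[\x30-\x7e]")

# Only the designations of basic ISO-2022-JP; anything else is "unknown".
KNOWN = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

def designations(data):
    """List, in order, the character sets announced by escape sequences."""
    return [KNOWN.get(m.group(), "unknown") for m in ESCAPE.finditer(data)]

data = "abc 日本 xyz".encode("iso2022_jp")
found = designations(data)
print(found)
```

Note that nothing in the stream says "Japanese"; the escapes name coded character sets, which is the point above.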
Crazy text that switches among unknown encodings without notice is a possibly unsolvable decryption problem.
True, and occasionally seen even today in Japan (cat(1) will produce such files easily, as will any mechanism for including one file in another).
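A small sketch of the cat(1) case, assuming two hypothetical files that hold the same text in EUC-JP and Shift JIS. Neither encoding announces itself with an escape sequence, so the concatenation carries no record of where one ends and the other begins. The function name codecs_that_decode is invented for this example.

```python
def codecs_that_decode(data, candidates):
    """Return the codec names from candidates that decode data without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok

part_a = "日本語".encode("euc_jp")     # stands in for the first file
part_b = "日本語".encode("shift_jis")  # stands in for the second file
joined = part_a + part_b              # what cat would write out

candidates = ("euc_jp", "shift_jis", "utf-8")
whole = codecs_that_decode(joined, candidates)

# Knowing the boundary makes the problem trivial again.
recovered = part_a.decode("euc_jp") + part_b.decode("shift_jis")
print(whole, recovered)
```

Here every candidate rejects the joined bytes, which at least signals a problem. The nastier case is when some codec accepts the whole thing and silently produces mojibake for half of it.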