[Python-ideas] Python 3000 TIOBE -3%
Terry Reedy
tjreedy at udel.edu
Fri Feb 17 02:18:07 CET 2012
On 2/16/2012 7:59 AM, Paul Moore wrote:
> Add to this the fact that I *know* I've seen supposed text files with
> mixed encoding content, and no-one has *ever* explained how to handle
> that (it's basically a damaged file,
Before Unicode, mixed encodings were the only way to have multilingual
digital text (with multiple symbol systems) in one file. I presume such
texts used some sort of language markup like <English>, <Hindi> (or
<Sanskrit>), and <Tibetan>, along with software that understood the
markup. Such files were not broken; they simply reflected the pre-Unicode
system of different codes for each language or nation.
To handle such a file, the program, whatever the language, has to
understand the custom markup, segment the bytes, and handle each segment
appropriately.
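A minimal Python sketch of that approach, assuming a hypothetical markup in which an ASCII tag such as `<Greek>` switches the encoding of all following bytes until the next tag. The tag names and the `TAG_CODECS` table are illustrative assumptions, not any real format; Russian (KOI8-R) and Greek (ISO 8859-7) stand in for the languages above because Python ships codecs for them. The sketch also assumes every codec is ASCII-compatible, so a `<` byte inside a segment always starts a tag.

```python
import re

# Hypothetical tag -> codec table; tag names and codecs are illustrative only.
TAG_CODECS = {
    b"English": "ascii",
    b"Russian": "koi8_r",
    b"Greek": "iso8859_7",
}

# Assumed markup: an ASCII tag like b"<Greek>" switches the encoding
# for every byte up to the next tag.
TAG_RE = re.compile(rb"<(\w+)>")


def decode_marked_up(data: bytes, default: str = "ascii") -> str:
    """Split *data* at encoding tags and decode each segment with its codec."""
    parts = []
    codec = default
    pos = 0
    for match in TAG_RE.finditer(data):
        # Decode the bytes before this tag with the currently active codec.
        parts.append(data[pos:match.start()].decode(codec))
        # Switch codecs; an unknown tag raises KeyError rather than guessing.
        codec = TAG_CODECS[match.group(1)]
        pos = match.end()
    # Decode whatever follows the last tag.
    parts.append(data[pos:].decode(codec))
    return "".join(parts)


raw = (b"<English>Hello, <Russian>" + "Привет".encode("koi8_r")
       + b" <Greek>" + "Γεια".encode("iso8859_7"))
print(decode_marked_up(raw))  # Hello, Привет Γεια
```

Raising on an unknown tag is a deliberate choice here: without the markup's meaning, the program has no reliable way to know how the following bytes are encoded.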
Crazy text that switches among unknown encodings without notice is a
possibly unsolvable decryption problem. Such problems have no guaranteed
algorithms, only heuristics.
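As an illustration of what such a heuristic might look like, here is a minimal Python sketch that assumes (hypothetically) that the encoding changes only at line boundaries and that every candidate codec is ASCII-compatible. It decodes each line with the first candidate that accepts it. The candidate list is an arbitrary assumption, and `latin-1` goes last because it accepts any byte sequence.

```python
# Candidate codecs tried in order; latin-1 last because it never fails.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_decode_lines(data: bytes, candidates=CANDIDATES):
    """Decode each line with the first candidate codec that accepts it.

    Returns a list of (codec, text) pairs. A clean decode only shows the
    bytes are *valid* in that codec, not that the guess is correct.
    """
    result = []
    for line in data.splitlines(keepends=True):
        for codec in candidates:
            try:
                result.append((codec, line.decode(codec)))
                break
            except UnicodeDecodeError:
                continue
        else:
            # Reached only if no candidate accepts the line.
            result.append((None, line.decode("ascii", errors="replace")))
    return result


data = "café\n".encode("utf-8") + "naïve\n".encode("latin-1")
for codec, text in guess_decode_lines(data):
    print(codec, repr(text))
```

The second line was written as Latin-1 but is reported as cp1252; the text happens to come out the same, but nothing in the bytes distinguishes the two, which is why this can only ever be a guess.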
--
Terry Jan Reedy