[Python-ideas] Python 3000 TIOBE -3%

Terry Reedy tjreedy at udel.edu
Fri Feb 17 02:18:07 CET 2012


On 2/16/2012 7:59 AM, Paul Moore wrote:

> Add to this the fact that I *know* I've seen supposed text files with
> mixed encoding content, and no-one has *ever* explained how to handle
> that (it's basically a damaged file,

Before Unicode, mixed encodings were the only way to have multilingual 
digital text (with multiple symbol systems) in one file. I presume such 
texts used some sort of language markup like <English>, <Hindi> (or 
<Sanskrit>), and <Tibetan>, along with software that understood the 
markup. Such files were not broken; they just reflected the pre-Unicode 
system of different codes for each language or nation.

To handle such a file, the program, whatever the language, has to 
understand the custom markup, segment the bytes, and handle each segment 
appropriately.
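
As a rough sketch of that approach in Python 3: the tag names, the 
tag-to-codec table, and the decode_tagged helper below are all made up 
for illustration, and it assumes the tags themselves are plain ASCII. 
A real format would define its own markup and encodings.

```python
import re

# Hypothetical mapping from markup tag to codec; a real file format
# would define its own tags and encodings.
TAG_CODECS = {
    b"English": "ascii",
    b"Russian": "koi8_r",
    b"Greek": "iso8859_7",
}

# Assumption: tags are plain ASCII, such as <English>.
TAG_RE = re.compile(rb"<([A-Za-z]+)>")


def decode_tagged(data, default="ascii"):
    """Split the bytes at each tag and decode every segment with its codec."""
    parts = []
    codec = default
    pos = 0
    for match in TAG_RE.finditer(data):
        # Decode the bytes before this tag with the codec currently in effect.
        parts.append(data[pos:match.start()].decode(codec))
        # Switch codecs; an unknown tag raises KeyError in this sketch.
        codec = TAG_CODECS[match.group(1)]
        pos = match.end()
    # Decode whatever follows the last tag.
    parts.append(data[pos:].decode(codec))
    return "".join(parts)


sample = (
    b"<English>Hello <Russian>" + "мир".encode("koi8_r")
    + b" <Greek>" + "κόσμος".encode("iso8859_7")
)
print(decode_tagged(sample))
```

The result is one ordinary str, which is the point: once the segments 
are decoded, the rest of the program no longer cares which encoding 
each piece came from.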

Crazy text that switches among unknown encodings without notice is a 
possibly unsolvable decryption problem. Such problems have no guaranteed 
algorithms, only heuristics.
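
To illustrate what "only heuristics" means, here is a minimal sketch of 
one common guess: try a list of candidate codecs in order and keep the 
first that decodes without error. The guess_decode name and the 
candidate list are my own assumptions, not a recommended recipe.

```python
def guess_decode(data, candidates=("ascii", "utf-8", "cp1252", "latin-1")):
    """Return (codec, text) for the first candidate that decodes cleanly."""
    for codec in candidates:
        try:
            return codec, data.decode(codec)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate codec decoded the data")


# Valid UTF-8 is recognized as such.
print(guess_decode("café".encode("utf-8")))

# KOI8-R bytes also decode "successfully", but as the wrong text:
# decoding without error is not the same as decoding correctly.
print(guess_decode("мир".encode("koi8_r")))
```

The second call returns without complaint, yet the text is not what was 
written. Nothing in the bytes themselves says which answer is right.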

-- 
Terry Jan Reedy



