
Aug. 23, 2014
8:20 a.m.
Chris Angelico writes:
Not sure why 1251,
All of those encodings have repertoires that are supersets of Cyrillic; presumably the content is Russian-language, judging by Oleg's top-level domain.
But it's important to note that this is a method of handling junk. It's not a design intention; this is for a situation where I really want to cope with any byte stream and attempt to display it as text. And if I get something that's neither UTF-8 nor CP-1252, I will display it wrongly, and there's nothing can be done about that.
Of course there is. It just gets more heuristic the more numerous the potential encodings are.
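To make both positions concrete, here is a rough sketch, not anybody's actual code. decode_junk is the two-encoding fallback described above, with Latin-1 added as a last resort on the assumption that nothing should ever raise (Python's cp1252 codec leaves a few bytes such as 0x81 undefined). guess_decode is one crude heuristic for the multi-candidate case: the function names, the candidate list, and the "lowercase Cyrillic minus uppercase Cyrillic" score are all assumptions made for illustration.

```python
def decode_junk(data, fallbacks=("utf-8", "cp1252")):
    """Decode bytes by trying each codec in order; never raises."""
    for name in fallbacks:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this cannot fail.
    return data.decode("latin-1")


def cyrillic_score(text):
    """Crude plausibility score: running Russian text is mostly lowercase."""
    lower = sum("\u0430" <= ch <= "\u044f" for ch in text)
    upper = sum("\u0410" <= ch <= "\u042f" for ch in text)
    return lower - upper


def guess_decode(data, candidates=("cp1251", "koi8-r")):
    """Try UTF-8 first, then pick the best-scoring legacy candidate."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    best = None
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue
        score = cyrillic_score(text)
        # Ties keep the earlier candidate, so list order is a preference.
        if best is None or score > best[0]:
            best = (score, text)
    return best[1] if best is not None else data.decode("latin-1")
```

The score works here because KOI8-R and CP-1251 place their lowercase and uppercase Cyrillic letters in roughly opposite halves of the high range, so decoding with the wrong one of the two tends to produce text that is mostly capitals. It will still guess wrong on short or all-caps input, which is the sense in which this "gets more heuristic".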