Aug. 1, 2013
2:57 a.m.
On Wed, Jul 31, 2013 at 10:11 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Still can't be done reliably, but even if it could, what's so special about ASCII?
Lots of things are special about ASCII. It is a 7-bit subset of pretty much every modern encoding scheme. Being 7-bit, it can be fairly reliably distinguished from most binary formats. The same is true of UTF-8. It is very unlikely that a binary dump of a double array would make valid UTF-8 text, and vice versa: UTF-8 text interpreted as a list of doubles is unlikely to produce numbers in a reasonable range. I would not mind seeing an "istext()" function somewhere in the stdlib that would only recognize ASCII and UTF-8 as text.
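To make the idea concrete, here is a rough sketch of what such a function might look like. Nothing like this exists in the stdlib; the name istext() and its signature are just the hypothetical ones from above, and the check is a heuristic (strict UTF-8 decoding, which also covers ASCII), not a guarantee.

```python
import struct


def istext(data: bytes) -> bool:
    """Hypothetical helper: treat data as text only if it is valid UTF-8.

    ASCII is a strict subset of UTF-8, so a single strict decode covers both.
    This is a heuristic: some binary data will still happen to decode.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# A raw dump of doubles usually contains bytes that are not valid UTF-8.
double_dump = struct.pack("<3d", 0.1, 2.5, 1e10)
print(istext(double_dump))

# The reverse direction: eight bytes of text read as a little-endian double
# tend to land far outside any sensible numeric range.
(as_double,) = struct.unpack("<d", b"Hello, w")
print(as_double)
```

Text in a legacy 8-bit encoding such as Latin-1 would be rejected whenever it contains non-ASCII bytes that do not form valid UTF-8 sequences, which matches the "only ASCII and UTF-8" rule.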