An attempt at guessing the encoding of a (non-unicode) string

Wed Apr 7 05:20:38 EDT 2004

On Mon, 05 Apr 2004 13:37:34 -0700, rumours say that David Eppstein
<eppstein at ics.uci.edu> might have written:

>BTW, if you're going to implement the single-char version, at least for 
>encodings that translate one byte -> one unicode position (e.g., not 
>utf8), and your texts are large enough, it will be faster to precompute 
>a table of byte frequencies in the text and then compute the score by 
>summing the frequencies of alphabetic bytes.
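
For illustration, the precomputed-table idea quoted above might be sketched
in Python roughly as follows.  This is only a sketch under assumptions: the
function names are hypothetical (not from any code posted in this thread),
"alphabetic" is taken to mean whatever str.isalpha() accepts, and it only
makes sense for single-byte encodings, since each byte is decoded on its own.

```python
from collections import Counter

def byte_frequencies(data: bytes) -> Counter:
    # One pass over the text: how many times each byte value occurs.
    return Counter(data)

def alphabetic_bytes(encoding: str) -> frozenset:
    # Byte values that decode to an alphabetic character in this encoding.
    # Assumption: one byte -> one unicode position (so not utf8).
    alpha = set()
    for b in range(256):
        try:
            ch = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is undefined in this encoding
        if ch.isalpha():
            alpha.add(b)
    return frozenset(alpha)

def score(freq: Counter, encoding: str) -> int:
    # Sum the precomputed frequencies of the alphabetic bytes.
    return sum(freq[b] for b in alphabetic_bytes(encoding))

def guess_encoding(data: bytes, candidates):
    # The text is scanned once; each candidate then costs 256 lookups.
    freq = byte_frequencies(data)
    return max(candidates, key=lambda enc: score(freq, enc))
```

The point of the table is that the text is read only once, no matter how
many candidate encodings are scored afterwards.  Encodings that map nearly
the same bytes to letters (for example iso8859_1 and cp1252) will score
almost identically, so a score like this can only separate encodings whose
alphabetic ranges really differ.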

Thanks for the pointer, David.  However, as often happens, I came
second (or, more likely, n-th :).  Seo Sanghyeon sent a URL for a paper
that includes a two-char proposal, and the algorithm it gives in section
4.7.1 looks appropriate for this problem:

http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
-- 
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix