[Python-Dev] Encoding detection in the standard library?

Guido van Rossum guido at python.org
Mon Apr 21 19:38:03 CEST 2008

To the contrary, an encoding-guessing module is often needed, and
guessing can be done with a pretty high success rate. Other Unicode
libraries (e.g. ICU) contain guessing modules. I suppose the API could
return two values: the guessed encoding and a confidence indicator.
Note that the locale settings might figure in the guess.
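
To make the shape of such an API concrete, here is a minimal sketch in
present-day Python 3. It is not an existing standard-library function and
not how ICU does it. The name guess_encoding, the locale_encoding parameter,
and the confidence numbers are all assumptions made up for illustration.
It checks for a BOM first (the one case that is not a guess), then tries
strict decoding, then falls back to the locale's preferred encoding.

```python
import codecs
import locale

# Known byte order marks. UTF-32 is listed before UTF-16 because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, locale_encoding=None):
    """Return (encoding_name, confidence) for a bytes object.

    Hypothetical helper: the confidence values are arbitrary
    placeholders in [0.0, 1.0], not calibrated probabilities.
    """
    # An explicit BOM is the only case treated as certain.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, 1.0

    # Pure ASCII and valid UTF-8 are rarely produced by accident.
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)
            return name, 0.9
        except UnicodeDecodeError:
            pass

    # Let the locale settings figure in the guess.
    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
    try:
        data.decode(locale_encoding)
        return locale_encoding, 0.5
    except (UnicodeDecodeError, LookupError):
        pass

    # Latin-1 decodes any byte string, so this says very little.
    return "latin-1", 0.1
```

A caller would unpack the pair, e.g. enc, conf = guess_encoding(raw), and
could refuse to proceed when conf falls below some threshold of its own
choosing. A real detector would also look at byte-frequency statistics,
which this sketch leaves out.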

On Mon, Apr 21, 2008 at 10:28 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> Christian Heimes wrote:
> > David Wolever wrote:
>  >> Is there some sort of text encoding detection module in the standard
>  >> library?
>  >> And, if not, is there any reason not to add one?
>  >
>  > You cannot detect the encoding unless it's explicitly defined through a
>  > header (e.g. the UTF BOM). It's technically impossible. The best you can
>  > do is an educated guess.
>  Exactly, and in light of that, I'm -1 for such a standard module.
>  We've enough issues with modules implementing (apparently) fully
>  specified standards. :)
>  Georg

--Guido van Rossum (home page: http://www.python.org/~guido/)
