[Python-Dev] Encoding detection in the standard library?

Tony Nelson tonynelson at georgeanelson.com
Mon Apr 21 20:34:50 CEST 2008


At 1:14 PM -0400 4/21/08, David Wolever wrote:
>On 21-Apr-08, at 12:44 PM, skip at pobox.com wrote:
>>
>>     David> Is there some sort of text encoding detection module in the
>>     David> standard library?  And, if not, is there any reason not to
>>     David> add one?
>> No, there's not.  I suspect the fact that you can't correctly determine
>> the encoding of a chunk of text 100% of the time militates against it.
>Sorry, I wasn't very clear about what I was asking.
>
>I was thinking about making an educated guess -- just like chardet
>(http://chardet.feedparser.org/).
>
>This is useful when you get a hunk of data which _should_ be some
>sort of intelligible text from the Big Scary Internet (say, a posted
>web form or email message), and you want to do something useful with
>it (say, search the content).

Feedparser.org's chardet can't guess 'latin1', so it should be used as a
last resort, just as the docs say.
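
To make "last resort" concrete, here is a minimal sketch of one possible
fallback order: try any declared encoding first, then UTF-8, then a
statistical guess, and only then latin-1 (which accepts every byte). The
function name decode_with_fallback and its declared and detector parameters
are hypothetical, invented for illustration; the detector is passed in as a
plain callable so the sketch does not depend on any particular library, and
it is written for Python 3 bytes objects.

```python
def decode_with_fallback(data, declared=None, detector=None):
    """Decode bytes to text, guessing statistically only as a last resort.

    Hypothetical helper for illustration, not an existing API.
    declared: encoding name from a header or form field, if any.
    detector: optional callable taking bytes and returning an encoding
              name or None (assumed interface for a chardet-style guesser).
    Returns a (text, encoding_used) tuple.
    """
    # Cheap, reliable candidates first: the declared encoding, then UTF-8,
    # which rejects most byte strings that are not really UTF-8.
    candidates = []
    if declared:
        candidates.append(declared)
    candidates.append("utf-8")

    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong bytes for this codec, or an unknown codec name.
            continue

    # Last resort before giving up: ask the statistical detector.
    if detector is not None:
        guess = detector(data)
        if guess:
            try:
                return data.decode(guess), guess
            except (UnicodeDecodeError, LookupError):
                pass

    # latin-1 maps every byte to a code point, so this never raises.
    return data.decode("latin-1"), "latin-1"
```

If chardet is installed, something like
detector=lambda d: chardet.detect(d)["encoding"] should fit the detector
slot, though I have not tested that pairing here.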
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

