Encoding sniffer?

garabik-news-2005-05 at kassiopeia.juls.savba.sk
Thu Jan 5 12:42:58 EST 2006


Andreas Jung <lists at andreas-jung.com> wrote:
> Does anyone know of a Python module that is able to sniff the encoding of
> text? Please: I know that there is no reliable way to do this, but I need
> something that works for most cases... so please, no discussion about
> the sense of such a module and approach.
> 

It depends on what exactly you need.
One approach is pyenca.

The other is to try a list of candidate encodings in order:

def try_encodings(s, encodings):
    "try to guess the encoding of string s, testing encodings given in second parameter"

    for enc in encodings:
        try:
            # decoding succeeds only if s is valid in this encoding
            unicode(s, enc)
            return enc
        except UnicodeDecodeError:
            pass

    return None

# note: iso8859_1 decodes any byte string, so encodings listed after it are never reached
print try_encodings(text, ['ascii', 'utf-8', 'iso8859_1', 'cp1252', 'macroman'])


Depending on which language and encodings you expect the text to be in,
either the first or the second approach will work better.
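
For readers on Python 3, the trial-decoding idea can be sketched as follows. This is only an illustration, not a tested module: it assumes the input is a bytes object, and the candidate list is an arbitrary example. The order of candidates matters, because iso8859_1 accepts every byte sequence and so belongs at the end.

```python
def try_encodings(data, encodings):
    """Return the first encoding that decodes the bytes `data`, or None."""
    for enc in encodings:
        try:
            data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # invalid bytes for this encoding, or unknown encoding name
            continue
        return enc
    return None


# Example candidate list (an assumption): strictest encodings first,
# iso8859_1 last because it never fails.
candidates = ['ascii', 'utf-8', 'cp1252', 'iso8859_1']

sample = 'Garab\u00edk'.encode('utf-8')
print(try_encodings(sample, candidates))
```

A successful decode only means the bytes are valid in that encoding, not that the text was actually written in it, so the result is a guess.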


-- 
 -----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


