Encoding sniffer?
garabik-news-2005-05 at kassiopeia.juls.savba.sk
Thu Jan 5 12:42:58 EST 2006
Andreas Jung <lists at andreas-jung.com> wrote:
> Does anyone know of a Python module that is able to sniff the encoding of
> text? Please: I know that there is no reliable way to do this but I need
> something that works for most of the case...so please no discussion about
> the sense of such a module and approach.
>
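it depends on what exactly you need
one approach is pyenca (python bindings for enca)
the other is to try decoding the text with each candidate encoding in turn
and take the first one that does not raise an error: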
    def try_encoding(s, encodings):
        "try to guess the encoding of string s, testing encodings given in second parameter"
        for enc in encodings:
            try:
                unicode(s, enc)  # raises UnicodeDecodeError if s is not valid in enc
                return enc
            except UnicodeDecodeError:
                pass
        return None

    # note: iso8859_1 decodes any byte string, so the entries after it are never reached
    print try_encoding(text, ['ascii', 'utf-8', 'iso8859_1', 'cp1252', 'macroman'])
depending on what language and encodings you expect the text to be in,
the first or the second approach is better
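
for reference, here is a rough sketch of the same trial-decoding idea for
Python 3, where the input is a bytes object and unicode() no longer exists.
the name guess_encoding and the candidate list are only illustrative
assumptions, not part of any library. the order of the candidates matters:
strict encodings go first, and iso8859_1 goes last because it accepts every
possible byte string.

```python
def guess_encoding(data, encodings):
    """Return the first encoding in `encodings` that decodes `data`
    without error, or None if none of them does."""
    for enc in encodings:
        try:
            data.decode(enc)  # raises UnicodeDecodeError on invalid input
        except UnicodeDecodeError:
            continue
        return enc
    return None


# hypothetical candidate list: strictest first, catch-all iso8859_1 last
candidates = ['ascii', 'utf-8', 'cp1252', 'iso8859_1']

sample = 'na\u00efve'.encode('utf-8')
print(guess_encoding(sample, candidates))
```

a successful decode only means the bytes are valid in that encoding, not
that the text was really written in it, so this works best when the
candidate list is short and matches what you expect to receive.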
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!