Unicode Debugging Hints?
Martin v. Löwis
loewis at informatik.hu-berlin.de
Tue Oct 8 08:28:39 EDT 2002
holger krekel <pyth at devel.trillke.net> writes:
> does anyone have some small functions to answer
> questions 'might this be latin1' or 'might this be utf8'
> or 'is this definitely not latin1' and such?
Some of these questions can be answered quite simply:
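
    def maybe_encoding(s, enc):
        # 1 if the byte string s decodes without error under enc
        try:
            unicode(s, enc)
            return 1
        except UnicodeError:
            return 0

    def is_ascii(s): return maybe_encoding(s, 'ascii')

    # pure ASCII is also valid UTF-8, so it is excluded here
    def is_utf_8(s): return not is_ascii(s) and maybe_encoding(s, 'utf-8')

    def maybe_latin_x(s):
        if is_ascii(s) or is_utf_8(s): return 0
        for c in s:
            # 128..159 are control codes in Latin-x, unlikely in text
            if 128 <= ord(c) < 160:
                return 0
        return 1
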
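
For anyone reading this on Python 3, where unicode() is gone and byte strings are a separate bytes type, the same heuristics might be sketched roughly as follows. This is only an adaptation under that assumption, not a tested replacement; the function names are kept from above, and the parameter name data is my own.

```python
def maybe_encoding(data, enc):
    """Return True if the bytes object decodes cleanly under enc."""
    try:
        data.decode(enc)
        return True
    except UnicodeError:
        return False


def is_ascii(data):
    return maybe_encoding(data, "ascii")


def is_utf_8(data):
    # Pure ASCII is also valid UTF-8, so exclude it to keep the answer useful.
    return not is_ascii(data) and maybe_encoding(data, "utf-8")


def maybe_latin_x(data):
    if is_ascii(data) or is_utf_8(data):
        return False
    # Iterating over bytes yields ints in Python 3, so ord() is not needed.
    # 128..159 are control codes in the Latin-x sets and rare in real text.
    return not any(128 <= b < 160 for b in data)
```

Note that these only say "might be"; a short Latin-x string can happen to be valid UTF-8 as well, in which case is_utf_8 wins.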
Telling the Latin-x variants apart is not really possible, since they use
the same byte range and differ only in which characters those bytes stand for.
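
To illustrate the point about the Latin-x variants, here is a small Python 3 sketch that decodes the same two bytes under three of them. The helper name decode_all and the sample bytes are made up for the illustration; the codec names are the ones Python ships.

```python
def decode_all(data, encodings):
    """Decode the same bytes under each encoding; return {encoding: text}."""
    return {enc: data.decode(enc) for enc in encodings}


# Two bytes that are legal in all three character sets but mean
# different things in each of them.
results = decode_all(b"\xa3\xa4", ("iso8859-1", "iso8859-2", "iso8859-15"))

for enc, text in results.items():
    # ascii() shows the code points without depending on the terminal encoding.
    print(enc, ascii(text))
```

None of the three decodes fails, and each yields a different string, so the bytes alone cannot tell you which variant was meant. You would need knowledge of the language or the origin of the text.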
Regards,
Martin