Unicode Debugging Hints?

Martin v. Löwis loewis at informatik.hu-berlin.de
Tue Oct 8 08:28:39 EDT 2002


holger krekel <pyth at devel.trillke.net> writes:

> does anyone have some small functions to answer
> questions 'might this be latin1' or 'might this be utf8'
> or 'is this definitely not latin1' and such?

Some of these questions can be answered quite simply:

def maybe_encoding(s, enc):
  try:
    unicode(s, enc)
    return 1
  except UnicodeError:
    return 0

def is_ascii(s): return maybe_encoding(s, 'ascii')

def is_utf_8(s): return not is_ascii(s) and maybe_encoding(s, 'utf-8')

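def maybe_latin_x(s):
  # plain ASCII and valid UTF-8 are not reported as Latin-x
  if is_ascii(s) or is_utf_8(s): return 0
  # 128..159 are C1 control codes in the ISO 8859 family,
  # so they should not show up in ordinary Latin-x text
  for c in s:
    if 128 <= ord(c) < 160:
      return 0
  return 1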
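
For readers on Python 3, where unicode() no longer exists and byte strings are a separate type, the same checks might look roughly like the sketch below. It assumes the input is a bytes object; the function names are carried over from above, and the True/False return values are a choice made for this sketch.

```python
def maybe_encoding(data, enc):
    """Return True if the bytes in data decode cleanly under enc."""
    try:
        data.decode(enc)
        return True
    except UnicodeError:
        return False

def is_ascii(data):
    return maybe_encoding(data, "ascii")

def is_utf_8(data):
    # Pure ASCII is excluded, so True means "valid UTF-8 with non-ASCII content".
    return not is_ascii(data) and maybe_encoding(data, "utf-8")

def maybe_latin_x(data):
    if is_ascii(data) or is_utf_8(data):
        return False
    # Iterating over bytes yields ints in Python 3, so ord() is not needed.
    # Bytes 128..159 are C1 control codes in the ISO 8859 family.
    return not any(128 <= b < 160 for b in data)
```

A True result still only means "might be"; none of these checks can prove which encoding the author of the data actually used.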

Telling apart the Latin-x variants is not really possible.
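
A small illustration of why, written as a Python 3 sketch (the sample bytes are chosen for this example only): the same byte string decodes without error under both ISO 8859-1 and ISO 8859-2, just to different characters, so a decoding test alone cannot choose between them.

```python
data = b"\xa3\xf5"

# Both decodes succeed; only the resulting characters differ.
as_latin_1 = data.decode("iso8859-1")  # POUND SIGN, o with tilde
as_latin_2 = data.decode("iso8859-2")  # L with stroke, o with double acute

print(repr(as_latin_1), repr(as_latin_2))
```

Deciding between the variants would need knowledge from outside the bytes themselves, such as which language the text is expected to be in.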

Regards,
Martin


