Detect string has non-ASCII chars without checking each char?

John Machin sjmachin at lexicon.net
Sun Aug 22 06:57:04 EDT 2010


On Aug 22, 5:07 pm, "Michel Claveau - MVP"
<enleverLesX_XX... at XmclavXeauX.com.invalid> wrote:
> Hi!
>
> Another way :
>
>   # -*- coding: utf-8 -*-
>
>   import unicodedata
>
>   def test_ascii(struni):
>       strasc=unicodedata.normalize('NFD', struni).encode('ascii','replace')
>       if len(struni)==len(strasc):
>          return True
>       else:
>          return False
>
>   print test_ascii(u"abcde")
>   print test_ascii(u"abcdê")

-1

Try your code with u"abcd\xa1" ... it says it's ASCII.

Suggestions:
   test_ascii = lambda s: len(s.decode('ascii', 'ignore')) == len(s)
      (for a byte str; for a unicode object use s.encode instead of s.decode)
or
   test_ascii = lambda s: all(c < u'\x80' for c in s)
or
   use try/except
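
A minimal sketch of the try/except variant, assuming s is a unicode
string (the name is_ascii is only illustrative, not from the thread):

```python
def is_ascii(s):
    """Return True if the unicode string s holds only ASCII characters."""
    try:
        # A strict ASCII encode raises on the first code point >= 0x80.
        s.encode('ascii')
    except UnicodeError:
        # UnicodeError is the base class of UnicodeEncodeError.
        return False
    return True

print(is_ascii(u"abcde"))      # True
print(is_ascii(u"abcd\xa1"))   # False
```

This lets the codec do the scan, and it stops at the first offending
character instead of building a second string.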

Also:
    if a == b:
        return True
    else:
        return False
is a horribly bloated way of writing
    return a == b




