Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
Duncan Booth
duncan.booth at invalid.invalid
Tue Nov 1 14:54:09 EDT 2011
Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
> MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128))
>
> # Untested
> def is_ascii_text(text):
>     for c in text:
>         n = ord(c)
>         if n >= len(MASK) or MASK[n] == '\0': return False
>     return True
>
>
> Optimizing it is left as an exercise :)
>
# Untested
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
MASK = [chr(n) in LEGAL for n in range(128)]

def is_ascii_text(text):
    try:
        return all(MASK[ord(c)] for c in text)
    except IndexError:
        return False
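
Another way the exercise might go, if a compiled regular expression counts as "built-in": a rough sketch, not benchmarked, that assumes the same set of legal characters as LEGAL above (ASCII 32-127 plus newline, carriage return, tab and form feed). The names _ILLEGAL and is_ascii_text_re are made up for illustration.

```python
import re

# Assumption: "legal" means ASCII 32-127 plus \n, \r, \t, \f,
# i.e. the same characters as LEGAL above.
# The pattern matches any single character outside that set.
_ILLEGAL = re.compile(r'[^\x20-\x7f\n\r\t\f]')

def is_ascii_text_re(text):
    """Return True if text contains only the legal characters."""
    # search() stops at the first illegal character, if there is one.
    return _ILLEGAL.search(text) is None
```

The scan itself runs inside the re module rather than in a Python-level loop, which may or may not be faster for typical inputs; timing it on real data would settle that.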
--
Duncan Booth http://kupuguy.blogspot.com