Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

Duncan Booth duncan.booth at invalid.invalid
Tue Nov 1 16:47:26 EDT 2011


MRAB <python at mrabarnett.plus.com> wrote:

> On 01/11/2011 18:54, Duncan Booth wrote:
>> Steven D'Aprano<steve+comp.lang.python at pearwood.info>  wrote:
>>
>>> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
>>> MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128))
>>>
>>> # Untested
>>> def is_ascii_text(text):
>>>      for c in text:
>>>          n = ord(c)
>>>          if n >= len(MASK) or MASK[n] == '\0': return False
>>>      return True
>>>
>>>
>>> Optimizing it is left as an exercise :)
>>>
>>
>> #untested
>> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
>> MASK = [True if chr(n) in LEGAL else False for n in range(128)]
>>
> Instead of:
> 
>      True if chr(n) in LEGAL else False
> 
> why not:
> 
>      if chr(n) in LEGAL
> 
I think you meant to drop the 'if' also.

MASK = [chr(n) in LEGAL for n in range(128)]

But yes, I was concentrating on the function body rather than the 
initialisation.
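Neither message above shows a complete function that uses the simplified boolean list. Below is a minimal sketch, assuming Python 3 `str` input. `is_ascii_text` is Steven's quoted loop adapted to the boolean `MASK`. `is_ascii_text_set` is a swapped-in alternative that nobody in this thread proposed, and its name is hypothetical. It relies only on built-ins: `frozenset.issuperset` accepts any iterable, so the per-character loop runs in C rather than in Python. That should help on long strings, but it is untested here, so time both versions on your own data before choosing one.

```python
# Allowed characters: ASCII 32..127 plus newline, CR, tab and form feed,
# exactly as in the LEGAL string quoted above.
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'

# Boolean lookup table indexed by code point (the simplified form above).
MASK = [chr(n) in LEGAL for n in range(128)]


def is_ascii_text(text):
    """Return True if every character of text is in LEGAL (table lookup)."""
    for c in text:
        n = ord(c)
        # Code points >= 128 fall outside the table and are therefore illegal.
        if n >= len(MASK) or not MASK[n]:
            return False
    return True


# Hypothetical built-in alternative: set containment loops in C.
LEGAL_SET = frozenset(LEGAL)


def is_ascii_text_set(text):
    """Return True if text contains only characters from LEGAL."""
    return LEGAL_SET.issuperset(text)


print(is_ascii_text("Hello,\r\n\tworld"))    # True
print(is_ascii_text("caf\u00e9"))            # False: U+00E9 is above 127
print(is_ascii_text_set("a\x00b"))           # False: NUL is below 32
```

Both functions return True for the empty string, because it contains no illegal characters.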

-- 
Duncan Booth http://kupuguy.blogspot.com