Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
Duncan Booth
duncan.booth at invalid.invalid
Tue Nov 1 16:47:26 EDT 2011
MRAB <python at mrabarnett.plus.com> wrote:
> On 01/11/2011 18:54, Duncan Booth wrote:
>> Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
>>
>>> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
>>> MASK = ''.join('\01' if chr(n) in LEGAL else '\0' for n in range(128))
>>>
>>> # Untested
>>> def is_ascii_text(text):
>>>     for c in text:
>>>         n = ord(c)
>>>         if n >= len(MASK) or MASK[n] == '\0': return False
>>>     return True
>>>
>>>
>>> Optimizing it is left as an exercise :)
>>>
>>
>> #untested
>> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
>> MASK = [True if chr(n) in LEGAL else False for n in range(128)]
>>
> Instead of:
>
> True if chr(n) in LEGAL else False
>
> why not:
>
> if chr(n) in LEGAL
>
I think you meant to drop the 'if' also.

MASK = [chr(n) in LEGAL for n in range(128)]

But yes, I was concentrating on the function body rather than the
initialisation.
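
For anyone reading along, here is a rough sketch of how the simplified MASK might slot into the function Steven outlined. The function body is my adaptation of his quoted code (testing the boolean directly instead of comparing against '\0'), not something posted in this thread, and it assumes Python 3 string semantics.

```python
# Sketch only: the function body below is an assumed adaptation of the
# quoted is_ascii_text(), not code taken verbatim from the thread.
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'

# 'chr(n) in LEGAL' is already a bool, so no conditional expression is needed.
MASK = [chr(n) in LEGAL for n in range(128)]


def is_ascii_text(text):
    """Return True if every character of text appears in LEGAL."""
    for c in text:
        n = ord(c)
        # Reject code points beyond the table, and those the table marks False.
        if n >= len(MASK) or not MASK[n]:
            return False
    return True
```

Both spellings of the comprehension build the same list; the shorter one just avoids restating a value that is already True or False.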
--
Duncan Booth http://kupuguy.blogspot.com