Fastest way to detect a non-ASCII character in a list of strings.

Carl Banks pavlovevidence at gmail.com
Mon Oct 18 04:23:09 CEST 2010


On Oct 17, 12:59 pm, Dun Peal <dunpea... at gmail.com> wrote:
> `all_ascii(L)` is a function that accepts a list of strings L, and
> returns True if all of those strings contain only ASCII chars, False
> otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Match against a regexp with a character range: `[ -~]`
> 2. Use s.decode('ascii')
> 3. `return all(31< ord(c) < 127 for s in L for c in s)`
>
> Any other ideas?  Which one do you think will be fastest?
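
For reference, the three ideas above could be sketched roughly as follows. This is a sketch in Python 3 terms, not the poster's code: the function names and the `_PRINTABLE` pattern are made up for illustration, and idea 2 is spelled with `encode` because Python 3 `str` has no `decode` method. Note that ideas 1 and 3 test the printable range 32..126, while idea 2 accepts every code from 0 to 127, so they do not agree on control characters.

```python
import re

# Hypothetical names throughout; only the three techniques come from the question.
# `[ -~]*` covers the printable ASCII range, codes 32..126.
_PRINTABLE = re.compile(r'[ -~]*')

def all_ascii_regex(L):
    # Idea 1: every string must match the character range in full.
    return all(_PRINTABLE.fullmatch(s) is not None for s in L)

def all_ascii_codec(L):
    # Idea 2: let the ascii codec do the check (accepts codes 0..127).
    try:
        for s in L:
            s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

def all_ascii_ord(L):
    # Idea 3: the generator expression from the question, unchanged.
    return all(31 < ord(c) < 127 for s in L for c in s)
```

A tab character shows the difference: `all_ascii_codec(['a\tb'])` is True, while the other two return False.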

If you use numpy, the fastest way might be something like this, for a single string s:

import numpy as np
ns = np.ndarray(len(s), np.uint8, s)  # view the bytes of s as uint8, no copy
return np.all(np.logical_and(ns >= 32, ns <= 126))  # 32..126 is the `[ -~]` range
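
The fragment above checks one string. A minimal sketch for the whole list might look like the following, assuming Python 3, that every item of L is a `bytes` object, and that the printable range 32..126 from the question is what is wanted (an upper bound of 127 would also admit DEL). The name `all_ascii_np` and the join-first approach are illustrative choices, and whether this actually beats the other ideas would need to be timed on real data.

```python
import numpy as np

def all_ascii_np(L):
    # Hypothetical helper; assumes every item of L is a bytes object.
    # Joining first lets numpy scan one buffer instead of many small ones.
    data = b''.join(L)
    ns = np.frombuffer(data, dtype=np.uint8)  # read-only view of the bytes
    # Printable range 32..126, matching `[ -~]` in the question.
    return bool(np.all((ns >= 32) & (ns <= 126)))
```

For example, `all_ascii_np([b'abc', b'x y'])` is True and `all_ascii_np([b'caf\xc3\xa9'])` is False. The join copies the data once, so for very large lists a per-string loop with an early exit may be the better trade-off.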


Carl Banks


