Fastest way to detect a non-ASCII character in a list of strings.

Sun Oct 17 22:38:55 EDT 2010

On 10/17/10 19:04, Rhodri James wrote:
>     import string
>     return set("".join(L))<= set(string.printable)
>
> I've no idea whether this is faster or slower than any of your
> suggestions.

For set("".join(L)) to return, it has to scan the entire input 
list/string.  Imagine

   s = UNPRINTABLE_CHAR + ('a'*1000000)

I'd sooner do something like

   printable_set = set(string.printable)
   return all((c in printable_set) for c in s)

which will bail as soon as it encounters a character that isn't 
printable.
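
To make the difference concrete, here is a minimal sketch of both approaches side by side. It assumes Python 3 and a list of strings like the L in the quoted code; the function names and sample data are made up for illustration.

```python
import string
from itertools import chain

PRINTABLE = set(string.printable)

def all_printable_set(strings):
    # Joins everything and builds the full character set first,
    # so every character of the input is visited.
    return set("".join(strings)) <= PRINTABLE

def all_printable_early(strings):
    # all() stops at the first character that is not in PRINTABLE.
    return all(c in PRINTABLE for c in chain.from_iterable(strings))

# Hypothetical inputs: one list with a NUL up front, one clean list.
bad = ["\x00" + "a" * 1000, "more text"]
good = ["hello", "world\n"]

print(all_printable_set(bad), all_printable_early(bad))
print(all_printable_set(good), all_printable_early(good))
```

Both functions give the same answer; the second one just gets to stop early when the offending character is near the front.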

This also somewhat addresses Seebs's concern about defining what 
you want -- put the "valid" characters in the set.  But the various 
algorithms you (the OP, Dun) propose don't behave the same way 
for characters below ASCII 32 (space).  Your #2 is more 
like

   ord(c) < 128

instead of "31 < ord(c) < 128"
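
A small sketch of how these tests disagree on individual characters, assuming Python 3. The three helper names are hypothetical and stand in for the kinds of checks discussed in the thread, not for the OP's exact code.

```python
import string

def in_printable(c):
    # Membership in string.printable (includes tab, newline, etc.)
    return c in string.printable

def below_128(c):
    # Plain "is it ASCII" test; accepts control characters too
    return ord(c) < 128

def in_range_32_127(c):
    # Rejects control characters below space, but accepts DEL (127)
    return 31 < ord(c) < 128

for c in ("a", "\t", "\x00", "\x7f", "\xe9"):
    print(repr(c), in_printable(c), below_128(c), in_range_32_127(c))
```

Tab is the interesting case: it is in string.printable and below 128, but fails the 31 < ord(c) test.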

As a modest speed-up on your (OP's) #3, you can do one 
conversion-to-char for your endpoints instead of N 
conversions-to-ord:

   start = chr(31)
   end = chr(127)
   return all(start < c < end for c in ...)
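
Spelled out as a runnable sketch (Python 3 assumed, function names made up), with an ord()-based version alongside for comparison:

```python
START = chr(31)
END = chr(127)

def in_range_chr(s):
    # Compares characters directly; the endpoints are converted once.
    return all(START < c < END for c in s)

def in_range_ord(s):
    # Equivalent test, but calls ord() on every character.
    return all(31 < ord(c) < 127 for c in s)

for sample in ("hello world", "tab\there", "caf\xe9", "\x7f"):
    print(repr(sample), in_range_chr(sample), in_range_ord(sample))
```

Note that with END = chr(127) the upper bound is exclusive, so DEL (127) is rejected. To accept it, as "31 < ord(c) < 128" would, the endpoint would be chr(128) instead.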

> You could "timeit" and see, or you could wait a bit and not
> optimise prematurely.

But this is sage advice regardless of the algorithm :)
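
For anyone who does want to "timeit", a rough sketch of how that might look. The input string, the repeat count and the function names are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import string
import timeit

PRINTABLE = set(string.printable)

# Bad character at the very front: the case where bailing early helps most.
s = "\x00" + "a" * 100000

def full_scan():
    # Builds the whole set before comparing
    return set(s) <= PRINTABLE

def early_exit():
    # Stops at the first character outside PRINTABLE
    return all(c in PRINTABLE for c in s)

t_full = timeit.timeit(full_scan, number=100)
t_early = timeit.timeit(early_exit, number=100)
print("full scan: %.4fs  early exit: %.4fs" % (t_full, t_early))
```

It is worth repeating the measurement with the bad character at the end of the string, or with no bad character at all, since that is where the full-scan version may come out ahead.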

-tkc