Fastest way to detect a non-ASCII character in a list of strings.
Tim Chase
python.list at tim.thechases.com
Sun Oct 17 22:38:55 EDT 2010
On 10/17/10 19:04, Rhodri James wrote:
> import string
> return set("".join(L))<= set(string.printable)
>
> I've no idea whether this is faster or slower than any of your
> suggestions.
For set("".join(L)) to return, it has to scan the entire input
list/string. Imagine
s = UNPRINTABLE_CHAR + ('a'*1000000)
I'd sooner do something like
printable_set = set(string.printable)
return all((c in printable_set) for c in s)
which will bail as soon as it encounters a character that isn't
printable.
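
A minimal sketch of the two approaches side by side, assuming Python 3. The function names all_printable and all_printable_via_sets are hypothetical and not from the thread. It walks a list of strings lazily so the scan can stop at the first offending character:

```python
import string
from itertools import chain

# Build the lookup set once; membership tests on a set are O(1).
PRINTABLE_SET = frozenset(string.printable)

def all_printable(strings):
    """Return True if every character of every string is in string.printable."""
    # chain.from_iterable yields characters lazily, so all() can stop at the
    # first character outside the set without joining the strings first.
    return all(c in PRINTABLE_SET for c in chain.from_iterable(strings))

def all_printable_via_sets(strings):
    """Same answer via the quoted set comparison, which always scans everything."""
    return set("".join(strings)) <= PRINTABLE_SET
```

Both functions should agree on the result; they differ only in how much input they read before a failure is reported. Which one is faster on typical data is still a question for timeit.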
This also somewhat addresses Seebs's concern about defining what
you want -- put "valid" characters in the set. But the various
algorithms you (the OP, Dun) propose don't have the same
functionality for characters < ASCII 32 (space). Your #2 is more
like
ord(c) < 128
instead of "31 < ord(c) < 128"
As a modest speed-up on your (OP's) #3, you can do one
conversion-to-char for your endpoints instead of N
conversions-to-ord:
start = chr(31)
end = chr(127)
return all(start < c < end for c in ...)
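
A runnable sketch of that idea, assuming Python 3, where comparing one-character strings compares their code points. The function name all_in_range is hypothetical. The sketch uses chr(128) as the upper endpoint so that the test matches 31 < ord(c) < 128 exactly; with chr(127) and a strict <, DEL (code 127) would be rejected as well:

```python
# Convert the endpoints to characters once, instead of calling ord()
# on every character of the input.
START = chr(31)
END = chr(128)

def all_in_range(s):
    """Return True if every character c in s satisfies 31 < ord(c) < 128."""
    # all() stops at the first character outside the range.
    return all(START < c < END for c in s)
```

For a list of strings, the same generator expression can be fed from itertools.chain.from_iterable(L) rather than from a single string.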
> You could "timeit" and see, or you could wait a bit and not
> optimise prematurely.
But this is sage advice regardless of the algorithm :)
-tkc