Fastest way to detect a non-ASCII character in a list of strings.

Albert Hopkins marduk at letterboxes.org
Mon Oct 18 05:17:35 CEST 2010


On Sun, 2010-10-17 at 14:59 -0500, Dun Peal wrote:
> `all_ascii(L)` is a function that accepts a list of strings L, and
> returns True if all of those strings contain only ASCII chars, False
> otherwise.
> 
> What's the fastest way to implement `all_ascii(L)`?
> 
> My ideas so far are:
> 
> 1. Match against a regexp with a character range: `[ -~]`
> 2. Use s.decode('ascii')
> 3. `return all(31< ord(c) < 127 for s in L for c in s)`
> 
> Any other ideas?  Which one do you think will be fastest?
> 
> Will reply with final benchmarks and implementations if there's any interest.
> 
> Thanks, D

There seems to be some confusion over what is meant by "ASCII", so I'll
just assume it means 7-bit characters and propose:

all([True if not x else x[0] >= '\x00' and x[-1] <= '\x7f'
     for x in [sorted(set(y)) for y in L]])

That also kinda assumes an empty list and empty strings are considered
true.
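
For what it's worth, here is a rough sketch of the same 7-bit check
wrapped up as the `all_ascii` function from the question. It is only
an illustration, not a benchmarked answer: it assumes Python 3 `str`
objects, the parameter name `strings` is my own, and it swaps the
sorted(set(...)) step for max(), which only needs the highest
character in each string.

```python
def all_ascii(strings):
    """Return True if every string in `strings` has only 7-bit characters."""
    # Assumption: "ASCII" means code points 0x00 through 0x7f.
    # max() of a non-empty string is its highest code point, so a single
    # comparison per string is enough. Empty strings are skipped, and
    # all() of an empty iterable is True, so an empty list passes too.
    return all(max(s) <= '\x7f' for s in strings if s)


print(all_ascii(['spam', 'eggs', '']))   # True
print(all_ascii(['spam', 'caf\xe9']))    # False
print(all_ascii([]))                     # True
```

Whether this beats the regexp or the decode approach would still have
to be measured, e.g. with the timeit module on realistic data.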




