On Sat, 3 Jun 2023 at 10:12, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Let's call the styles a tie. Using the SOWPODS Scrabble wordlist (which contains no currency symbols, so the answer is False):
    import unicodedata
    unicode_currency = {chr(c) for c in range(0xFFFF) if unicodedata.category(chr(c)) == "Sc"}
    wordlist = open('/usr/local/share/sowpods').read()
    len(wordlist)
    2707021

    %timeit any(unicodedata.category(ch) == "Sc" for ch in wordlist)
    176 ms ± 1.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %timeit any(unicodedata.category(ch) == "Sc" for ch in set(wordlist))
    17.8 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    bool(set(wordlist) & unicode_currency)
    False
    %timeit bool(set(wordlist) & unicode_currency)
    18 ms ± 216 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
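For anyone wanting to rerun the comparison outside IPython, here is a rough sketch using the stdlib timeit module instead of %timeit. The SOWPODS file is replaced by a hypothetical synthetic wordlist of roughly the same size, and the three function names are made up for illustration; absolute timings will differ from the ones above.

```python
import timeit
import unicodedata

# Every BMP code point whose Unicode category is "Sc" (Symbol, currency),
# built the same way as in the session above.
unicode_currency = {chr(c) for c in range(0xFFFF) if unicodedata.category(chr(c)) == "Sc"}

# Hypothetical stand-in for the SOWPODS file: lowercase words, one per line,
# about 2.6 million characters in total.
wordlist = "\n".join(["aardvark", "zymurgy", "quixotic"] * 100_000)


def any_per_char(text):
    """Call unicodedata.category() once per character, stopping at the first hit."""
    return any(unicodedata.category(ch) == "Sc" for ch in text)


def any_per_distinct_char(text):
    """Deduplicate first, so category() runs once per distinct character."""
    return any(unicodedata.category(ch) == "Sc" for ch in set(text))


def set_intersection(text):
    """Intersect the distinct characters with the precomputed currency set."""
    return bool(set(text) & unicode_currency)


RUNS = 3
for func in (any_per_char, any_per_distinct_char, set_intersection):
    # Average wall-clock time per call over a few runs.
    seconds = timeit.timeit(lambda: func(wordlist), number=RUNS) / RUNS
    print(f"{func.__name__:24} {func(wordlist)!s:5} {seconds * 1000:.1f} ms per call")
```

Note that the two set-based variants pay for building set(text) on every call, since the input is still a string.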
Of course, this is a small character set: 26 lowercase letters (plus newline, the way I read the file). A more diverse alphabet might shift the timings slightly, but the difference will be small either way.
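The point about alphabet size can be made concrete by counting how many times unicodedata.category() is actually called. The sketch below uses a hypothetical helper, count_category_calls, and made-up sample strings; it assumes the text contains no currency symbols, so any() has to inspect every candidate character. Without deduplication the call count equals len(text); with set(text) it equals the number of distinct characters, which grows with a more diverse alphabet but stays far below the text length.

```python
import unicodedata


def count_category_calls(text, dedupe):
    """Return how many times unicodedata.category() is called by the
    any()-based currency check, with or without deduplicating first."""
    calls = 0

    def is_currency(ch):
        nonlocal calls
        calls += 1
        return unicodedata.category(ch) == "Sc"

    chars = set(text) if dedupe else text
    any(is_currency(ch) for ch in chars)  # stops early only if a match is found
    return calls


# Illustrative inputs: a Latin-only text and a more diverse mixed-script text.
latin = "the quick brown fox\n" * 1000
mixed = "the quick brown fox ÆØÅ αβγ абв 漢字\n" * 1000

for name, text in (("latin", latin), ("mixed", mixed)):
    print(name, len(text),
          count_category_calls(text, dedupe=False),
          count_category_calls(text, dedupe=True))
```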
Remember, though, that the original request was not for a set but for a string. Try your timing again when working with a string. The any() form is almost certainly the most effective, although I suppose it could be implemented in C for better performance (avoiding repeated calls back into Python). Not sure it's necessary, though.

ChrisA
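On the "implemented in C" idea: one stdlib option, not something proposed in the thread, is to precompute the currency characters once and let frozenset.isdisjoint() consume the string directly. In CPython, isdisjoint() iterates a non-set argument in C and returns as soon as it finds a common element, so it keeps the short-circuiting of the any() form without a Python-level call per character. The names CURRENCY, has_currency_any and has_currency_c are hypothetical, and the set here spans all code points up to sys.maxunicode rather than only the BMP.

```python
import sys
import unicodedata

# Every code point in Unicode category "Sc" (Symbol, currency). Unlike
# range(0xFFFF), this also covers currency signs outside the BMP.
CURRENCY = frozenset(
    chr(c) for c in range(sys.maxunicode + 1) if unicodedata.category(chr(c)) == "Sc"
)


def has_currency_any(text: str) -> bool:
    """Pure-Python scan: one unicodedata.category() call per character,
    stopping at the first currency symbol."""
    return any(unicodedata.category(ch) == "Sc" for ch in text)


def has_currency_c(text: str) -> bool:
    """Same answer, but the loop runs inside frozenset.isdisjoint(), which
    iterates the string in C and stops at the first match."""
    return not CURRENCY.isdisjoint(text)


for sample in ("price: 10\u20ac", "no symbols here"):
    print(repr(sample), has_currency_any(sample), has_currency_c(sample))
```

The trade-off is a one-time cost to build CURRENCY, which only pays off if the check runs many times.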