On Sat, Apr 10, 2021 at 12:15 AM Paul Bryan <pbryan@anode.ca> wrote:

This sounds more like a Unicode thing than a generic string thing. And, in Unicode, Greek characters are included in multiple groupings. Searching for "Theta" to see what we get:

Greek and Coptic:
U+0398 GREEK CAPITAL LETTER THETA
U+03B8 GREEK SMALL LETTER THETA
U+03D1 GREEK THETA SYMBOL
U+03F4 GREEK CAPITAL THETA SYMBOL

Phonetic Extensions Supplement:
U+1DBF MODIFIER LETTER SMALL THETA

Mathematical Alphanumeric Symbols:
U+1D6AF MATHEMATICAL BOLD CAPITAL THETA
U+1D6B9 MATHEMATICAL BOLD CAPITAL THETA SYMBOL
U+1D6C9 MATHEMATICAL BOLD SMALL THETA
(... 17 more Thetas in this group! ...)

If you were to pick a definitive set of Greek characters for your use case, would it be in the Mathematical Alphanumeric Symbols category? Would others' expected use of Greek characters match yours, or would it need to be inclusive of all Greek characters across groupings?

I'm beginning to sense a metal container containing wriggly things...

But I think you've also nailed the correct solution. Python comes with [1] a unicodedata module, which would be the best way to define these sorts of sets. It's a tad messy to try to gather the correct elements though, so maybe the best way to do this would be a unicodedata.search() function that returns a string of all characters with a particular string in their names, or something like that.

ChrisA

[1] Technically, CPython and many other implementations come with it, but there are some (e.g. uPy) that don't.
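
For illustration only, here is a rough sketch of what such a search could look like today. Note that unicodedata.search() does not exist; search_names() below is a made-up helper name, and the brute-force scan over every code point is just one assumed way to do it, built on the existing unicodedata.name() call.

```python
import sys
import unicodedata


def search_names(fragment):
    """Hypothetical helper: return a string of every character whose
    Unicode name contains `fragment` (matched case-insensitively)."""
    fragment = fragment.upper()
    return "".join(
        ch
        for ch in map(chr, range(sys.maxunicode + 1))
        # name() raises ValueError for unnamed code points unless a
        # default is given, so pass "" and let the test fail quietly.
        if fragment in unicodedata.name(ch, "")
    )


thetas = search_names("theta")
for ch in thetas:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This is a plain substring match, so it returns the Thetas from every block at once. Narrowing it to one grouping (say, only names starting with "GREEK") would be up to the caller, which is exactly the question of which set counts as definitive.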