I agree. It would be great to get something more than what the simplistic `unicodedata.category(...)` returns; for example, what Unicode group a character falls in.
This sounds more like a Unicode thing than a generic string thing. And, in Unicode, Greek characters are included in multiple groupings. Searching for "Theta" to see what we get:
Greek and Coptic:
U+0398 GREEK CAPITAL LETTER THETA
U+03B8 GREEK SMALL LETTER THETA
U+03D1 GREEK THETA SYMBOL
U+03F4 GREEK CAPITAL THETA SYMBOL
Phonetic Extensions Supplement:
U+1DBF MODIFIER LETTER SMALL THETA
Mathematical Alphanumeric Symbols:
U+1D6AF MATHEMATICAL BOLD CAPITAL THETA
U+1D6B9 MATHEMATICAL BOLD CAPITAL THETA SYMBOL
U+1D6C9 MATHEMATICAL BOLD SMALL THETA
(... 17 more Thetas in this group! ...)
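To make the limitation concrete, here is a small sketch of what `unicodedata.category(...)` and `unicodedata.name(...)` report for a few of the Thetas listed above. The helper name `describe` is made up for this example. The general category only says something like "uppercase letter" or "modifier letter"; it says nothing about which group the character lives in.

```python
import unicodedata

def describe(ch):
    """Return (code point label, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# A few of the Thetas from the listing above, one or more from each group.
thetas = ["\u0398", "\u03b8", "\u03d1", "\u03f4", "\u1dbf", "\U0001d6af"]

for ch in thetas:
    label, category, name = describe(ch)
    print(label, category, name)
```

The group itself is not exposed by the standard `unicodedata` module, so the character name is the closest thing to go on here.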
If you were to pick a definitive set of Greek characters for your use case, would it be in the Mathematical Alphanumeric Symbols grouping? Would others' expected use of Greek characters match yours, or would it need to be inclusive of all Greek characters across groupings?
I'm beginning to sense a metal container containing wriggly things...
But I think you've also nailed the correct solution. Python comes with
[1] a unicodedata module, which would be the best way to define these
sorts of sets. It's a tad messy to try to gather the correct elements
though, so maybe the best way to do this would be a
unicodedata.search() function that returns a string of all characters
with a particular string in their names, or something like that.
ChrisA
[1] technically, CPython and many other implementations come with it, but
there are some (e.g. MicroPython) that don't
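
For what it's worth, the name search described above can be approximated today with a brute-force scan. This is only a sketch: `unicodedata.search()` does not exist, and the function name `search_names` below is made up for illustration. It assumes a plain case-insensitive substring match on the character name is what's wanted.

```python
import sys
import unicodedata

def search_names(fragment):
    """Return a string of every character whose Unicode name contains fragment.

    Hypothetical stand-in for the proposed unicodedata.search(); it walks
    every code point, so it is slow but needs nothing beyond the stdlib.
    """
    fragment = fragment.upper()
    found = []
    for codepoint in range(sys.maxunicode + 1):
        ch = chr(codepoint)
        # Unnamed code points (unassigned, surrogates, controls) yield "".
        if fragment in unicodedata.name(ch, ""):
            found.append(ch)
    return "".join(found)

thetas = search_names("theta")
for ch in thetas:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Note that this returns every Theta across all groups, so the question of which ones count as "Greek" for a given use case is still left to the caller.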