On 30/06/2020 13:43, Emily Bowman wrote:
I completely agree with this, that UTF-8 has become the One True Encoding(tm), and UCS-2 and UTF-16 are hardly found anywhere outside of the Win32 API. Nearly all basic emoji can't be represented in UCS-2 wchar_t, let alone composite emoji.
You say that as if it's a bad thing :-)
So how to make that C-compatible? Make everything a void* and it just comes back with as many bytes as it gets?
I'd be inclined to something like that. You really don't want people trying to roll their own UTF-8 handling if you can help it. That does imply the C API will need to be pretty comprehensive, though. (If you want nightmares, take a look at the parsing code in Expat. Multiple layers of macros and function tables make it a horror to comprehend.) -- Rhodri James *-* Kynesim Ltd