A UTF-8-specific dtype and maybe a string-specialized `object` dtype address the very real and consequential problems that I face (namely and respectively, working with HDF5 and in-memory manipulation of string datasets).
I'm happy to consider a latin-1-specific dtype as a second, workaround-for-specific-applications-only-you-have-been-warned-you're-gonna-get-mojibake option. It should not be *the* Unicode string dtype (i.e. named np.realstring or np.unicode as in the original proposal).