Barry A. Warsaw wrote:
To follow up, I believe I have this working now. Here's how it works.
Thanks for the excellent explanation and implementation, Barry. I'll test this when it's checked in. Some comments below.
First, the only change to the MemberAdaptor API is that real names can now be Unicode strings as well as 8-bit strings. If they're 8-bit then they'll contain only ascii characters.
ASCII is by definition 7-bit, Barry. Did you mean ISO-8859-1 here?
When a real name is entered into a web form, we'll first attempt to convert it to us-ascii. If that succeeds, we know the real name is ascii only and we'll store it in the membership database as an 8-bit ascii-only-containing string.
Again, I assume you mean ISO-8859-1 instead of ascii here.
If the conversion fails, we'll convert the real name to Unicode using the charset of the context's language (i.e. list preferred if we're looking at an admin page, user preferred if we're looking at an options page, and form value if we're looking at the subscribe page -- all with appropriate fallbacks to Something Sensible). We'll also do html entity replacement (e.g. &#246; -> ö). We'll store this Unicode string as the member's real name in the membership database, but we don't store the charset because...
This is a good thing. Note that some browsers might (I haven't checked this) incorrectly send the entity &#246; for whatever character is at position 246 in the user's default character set, not character 246 in Unicode. This might be something to look out for, but I don't know if it's important.
Everything else looks good. The kludge to assume iso-8859-1 on us-ascii pages is unfortunately a generally good one, as that will make the most people happy. I hate to do it, though!
Ben
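For anyone following along, the real-name handling described above can be sketched roughly as below. This is an illustration only, not the code being checked in: the function name `canonicalize_real_name`, its parameters, and the `FALLBACK_CHARSET` constant are hypothetical, and it is written for Python 3, where there is no separate 8-bit string type, so the result is always a `str`. The thread does not say exactly when entity replacement happens relative to the ascii test; the sketch assumes it is applied to the decoded text in either case.

```python
import html

# Assumption: the "kludge" from the thread, i.e. treat non-ascii input
# on a us-ascii page as iso-8859-1.
FALLBACK_CHARSET = "iso-8859-1"


def canonicalize_real_name(raw: bytes, context_charset: str = "us-ascii") -> str:
    """Hypothetical helper: turn a submitted real name into Unicode text."""
    try:
        # Step 1: a name that decodes as us-ascii needs no charset information.
        name = raw.decode("us-ascii")
    except UnicodeDecodeError:
        # Step 2: otherwise decode with the charset of the context's language
        # (list, user, or form value, chosen by the caller).
        charset = context_charset
        if charset.lower() in ("us-ascii", "ascii"):
            charset = FALLBACK_CHARSET
        # Undecodable bytes become U+FFFD rather than raising an error.
        name = raw.decode(charset, errors="replace")
    # Step 3: replace HTML character references such as &#246; (assumed to
    # apply in both branches; see the note above).
    return html.unescape(name)
```

Because only the Unicode text is returned, nothing about the original charset needs to be stored alongside the name.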