[Mailman-i18n] "Funny" characters in real names?
Ben Gertzfield
che@debian.org
Tue, 17 Sep 2002 14:22:51 -0700
Barry A. Warsaw wrote:
>To follow up, I believe I have this working now. Here's how it works.
>
Thanks for the excellent explanation and implementation, Barry. I'll
test this when it's checked in. Some comments below.
>First, the only change to the MemberAdaptor API is that real names can
>now be Unicode strings as well as 8-bit strings. If they're 8-bit
>then they'll contain only ascii characters.
>
ASCII is by definition 7-bit, Barry. Did you mean ISO-8859-1 here?
>When a real name is entered into a web form, we'll first attempt to
>convert it to us-ascii. If that succeeds, we know the real name is
>ascii only and we'll store it in the membership database as an 8-bit
>ascii-only-containing string.
>
Again, I assume you mean ISO-8859-1 instead of ascii here.
>If the conversion fails, we'll convert the real name to Unicode using
>the charset of the context's language (i.e. list preferred if we're
>looking at an admin page, user preferred if we're looking at an
>options page, and form value if we're looking at the subscribe page --
>all with appropriate fallbacks to Something Sensible). We'll also do
>html entity replacement (e.g. &#246; -> ö). We'll store this Unicode
>string as the member's real name in the membership database, but we
>don't store the charset because...
>
This is a good thing. Note that some browsers might (I haven't checked
this) incorrectly send the entity &#246; for whatever character is at
position 246 in the user's default character set, not character 246 in
Unicode. This might be something to look out for, but I don't know if
it's important.
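
To make the concern concrete, here is a small illustrative sketch (not Mailman code) of the two ways a numeric entity such as &#246; could be read. The function name and the choice of iso-8859-5 as the "user's default character set" are assumptions picked only for the example.

```python
def interpret_numeric_entity(code, charset):
    """Return (unicode_reading, charset_reading) for a numeric entity.

    Hypothetical helper for illustration only.
    """
    # Correct reading: the number is a Unicode code point.
    as_unicode = chr(code)
    # Suspected buggy reading: the number is a byte position in the
    # user's default 8-bit character set.
    as_charset = bytes([code]).decode(charset)
    return as_unicode, as_charset


correct, buggy = interpret_numeric_entity(246, "iso-8859-5")
print(repr(correct), repr(buggy))
```

With iso-8859-1 the two readings coincide, since Latin-1 matches the first 256 Unicode code points; the mismatch would only show up for other 8-bit charsets.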
Everything else looks good. The kludge to assume iso-8859-1 on us-ascii
pages is unfortunately a generally good one, as that will make the most
people happy. I hate to do it, though!
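
For readers following along, the conversion Barry describes could be sketched roughly as below. This is a minimal modern-Python approximation, not the code that was checked in: the function name, the `charset` parameter, and the use of `html.unescape` for entity replacement are all assumptions made for illustration, and the iso-8859-1 default stands in for the fallback mentioned above.

```python
import html


def canonicalize_real_name(raw, charset="iso-8859-1"):
    """Hypothetical sketch of the real-name conversion described above.

    raw     -- the bytes submitted by the web form
    charset -- charset of the page's language context (assumed parameter)
    """
    try:
        # Step 1: names that are pure us-ascii are kept as they are.
        return raw.decode("us-ascii")
    except UnicodeDecodeError:
        pass
    # Step 2: otherwise decode with the context's charset ...
    text = raw.decode(charset, errors="replace")
    # ... and resolve HTML entities such as &#246; into characters.
    return html.unescape(text)


print(canonicalize_real_name(b"Ben Gertzfield"))
print(canonicalize_real_name("J\u00f6rg".encode("iso-8859-1")))
```

Note that the charset itself is not returned or stored, which matches the description: once the name is Unicode, the original encoding no longer matters.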
Ben