[Mailman-i18n] "Funny" characters in real names?

Ben Gertzfield che@debian.org
Tue, 17 Sep 2002 14:22:51 -0700


Barry A. Warsaw wrote:

>To follow up, I believe I have this working now.  Here's how it works.
>

Thanks for the excellent explanation and implementation, Barry.  I'll
test this when it's checked in.  Some comments below..

>First, the only change to the MemberAdaptor API is that real names can
>now be Unicode strings as well as 8-bit strings.  If they're 8-bit
>then they'll contain only ascii characters.
>

ASCII is by definition 7-bit, Barry.  Did you mean ISO-8859-1 here?

>When a real name is entered into a web form, we'll first attempt to
>convert it to us-ascii.  If that succeeds, we know the real name is
>ascii only and we'll store it in the membership database as an 8-bit
>ascii-only-containing string.
>

Again, I assume you mean ISO-8859-1 instead of ascii here.
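
As a rough sketch of the two-step flow quoted above (try us-ascii
first, fall back to the context's charset), written in present-day
Python 3 and not taken from Mailman itself.  The function name
normalize_real_name and the iso-8859-1 default are assumptions made
for illustration only:

```python
def normalize_real_name(raw: bytes, context_charset: str = "iso-8859-1"):
    """Hypothetical helper: return (name, is_ascii_only).

    `raw` stands in for the bytes submitted by the web form and
    `context_charset` for the charset of the page's language.
    """
    try:
        # Step 1: a pure 7-bit name decodes cleanly as us-ascii.
        return raw.decode("us-ascii"), True
    except UnicodeDecodeError:
        # Step 2: otherwise interpret the bytes in the context's charset.
        # errors="replace" keeps a bad byte from raising a second time.
        return raw.decode(context_charset, errors="replace"), False
```

A byte such as 0xF6 fails the first step because it is outside the
7-bit range, which is why the distinction between ascii and
ISO-8859-1 matters here.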

>If the conversion fails, we'll convert the real name to Unicode using
>the charset of the context's language (i.e. list preferred if we're
>looking at an admin page, user preferred if we're looking at an
>options page, and form value if we're looking at the subscribe page --
>all with appropriate fallbacks to Something Sensible).  We'll also do
html entity replacement (e.g. &#246; -> ö).  We'll store this Unicode
>string as the member's real name in the membership database, but we
>don't store the charset because...
>

This is a good thing.  Note that some browsers might (I haven't checked
this) incorrectly send the entity &#246; for whatever character is at
position 246 in the user's default character set, not character 246 in
Unicode.  This might be something to look out for, but I don't know if
it's important.
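
To make the ambiguity concrete, here is a small Python 3 sketch that
contrasts the two readings of a numeric entity.  The helper names are
hypothetical, and koi8-r is only an example of a user charset where
the readings differ; whether any real browser behaves this way is
unverified:

```python
import html


def entity_as_unicode(ref: str) -> str:
    """Correct reading: the number is a Unicode code point."""
    return html.unescape(ref)


def entity_as_charset_byte(code: int, charset: str) -> str:
    """Suspected buggy reading: the number is a byte value in `charset`."""
    return bytes([code]).decode(charset)


correct = entity_as_unicode("&#246;")               # U+00F6
latin1 = entity_as_charset_byte(246, "iso-8859-1")  # same character
koi8 = entity_as_charset_byte(246, "koi8-r")        # a different character
```

For iso-8859-1 users the two readings agree, because that charset
matches the first 256 Unicode code points.  The problem would only
show up for users whose default charset assigns a different character
to that byte.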

Everything else looks good.  The kludge to assume iso-8859-1 on us-ascii
pages is unfortunately a generally good one, as that will make the most
people happy.  I hate to do it, though!
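
One way the kludge could look, sketched in Python 3.  The function
name effective_charset is an assumption, and this is not how Mailman
necessarily implements it:

```python
import codecs


def effective_charset(page_charset: str) -> str:
    """Hypothetical helper: widen us-ascii pages to iso-8859-1.

    codecs.lookup() normalizes aliases such as "US-ASCII" and "ascii"
    to the canonical name "ascii"; it raises LookupError for a charset
    Python does not know.
    """
    if codecs.lookup(page_charset).name == "ascii":
        return "iso-8859-1"
    return page_charset
```

Because iso-8859-1 is a superset of us-ascii, pure 7-bit input decodes
identically either way; only bytes above 127 are affected by the
guess.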

Ben