[I18n-sig] UTF-8 decoder in CVS still buggy
Sun, 23 Jul 2000 13:21:55 -0700
I'd rather that it not try to "repair" broken UTF-8. If it isn't UTF-8,
throw an exception, and let the caller decide.
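
As a rough sketch of that behaviour in present-day Python (not the CVS
decoder under discussion), a strict decode raises instead of substituting
characters, and the exception carries the offset of the first bad byte.
The helper name decode_utf8_strictly is made up for illustration.

```python
def decode_utf8_strictly(data: bytes) -> str:
    """Decode UTF-8 without any repair; hypothetical helper for illustration."""
    try:
        # errors="strict" raises on the first malformed sequence
        # rather than inserting U+FFFD or dropping bytes.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset where the malformed sequence begins.
        # Re-raise with that offset so the caller can decide what to do.
        raise ValueError(f"invalid UTF-8 at byte offset {exc.start}") from exc
```

The caller can catch the ValueError and reject the input, report the offset,
or try something else; the decoder itself makes no guess.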
For example, when parsing XML, invalid UTF-8 means the whole document is
It is considered polite to say where the first invalid character occurs,
but it is not acceptable to continue parsing.
An XML parser cannot use a UTF-8 decoder
Code that deals with multiple encodings usually needs to do some encoding
guessing up front, before choosing an encoder. If the guess is wrong, I'd
want the decoder to fail, so we can try the next most likely encoding.
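
The guess-then-fall-back pattern could look something like the following in
present-day Python. This is only a sketch: the function name
decode_with_fallback and the default candidate list are assumptions for
illustration, and it only works because each decode is strict.

```python
def decode_with_fallback(data: bytes, candidates=("utf-8", "iso-8859-1")):
    """Try each candidate encoding in order of likelihood (hypothetical helper).

    Returns (encoding_name, text) for the first encoding that decodes
    cleanly. Raises ValueError if every candidate fails.
    """
    failures = []
    for name in candidates:
        try:
            # Strict decoding: a wrong guess fails loudly instead of
            # producing repaired text that looks plausible.
            return name, data.decode(name, errors="strict")
        except UnicodeDecodeError as exc:
            # Remember where this guess failed, then try the next one.
            failures.append(f"{name}: bad byte at offset {exc.start}")
    raise ValueError("no candidate encoding matched (" + "; ".join(failures) + ")")
```

Note that ISO-8859-1 accepts any byte sequence, so it only makes sense as the
last candidate. A decoder that repaired bad UTF-8 would never let the loop
reach it.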
We're busy converting our search engine to use Unicode, so I'm really
the issues right now.
Senior Staff Engineer, Ultraseek Server, Inktomi Corp.