[I18n-sig] UTF-8 decoder in CVS still buggy

23 Jul 2000 22:40:19 +0200

Walter Underwood <wunder@ultraseek.com> writes:

> I'd rather that it not try to "repair" broken UTF-8. If it isn't
> UTF-8, throw an exception,
> and let the caller decide.

This option already exists.  This isn't appropriate for some
applications, though.  Sometimes you just have the data and you have
to make the best of it, and you can't ask someone to give you a
fixed version.
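
To make the two behaviours concrete, here is a minimal sketch of strict
versus lenient decoding.  It assumes present-day Python 3 and its
standard "strict" and "replace" error handlers, not the decoder as it
stood in CVS at the time, and the sample bytes are made up for
illustration.

```python
# Illustrative input (an assumption, not real data): valid UTF-8 text
# with one stray 0xFF byte, which can never appear in well-formed UTF-8.
data = b"caf\xc3\xa9 \xff ok"

# Strict mode: the decoder raises and the caller decides what to do.
try:
    data.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Replace mode: the invalid byte becomes U+FFFD (REPLACEMENT CHARACTER)
# and decoding continues with the rest of the data.
repaired = data.decode("utf-8", errors="replace")
```

With "replace", the valid text on either side of the bad byte survives,
which is what you want when nobody can send you a fixed version.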

> We're busy converting our search engine to use Unicode, so I'm
> really familiar with the issues right now.

And your search engine stops processing a document as soon as it
encounters an invalid UTF-8 sequence even though the majority of it is
valid UTF-8?  I don't think so.