[I18n-sig] UTF-8 decoder in CVS still buggy

16 Jul 2000 16:04:06 +0200

"M.-A. Lemburg" <mal@lemburg.com> writes:

> I've checked in a fix which should remedy the problem.
> Could you run the stress test using the fixed
> interpreter ?

Thanks.  It's more consistent now, but I still don't like it. The
basic question is whether a bad sequence like "c0 80" (an overlong
encoding of U+0000) should be replaced by one U+FFFD character or by
several. I vote for a single replacement character because it seems
natural, but different people may have different opinions here. ;-)
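
To make the two policies concrete, here is a minimal sketch in present-day
Python 3. It is not the decoder in CVS. The helper names
(expected_length, decode_one_per_sequence) are hypothetical, and the
grouping rule (a lead byte plus the continuation bytes it announces counts
as one bad sequence) is only one possible reading of "a single replacement
character". As far as I know, CPython 3's built-in decoder with
errors="replace" emits one U+FFFD per offending byte for this input.

```python
REPLACEMENT = "\ufffd"


def expected_length(lead: int) -> int:
    """Sequence length announced by a lead byte; 1 for any other byte."""
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 1


def decode_one_per_sequence(data: bytes) -> str:
    """Hypothetical policy: one U+FFFD per malformed sequence, not per byte."""
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        # Group the lead byte with up to n-1 continuation bytes (0x80..0xBF).
        j = i + 1
        while j < i + n and j < len(data) and 0x80 <= data[j] <= 0xBF:
            j += 1
        chunk = data[i:j]
        try:
            out.append(chunk.decode("utf-8"))  # strict: rejects overlong forms
        except UnicodeDecodeError:
            out.append(REPLACEMENT)
        i = j
    return "".join(out)


bad = b"a\xc0\x80b"
per_sequence = decode_one_per_sequence(bad)
per_byte = bad.decode("utf-8", errors="replace")
print(per_sequence.count(REPLACEMENT), per_byte.count(REPLACEMENT))
```

Well-formed input passes through unchanged under both policies. They only
differ in how many replacement characters a malformed run produces.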

> BTW, how much code is the stress test ? Maybe we should add
> some of it to the test suite.

Currently, it isn't automated (I only feed Markus Kuhn's UTF-8 test
through the decoder), and I expect that an automated version would
take around 100 lines of code.  (The test covers just the most
important borderline cases.)
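
A rough idea of what an automated version could look like, again as a
sketch in present-day Python 3. The case table below is my own
illustrative selection of borderline inputs, not a transcription of Markus
Kuhn's test, and it assumes the current definition of UTF-8 (code points
up to U+10FFFF, surrogates rejected). The names CASES and failures are
hypothetical.

```python
# (description, raw bytes, expected code point, or None if decoding must fail)
CASES = [
    ("largest 1-byte", b"\x7f", 0x7F),
    ("smallest 2-byte", b"\xc2\x80", 0x80),
    ("largest 2-byte", b"\xdf\xbf", 0x7FF),
    ("smallest 3-byte", b"\xe0\xa0\x80", 0x800),
    ("smallest 4-byte", b"\xf0\x90\x80\x80", 0x10000),
    ("largest code point", b"\xf4\x8f\xbf\xbf", 0x10FFFF),
    ("overlong NUL", b"\xc0\x80", None),
    ("overlong 3-byte slash", b"\xe0\x80\xaf", None),
    ("lone continuation byte", b"\x80", None),
    ("truncated 3-byte sequence", b"\xe2\x82", None),
    ("UTF-16 surrogate", b"\xed\xa0\x80", None),
    ("beyond U+10FFFF", b"\xf4\x90\x80\x80", None),
    ("impossible byte 0xFE", b"\xfe", None),
]


def failures(decode=lambda raw: raw.decode("utf-8")):
    """Return the descriptions of all cases the given decoder gets wrong."""
    bad = []
    for name, raw, expected in CASES:
        try:
            got = decode(raw)
        except UnicodeDecodeError:
            got = None  # the decoder rejected the input
        want = None if expected is None else chr(expected)
        if got != want:
            bad.append(name)
    return bad


print(failures() or "all cases passed")
```

Passing a different callable as decode lets the same table exercise
another decoder, provided it signals malformed input by raising
UnicodeDecodeError.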