[I18n-sig] UTF-8 decoder in CVS still buggy
16 Jul 2000 16:04:06 +0200
"M.-A. Lemburg" <firstname.lastname@example.org> writes:
> I've checked in a fix which should remedy the problem.
> Could you run the stress test using the fixed
> interpreter ?
Thanks. It's more consistent now, but I still don't like it. The
basic question is whether an invalid sequence like "c0 80" (an overlong
encoding of U+0000) should be replaced by one or by multiple U+FFFD
characters. I vote for a single
replacement character because it seems natural, but different people
may have different opinions here. ;-)
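
To make the two options concrete, here is a minimal sketch in present-day
Python 3 (not the CVS interpreter discussed in this thread). The handler
name "replace_run_once" is made up for illustration; it is registered
through the standard codecs.register_error hook and emits a single U+FFFD
for the offending byte(s) plus any continuation bytes (0x80-0xBF) directly
after them. The built-in "replace" handler is decoded alongside it so the
two results can be compared.

```python
import codecs


def replace_run_once(exc):
    """Hypothetical handler: one U+FFFD per malformed run.

    Consumes the byte(s) the decoder flagged, plus any continuation
    bytes (0x80-0xBF) that immediately follow them.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    data = exc.object
    pos = exc.end
    while pos < len(data) and 0x80 <= data[pos] <= 0xBF:
        pos += 1
    # Return the replacement text and the position to resume decoding at.
    return ("\ufffd", pos)


codecs.register_error("replace_run_once", replace_run_once)

bad = b"\xc0\x80"
single = bad.decode("utf-8", "replace_run_once")  # one replacement per run
builtin = bad.decode("utf-8", "replace")          # whatever the stock handler does
print(len(single), len(builtin))
```

Note that this policy also folds a run of stray continuation bytes such as
"80 80 80" into one replacement character, which may or may not be what
people want.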
> BTW, how much code is the stress test ? Maybe we should add
> some of it to the test suite.
Currently, it isn't automated (I only feed Markus Kuhn's UTF-8 test
through the decoder), and I expect that an automated version would
take around 100 lines of code. (The test covers just the
most important borderline cases.)
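
As a rough idea of what an automated version might look like, here is a
table-driven sketch in present-day Python 3. The byte sequences are a small
hand-picked sample of boundary and malformed cases, not the contents of
Markus Kuhn's test file, and the names VALID, MALFORMED and run_checks are
invented for this example. It only checks strict decoding; the question of
how many U+FFFD characters to emit is deliberately left out.

```python
# Boundary cases that a strict UTF-8 decoder should accept,
# paired with the code point each one encodes.
VALID = [
    (b"\x00", 0x0000),
    (b"\x7f", 0x007F),
    (b"\xc2\x80", 0x0080),
    (b"\xdf\xbf", 0x07FF),
    (b"\xe0\xa0\x80", 0x0800),
    (b"\xef\xbf\xbf", 0xFFFF),
    (b"\xf0\x90\x80\x80", 0x10000),
    (b"\xf4\x8f\xbf\xbf", 0x10FFFF),
]

# Sequences that a strict decoder should reject.
MALFORMED = [
    b"\xc0\x80",          # overlong encoding of U+0000
    b"\x80",              # lone continuation byte
    b"\xe2\x82",          # truncated three-byte sequence
    b"\xed\xa0\x80",      # encoded surrogate U+D800
    b"\xf4\x90\x80\x80",  # beyond U+10FFFF
    b"\xfe",              # byte that never appears in UTF-8
    b"\xff",              # byte that never appears in UTF-8
]


def run_checks(decode=lambda raw: raw.decode("utf-8")):
    """Return the list of byte strings the decoder handled wrongly."""
    failures = []
    for raw, code_point in VALID:
        try:
            if decode(raw) != chr(code_point):
                failures.append(raw)
        except UnicodeDecodeError:
            failures.append(raw)
    for raw in MALFORMED:
        try:
            decode(raw)
        except UnicodeDecodeError:
            continue  # rejected, as expected
        failures.append(raw)
    return failures


if __name__ == "__main__":
    print(run_checks())
```

Passing a different callable as decode would let the same tables exercise
another decoder, provided it raises UnicodeDecodeError on bad input.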