[I18n-sig] [PATCH] UTF-8 decoding: Fix handling of invalid byte sequences

M.-A. Lemburg mal@lemburg.com
Mon, 08 May 2000 10:01:20 +0200

Florian Weimer wrote:
> Could you have a look at the following patch?  It fixes a rather
> funny scoping problem with the continue statement, which results in
> more deterministic handling of invalid sequences.  In addition, the
> treatment of invalid characters in "replace" mode is improved: now,
> an incomplete or otherwise invalid UTF-8 sequence generates exactly
> one replacement character.  As a result, the Python UTF-8 decoder now
> passes Markus Kuhn's UTF-8 stress test.  (Shall I make a Python test
> out of it?)
>
> If there aren't any objections, I'll forward this patch through the
> official channels (if it's still necessary).

Looks good, except that you should move the nextCharacter:
label so that it sits right before the closing } of the while
loop. Otherwise, a goto to the label skips the while() condition
check, and the loop body runs again without it being tested.
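
To illustrate the label placement, here is a minimal sketch in C. It is
not the code from the patch: decode_utf8_replace() and REPLACEMENT are
hypothetical names, and the decoder is deliberately simplified (it does
not reject overlong forms or surrogates). It only shows a nextCharacter:
label placed directly before the closing } of the loop, so that every
goto falls through to the while() test, and it emits one replacement
character for each truncated or interrupted sequence.

```c
#include <assert.h>
#include <stddef.h>

#define REPLACEMENT 0xFFFDu   /* U+FFFD REPLACEMENT CHARACTER */

/* Hypothetical, simplified "replace"-mode decoder (an assumption for
   illustration, not the function touched by the patch).  Returns the
   number of code points written to out[], at most cap. */
size_t decode_utf8_replace(const unsigned char *s, size_t len,
                           unsigned int *out, size_t cap)
{
    size_t i = 0, n_out = 0;

    assert(s != NULL && out != NULL);

    while (i < len && n_out < cap) {
        unsigned char ch = s[i];
        unsigned int cp;
        size_t need, k;

        if (ch < 0x80) {                       /* plain ASCII */
            out[n_out++] = ch;
            i += 1;
            goto nextCharacter;
        }
        if (ch >= 0xC0 && ch <= 0xDF)      { need = 1; cp = ch & 0x1Fu; }
        else if (ch >= 0xE0 && ch <= 0xEF) { need = 2; cp = ch & 0x0Fu; }
        else if (ch >= 0xF0 && ch <= 0xF7) { need = 3; cp = ch & 0x07u; }
        else {
            /* stray continuation byte or invalid lead byte */
            out[n_out++] = REPLACEMENT;
            i += 1;
            goto nextCharacter;
        }

        i += 1;                                /* consume the lead byte */
        for (k = 0; k < need; k++) {
            if (i >= len || (s[i] & 0xC0) != 0x80) {
                /* truncated or interrupted sequence: exactly one
                   replacement; the offending byte is not consumed */
                out[n_out++] = REPLACEMENT;
                goto nextCharacter;
            }
            cp = (cp << 6) | (s[i] & 0x3Fu);
            i += 1;
        }
        out[n_out++] = cp;

    nextCharacter:
        ;   /* empty statement: a label must be followed by a statement */
    }
    return n_out;
}
```

With this layout, the bytes E2 82 41 decode to U+FFFD followed by 'A':
the incomplete three-byte sequence yields a single replacement, the goto
lands at the end of the body, and the loop condition is re-tested before
0x41 is decoded.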

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/