[I18n-sig] [PATCH] UTF-8 decoding: Fix handling of invalid byte sequences
M.-A. Lemburg
mal@lemburg.com
Mon, 08 May 2000 10:01:20 +0200
Florian Weimer wrote:
>
> Could you have a look at the following patch? It fixes a rather
> funny scoping problem with the continue statement, which results in
> more deterministic handling of invalid sequences. In addition, the
> treatment of invalid characters in "replace" mode is improved: now,
> an incomplete or otherwise invalid UTF-8 sequence generates exactly
> one replacement character. As a result, the Python UTF-8 decoder now
> passes Markus Kuhn's UTF-8 stress test. (Shall I make a Python test
> out of it?)
>
> If there aren't any objections, I'll forward this patch through the
> official channels (if it's still necessary).
Looks good, except that you should move the nextCharacter:
label so that it sits right before the closing } of the while loop.
Otherwise, the while() condition won't be checked.
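
To illustrate the label placement, here is a minimal sketch in C. It is
not the actual decoder from the patch: the function name
count_chars_replace, its parameters, and the restriction to ASCII and
two-byte sequences are all assumptions made for the example. It also
assumes that a bad lead byte plus any continuation bytes that follow it
count as one invalid sequence. The point is only that a goto to a label
placed just before the closing } falls through to the end of the loop
body, so the while() condition is tested again before the next pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified scanner (NOT the real Python UTF-8 decoder).
 * Counts the characters a "replace"-mode decode would emit. Only ASCII
 * and two-byte sequences are accepted; everything else is treated as
 * invalid and yields exactly one replacement character. */
size_t count_chars_replace(const unsigned char *s, size_t n, size_t *replaced)
{
    size_t i = 0, out = 0;

    assert(s != NULL && replaced != NULL);
    *replaced = 0;

    while (i < n) {
        unsigned char ch = s[i];

        if (ch < 0x80) {                      /* plain ASCII */
            i += 1;
            out += 1;
            goto nextCharacter;
        }
        if (ch >= 0xC2 && ch <= 0xDF && i + 1 < n
            && (s[i + 1] & 0xC0) == 0x80) {   /* valid two-byte sequence */
            i += 2;
            out += 1;
            goto nextCharacter;
        }

        /* Invalid or unsupported sequence: consume the offending byte and
         * any continuation bytes after it, then emit ONE replacement. */
        i += 1;
        while (i < n && (s[i] & 0xC0) == 0x80)
            i += 1;
        out += 1;
        *replaced += 1;

    nextCharacter:
        ;   /* before C23 a label must precede a statement; use a null one */
    }       /* control reaches the loop end, so (i < n) is tested again */

    return out;
}
```

With the label at the very end of the body, each goto behaves like a
continue. A label placed anywhere that bypasses the loop condition
would let the body run again without the bounds check.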
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/