[I18n-sig] UTF-8 decoder in CVS still buggy
M.-A. Lemburg
mal@lemburg.com
Sat, 02 Sep 2000 16:03:46 +0200
François Pinard wrote:
>
> [mal@lemburg.com]
>
> > Please keep us informed of any quirks you may experience during this
> > conversion. We can use some real life reports for the new Unicode
> > support in Python to polish up the implementation and design.
>
> Hi, people. I just recently subscribed to i18n-sig, and started to
> read the archives. Let me hope you will tolerate that I jump in some
> conversations without having matured all the background.
>
> On the above topic, I did not check what Python exactly does, but I wanted to
> share that my `recode' program is not perfect in that area. In particular,
> there is a requirement for UTF-8 to be valid that the sequence be minimal,
> which `recode' currently does not check on input. Roughly said, a UTF-8
> sequence is not valid if it could have been expressed in fewer bytes.
>
> I've nothing against Python beating me at it! :-)
Could you give some examples? I'm not sure I understand what you
mean by "could have been expressed in fewer bytes" -- perhaps
a multi-byte encoding where the top-most bits are 0?
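
If that guess is right, the case would be what is usually called an
"overlong" sequence: for example, the two bytes C0 AF carry the same
payload bits as the single byte 2F ('/'). The following is only a rough
sketch of such a check, written in present-day Python syntax; the helper
names decode_naive, minimal_length and is_overlong are made up for
illustration and are not taken from the decoder in CVS or from `recode'.

```python
def decode_naive(seq):
    """Decode one UTF-8 sequence WITHOUT checking minimality.

    Returns (code_point, number_of_bytes_used).
    """
    first = seq[0]
    if first < 0x80:                      # 0xxxxxxx: plain ASCII
        return first, 1
    if first & 0xE0 == 0xC0:              # 110xxxxx: 2-byte form
        n, cp = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:            # 1110xxxx: 3-byte form
        n, cp = 3, first & 0x0F
    elif first & 0xF8 == 0xF0:            # 11110xxx: 4-byte form
        n, cp = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) < n:
        raise ValueError("truncated sequence")
    for b in seq[1:n]:
        if b & 0xC0 != 0x80:              # continuation must be 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, n


def minimal_length(cp):
    """Fewest bytes UTF-8 needs for code point cp."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def is_overlong(seq):
    """True if seq uses more bytes than its code point requires."""
    cp, n = decode_naive(seq)
    return n > minimal_length(cp)


print(is_overlong(b"/"))             # one byte for U+002F
print(is_overlong(b"\xc0\xaf"))      # two bytes for the same U+002F
print(is_overlong(b"\xe0\x80\xaf"))  # three bytes for the same U+002F
print(is_overlong(b"\xc3\xa9"))      # U+00E9, which needs two bytes
```

A decoder that skips this check would map all of the first three inputs
to '/', so a filter looking for the byte 2F could be bypassed by the
longer forms.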
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/