[I18n-sig] UTF-8 decoder in CVS still buggy
François Pinard
pinard@iro.umontreal.ca
02 Sep 2000 09:34:51 -0400
[mal@lemburg.com]
> Please keep us informed of any quirks you may experience during this
> conversion. We can use some real life reports for the new Unicode
> support in Python to polish up the implementation and design.
Hi, people. I just recently subscribed to i18n-sig, and started to
read the archives. I hope you will tolerate my jumping into some
conversations without having absorbed all the background.
On the above topic, I did not check exactly what Python does, but I wanted to
share that my `recode' program is not perfect in that area. In particular,
for UTF-8 to be valid, each sequence is required to be minimal, which
`recode' currently does not check on input. Roughly speaking, a UTF-8
sequence is not valid if it could have been expressed in fewer bytes.
I've nothing against Python beating me at it! :-)
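To make the minimality rule concrete, here is a rough sketch in Python of
how such a check might look. The name `is_overlong' is hypothetical: it is
not taken from `recode' nor from Python's own decoder. The sketch looks only
at the "fewer bytes" rule for a single sequence of one to four bytes, and
ignores other validity rules such as surrogates or the upper code point limit.

```python
def is_overlong(seq: bytes) -> bool:
    """Return True if `seq`, a single structurally well-formed UTF-8
    sequence, encodes its code point in more bytes than necessary.

    Hypothetical helper for illustration only.
    """
    if not seq:
        raise ValueError("empty sequence")

    lead = seq[0]
    # The lead byte determines the sequence length and carries the top bits.
    if lead < 0x80:
        length, value = 1, lead
    elif 0xC0 <= lead <= 0xDF:
        length, value = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:
        length, value = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF7:
        length, value = 4, lead & 0x07
    else:
        raise ValueError("invalid lead byte: 0x%02X" % lead)

    if len(seq) != length:
        raise ValueError("expected %d bytes, got %d" % (length, len(seq)))

    # Each continuation byte has the form 10xxxxxx and adds six bits.
    for byte in seq[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte: 0x%02X" % byte)
        value = (value << 6) | (byte & 0x3F)

    # Smallest code point that genuinely needs 1, 2, 3 or 4 bytes.
    minimum = (0x0, 0x80, 0x800, 0x10000)[length - 1]
    return value < minimum
```

For instance, the two bytes C0 AF carry the value 2F, which is "/" and
already fits in one byte, so `is_overlong(b"\xC0\xAF")` is true, while
`is_overlong(b"/")` is false. A decoder that enforces minimality would
reject the former instead of quietly yielding "/".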
--
François Pinard http://www.iro.umontreal.ca/~pinard