
This just in from pypy-dev. I am reposting it here because I am fairly certain that nobody on the pypy-dev mailing list uses the multibytecodec, but there has got to be at least one person here who does. Please reply to the pypy-dev article, not here, or mail to pypy-dev@python.org if you are not on the pypy-dev mailing list (but have delivery turned off as many of you do.)

Thank you,
Laura

------- Forwarded Message

From: Armin Rigo <arigo@tunes.org>
Date: Wed, 25 May 2011 21:39:35 +0200
To: pypy-dev@python.org
Subject: [pypy-dev] multibytecodec: missing features

Hi all,

Here are the missing features in multibytecodec:

* support for ``errors != "strict"''.

* classes MultibyteIncrementalEncoder, MultibyteIncrementalDecoder, MultibyteStreamReader and MultibyteStreamWriter.

One reason I didn't implement the classes yet is that I couldn't understand two points in how they are supposed to work. But it seems that there are really two bugs, as I've been pointed to: http://bugs.python.org/issue12100 and http://bugs.python.org/issue12171 . So the question is if we should be bug-compatible with Python 2.7 or if we should instead implement some fixed version.

I suppose I'm rather for the fixed version, but I'd like to hear some feedback from people that actually use multibytecodecs.

Also, I wouldn't mind if someone would pick up the work and just do it, either the classes or ``errors != "strict"'' :-)

See you soon,

Armin.
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev

------- End of Forwarded Message

On Wednesday, 25 May 2011 at 23:41 +0200, Laura Creighton wrote:
I fixed #12100 in Python 2.7, 3.1, 3.2 and 3.3 yesterday. I also plan to fix #12171 in these four versions; it should be done in the next few days.
> I suppose I'm rather for the fixed version, but I'd like to hear some feedback from people that actually use multibytecodecs.
Both bugs are related to encoders. I don't think that anyone is using the Python CJK codecs to encode text (because nobody noticed these bugs before); they are more likely used to decode text. Anyway, you should implement a codec without these *bugs*.

For your information, I added more tests to the CJK codecs (e.g. see #12057), and I plan to add more tests in the coming weeks. I also plan to fix issue #12016, yet another CJK codec bug. You may want to wait until all of these bugs are fixed before working on your own implementation, or directly implement a version without all of these bugs and then upgrade the test suite.
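For readers who have not used the incremental classes under discussion, here is a minimal sketch of how they are reached through the standard `codecs` module. It assumes Python 3 syntax on CPython; the helper names, the sample string and the choice of `shift_jis` and `iso2022_jp` are illustrative assumptions and do not come from the messages in this thread.

```python
import codecs


def encode_in_chunks(text, encoding, chunk_size=1):
    """Encode text piece by piece with an incremental encoder."""
    encoder = codecs.getincrementalencoder(encoding)()  # errors="strict" by default
    pieces = []
    for start in range(0, len(text), chunk_size):
        # The encoder keeps its state (e.g. the active ISO 2022 character
        # set) between calls, so each chunk continues where the last ended.
        pieces.append(encoder.encode(text[start:start + chunk_size]))
    # final=True signals the end of the input so that the encoder can emit
    # any closing sequence it still owes.
    pieces.append(encoder.encode("", final=True))
    return b"".join(pieces)


def decode_in_chunks(data, encoding, chunk_size=1):
    """Decode bytes piece by piece with an incremental decoder."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for start in range(0, len(data), chunk_size):
        # A chunk may end in the middle of a multibyte character; the
        # decoder buffers the partial bytes until the rest arrives.
        pieces.append(decoder.decode(data[start:start + chunk_size]))
    # final=True makes a dangling partial character an error instead of
    # leaving it silently buffered.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


# Hypothetical sample: three kanji ("nihongo") followed by ASCII text.
sample = "\u65e5\u672c\u8a9e text"

sjis_bytes = sample.encode("shift_jis")
print(decode_in_chunks(sjis_bytes, "shift_jis") == sample)

iso_bytes = encode_in_chunks(sample, "iso2022_jp")
print(iso_bytes.decode("iso2022_jp") == sample)
```

The round trip is checked instead of the exact bytes, because the precise output of an incremental encoder is what issues #12100 and #12171 are about and may differ between versions.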
> Also, I wouldn't mind if someone would pick up the work and just do it, either the classes or ``errors != "strict"'' :-)
The support of error handlers other than strict is far from perfect. Issue #12016 is the main problem, but there are other minor issues. In some cases, invalid byte sequences are ignored even with the replace error handler (whereas I expected U+FFFD characters).

CJK codecs are special because they use escape sequences (especially the ISO 2022 family): what should be done if a byte sequence looks like an escape sequence but is not valid? Replace each byte with U+FFFD, or ignore these bytes?

I'm trying to write tests "describing" the current behaviour, and then I will maybe try to improve how invalid byte sequences are handled.

Victor
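A small sketch of the kind of "describing" check mentioned above: decode the same damaged input with each error handler and record what comes back. It assumes Python 3 syntax; the helper name, the `euc_jp` codec and the stray `0xFF` byte are illustrative assumptions, not cases taken from the bug reports.

```python
def decode_with_each_handler(data, encoding):
    """Return {handler name: decoded text, or the exception that was raised}."""
    results = {}
    for handler in ("strict", "ignore", "replace"):
        try:
            results[handler] = data.decode(encoding, errors=handler)
        except UnicodeDecodeError as exc:
            # Keep the exception so the caller can inspect its position.
            results[handler] = exc
    return results


# Hypothetical damaged input: two valid EUC-JP characters, then a byte
# (0xFF) that is not a valid lead byte in EUC-JP, then plain ASCII.
valid_part = "\u65e5\u672c".encode("euc_jp")
damaged = valid_part + b"\xff" + b"abc"

results = decode_with_each_handler(damaged, "euc_jp")
for handler, outcome in results.items():
    print(handler, repr(outcome))
```

No expected output is listed on purpose: how many replacement characters appear, and whether invalid bytes are dropped, is exactly the behaviour that varies by codec and version.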
Participants (2): Laura Creighton, Victor Stinner