Re: [Mailman-Developers] Re: [Mailman-checkins] mailman/misc CJKCodecs-1.0.tar.gz, NONE, 1.1.2.1 .cvsignore, 2.2, 2.2.2.1 Makefile.in, 2.33.2.3, 2.33.2.4 paths.py.in, 2.6, 2.6.2.1 JapaneseCodecs-1.4.9.tar.gz, 2.1, NONE KoreanCodecs-2.0.5.tar.gz, 2.1, NONE
[A discussion about replacing JapaneseCodecs and KoreanCodecs in Mailman 2.1.4 with CJKCodecs] On Mon, 2003-12-29 at 03:26, Tokio Kikuchi wrote:
Sorry again Barry.
We have to keep JapaneseCodecs and KoreanCodecs in the distribution and install them in the pythonlib directory, because the email package designates japanese and korean as charset prefixes. I will have to study cjkcodecs' behavior more (it looks like its Japanese part has an old bug from an earlier distribution of JapaneseCodecs), so please cancel this checkin.
Oh dang. The problem is CODEC_MAP in email/Charset.py, right?

Here's a hack for Mailman 2.1.4:

-----japanese.py
from cjkcodecs import euc-jp, iso-2022-jp, shift_jis
-----korean.py
from cjkcodecs import euc-kr, cp949, iso-2022-kr, johab

We add these two files to Mailman's pythonlib, and then the imports in Charset.py should work correctly.

It would be nice if cjkcodecs provided backwards compatibility. Otherwise, we probably want to provide some ourselves in email/Charset.py. I'm not sure there's a better way to do this, but attached is a strawman (untested) patch for email 2.5.5/Python 2.3.4. It's too late to get this into Python 2.3.3, but if this is acceptable, I can check this in for Python 2.3.4, and cut a new email package tarball for Mailman 2.1.4, forgoing the above hack. -Barry
On Mon, 2003-12-29 at 08:53, Barry Warsaw wrote:
It would be nice if cjkcodecs provided backwards compatibility. Otherwise, we probably want to provide some ourselves in email/Charset.py. I'm not sure there's a better way to do this, but attached is a strawman (untested) patch for email 2.5.5/Python 2.3.4.
Amend that. If I understand how all this works correctly, then importing cjkcodecs.aliases provides direct mappings for all the charsets. So since we already have "import cjkcodecs.aliases" in Mailman's paths.py, we could just delete euc-jp, iso-2022-jp, shift_jis, euc-kr, iso-2022-kr, ks_c_5601-1987, and johab from CODEC_MAP and be done with it. It looks like we didn't need these aliases in CODEC_MAP even with the older codec packages, since they define all the aliases as well. -Barry
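A minimal sketch of Barry's point, assuming a modern Python 3 interpreter, where the CJK codecs ship with the standard library (CJKCodecs was folded into Python as of 2.4). The `CODEC_MAP` below is a hypothetical, trimmed stand-in rather than the real table from email/Charset.py: with the prefixed entries gone, each charset name falls through `CODEC_MAP.get(name, name)` to `codecs.lookup()`, whose own alias table resolves it.

```python
import codecs

# Hypothetical, trimmed stand-in for CODEC_MAP in email/Charset.py.
# The japanese.* / korean.* entries are gone; None means "no codec needed".
CODEC_MAP = {
    'us-ascii': None,
}

# Charsets whose CODEC_MAP entries Barry proposes to delete.
DROPPED = ['euc-jp', 'iso-2022-jp', 'shift_jis',
           'euc-kr', 'iso-2022-kr', 'ks_c_5601-1987', 'johab']


def resolve_codec(charset):
    """Return a CodecInfo for charset, or None if no codec is needed."""
    codec_name = CODEC_MAP.get(charset, charset)
    if codec_name is None:
        return None
    # codecs.lookup() consults the registry's alias table, so no package
    # prefix such as 'japanese.' is required.
    return codecs.lookup(codec_name)


if __name__ == '__main__':
    for cs in DROPPED:
        print(cs, '->', resolve_codec(cs).name)
```

The exact codec each alias lands on is the registry's choice, so the loop just prints whatever `codecs.lookup()` reports rather than assuming specific names.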
On Mon, Dec 29, 2003 at 09:10:24AM -0500, Barry Warsaw wrote:
On Mon, 2003-12-29 at 08:53, Barry Warsaw wrote:
It would be nice if cjkcodecs provided backwards compatibility. Otherwise, we probably want to provide some ourselves in email/Charset.py. I'm not sure there's a better way to do this, but attached is a strawman (untested) patch for email 2.5.5/Python 2.3.4.
Amend that. If I understand how all this works correctly, then importing cjkcodecs.aliases provides direct mapping for all the charsets. So since we already have "import cjkcodecs.aliases" in Mailman's paths.py, we could just delete euc-jp, iso-2022-jp, shift_jis, euc-kr, iso-2022-kr, ks_c_5601-1987, and johab from CODEC_MAP and be done with it.
It looks like we didn't need these aliases in CODEC_MAP even with the older codec packages, since they define all the aliases as well.
That's true, except for ChineseCodecs. Hye-Shik
On Mon, 2003-12-29 at 09:44, Hye-Shik Chang wrote:
It looks like we didn't need these aliases in CODEC_MAP even with the older codec packages, since they define all the aliases as well.
That's true, except for ChineseCodecs.
Since we didn't have any prefixes except japanese and korean, I don't think we're in any worse shape for ChineseCodecs. Right? -Barry
Will this updated patch work? -Barry
On Mon, Dec 29, 2003 at 08:53:11AM -0500, Barry Warsaw wrote:
[A discussion about replacing JapaneseCodecs and KoreanCodecs in Mailman 2.1.4 with CJKCodecs]
On Mon, 2003-12-29 at 03:26, Tokio Kikuchi wrote:
Sorry again Barry.
We have to keep JapaneseCodecs and KoreanCodecs in the distribution and install them in the pythonlib directory, because the email package designates japanese and korean as charset prefixes. I will have to study cjkcodecs' behavior more (it looks like its Japanese part has an old bug from an earlier distribution of JapaneseCodecs), so please cancel this checkin.
I just got an email from a Japanese user describing problems with CJKCodecs' iso-2022-jp codec. I'm investigating it and plan to release a new minor revision that fixes the problems soon. BTW, I think the shift-jis and euc-jp codecs in CJKCodecs 1.0.2 are stable and backward-compatible enough.
Oh dang.
The problem is CODEC_MAP in email/Charset.py, right?
There's a bug report by Jason R. Mastaler already: http://www.python.org/sf/852347
Here's a hack for Mailman 2.1.4:
-----japanese.py
from cjkcodecs import euc-jp, iso-2022-jp, shift_jis
And iso_2022_jp_1, too.
-----korean.py
from cjkcodecs import euc-kr, cp949, iso-2022-kr, johab
We add these two files to Mailman's pythonlib, and then the imports in Charset.py should work correctly.
Yup, it will. :)
It would be nice if cjkcodecs provided backwards compatibility. Otherwise, we probably want to provide some ourselves in email/Charset.py. I'm not sure there's a better way to do this, but attached is a strawman (untested) patch for email 2.5.5/Python 2.3.4.
CJKCodecs already provides enough compatibility aliases for consumer programs, except those that explicitly use the 'japanese.' or 'korean.' prefix. It also has compatibility aliases for ChineseCodecs. Hye-Shik
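The backwards compatibility Barry wished for can be sketched as a custom codec search function. This is a hypothetical shim, not part of CJKCodecs or the email package: it assumes a Python 3 interpreter with the built-in CJK codecs and simply strips a legacy 'japanese.' or 'korean.' prefix before delegating to the normal registry.

```python
import codecs

# Hypothetical list of legacy package prefixes used by older consumer code.
LEGACY_PREFIXES = ('japanese.', 'korean.')


def legacy_prefix_search(name):
    """Codec search function mapping e.g. 'japanese.euc-jp' to 'euc-jp'."""
    for prefix in LEGACY_PREFIXES:
        if name.startswith(prefix):
            try:
                # Delegate to the regular registry (built-in CJK codecs).
                return codecs.lookup(name[len(prefix):])
            except LookupError:
                # Unknown codec after the prefix: let lookup() report it.
                return None
    # Not a legacy name; leave it to the other search functions.
    return None


codecs.register(legacy_prefix_search)
```

Once registered, `'テスト'.encode('japanese.euc-jp')` resolves through the shim to the ordinary euc-jp codec. Where such a hook would be installed (for example, from an application's startup module) is left open here.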
On Mon, 2003-12-29 at 09:41, Hye-Shik Chang wrote:
There's a bug report by Jason R. Mastaler already: http://www.python.org/sf/852347
Ah yes, I'd forgotten about that, thanks. I've followed up to that tracker item now.
CJKCodecs already provides enough compatibility aliases for consumer programs, except those that explicitly use the 'japanese.' or 'korean.' prefix. It also has compatibility aliases for ChineseCodecs.
Cool. So if the Charset.py.diff patch in the tracker above looks good to you, I'll commit that as soon as Python's release23-maint branch freeze is lifted. Then I'll cut email 2.5.5 and add that to Mailman 2.1.4. Sound good? -Barry
On Mon, Dec 29, 2003 at 09:57:03AM -0500, Barry Warsaw wrote:
On Mon, 2003-12-29 at 09:41, Hye-Shik Chang wrote:
There's a bug report by Jason R. Mastaler already: http://www.python.org/sf/852347
Ah yes, I'd forgotten about that, thanks. I've followed up to that tracker item now.
CJKCodecs already provides enough compatibility aliases for consumer programs, except those that explicitly use the 'japanese.' or 'korean.' prefix. It also has compatibility aliases for ChineseCodecs.
Cool. So if the Charset.py.diff patch in the tracker above looks good to you, I'll commit that as soon as Python's release23-maint branch freeze is lifted. Then I'll cut email 2.5.5 and add that to Mailman 2.1.4.
Sound good?
Okay with me. BTW, if aliases whose key and value are identical aren't needed, can't the line below be removed along with them?

    'utf-8': 'utf-8',

Thanks! Hye-Shik
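Hye-Shik's observation holds for any lookup of the form `MAP.get(key, key)`: an entry whose key equals its value can never change the result. A small sketch that finds and prunes such entries; the sample table is hypothetical and only loosely resembles a charset-to-codec map.

```python
def prune_identity_entries(mapping):
    """Return a copy of mapping without entries whose key equals its value."""
    return {k: v for k, v in mapping.items() if k != v}


# Hypothetical sample resembling a charset -> codec table.
SAMPLE_MAP = {
    'gb2312': 'eucgb2312_cn',
    'utf-8': 'utf-8',    # identity entry: a no-op for .get(k, k)
    'us-ascii': None,    # not an identity entry; must be kept
}

pruned = prune_identity_entries(SAMPLE_MAP)

# Every lookup that defaults to the key itself gives the same answer
# before and after pruning, including names absent from both tables.
for name in list(SAMPLE_MAP) + ['euc-jp']:
    assert SAMPLE_MAP.get(name, name) == pruned.get(name, name)
```

Note that the `'us-ascii': None` entry survives: it does change the result, so only true identity pairs are safe to drop.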
On Mon, 2003-12-29 at 10:12, Hye-Shik Chang wrote:
Okay with me. BTW, if aliases whose key and value are identical aren't needed, can't the line below be removed along with them?
'utf-8': 'utf-8',
Good catch, thanks! -Barry
On Mon, 2003-12-29 at 09:41, Hye-Shik Chang wrote:
I just got an email from a Japanese user describing problems with CJKCodecs' iso-2022-jp codec. I'm investigating it and plan to release a new minor revision that fixes the problems soon.
Oh yes, please let me know asap when this is ready. This is the last issue I need to clear up before I release Mailman 2.1.4, which /will/ happen before the end of this year. I'd like for that to be ready tomorrow (Tuesday 30-Dec) if possible. -Barry
On Mon, Dec 29, 2003 at 10:05:14AM -0500, Barry Warsaw wrote:
On Mon, 2003-12-29 at 09:41, Hye-Shik Chang wrote:
I just got an email from a Japanese user describing problems with CJKCodecs' iso-2022-jp codec. I'm investigating it and plan to release a new minor revision that fixes the problems soon.
Oh yes, please let me know asap when this is ready. This is the last issue I need to clear up before I release Mailman 2.1.4, which /will/ happen before the end of this year. I'd like for that to be ready tomorrow (Tuesday 30-Dec) if possible.
All the problems with the iso-2022-jp* codecs are fixed, and a release candidate for CJKCodecs 1.0.3 is ready (anyway :-)):
http://people.freebsd.org/~perky/cjkcodecs-1.0.3c1.tar.bz2
I'll release 1.0.3 final in a day or two. Hye-Shik
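Fixes like the iso-2022-jp* ones in this release candidate are easy to guard with a round-trip check. A minimal sketch, assuming Python 3's built-in codecs (named `iso2022_jp`, `iso2022_jp_1`, `iso2022_jp_2`) rather than the CJKCodecs 1.0.3 package itself; the sample text is arbitrary.

```python
# Built-in Python 3 codec names for the ISO-2022-JP family (an assumption
# standing in for the CJKCodecs 1.0.3 codecs discussed in the thread).
JP_CODECS = ['iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2']

SAMPLE = 'Mailman 日本語のテスト'


def check_roundtrip(codec_name, text=SAMPLE):
    """Encode then decode text; return the encoded bytes if lossless."""
    encoded = text.encode(codec_name)
    # ISO-2022-JP is a 7-bit encoding: every byte must be below 0x80.
    if any(b >= 0x80 for b in encoded):
        raise ValueError(f'{codec_name}: produced 8-bit output')
    decoded = encoded.decode(codec_name)
    if decoded != text:
        raise ValueError(f'{codec_name}: round trip changed the text')
    return encoded


if __name__ == '__main__':
    for name in JP_CODECS:
        print(name, check_roundtrip(name))
```

A real regression suite would add the specific strings from the user's bug report; the sample here only exercises ASCII, kanji, and kana.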
Hi, All.
Oh dang.
The problem is CODEC_MAP in email/Charset.py, right?
There's a bug report by Jason R. Mastaler already: http://www.python.org/sf/852347
Here's a hack for Mailman 2.1.4:
-----japanese.py
from cjkcodecs import euc-jp, iso-2022-jp, shift_jis
This will not do. (Syntax error!)

My fault is that I had installed both JapaneseCodecs and cjkcodecs separately in the Python site-packages area. It looks like Mailman looked at the site-packages codecs before the mailman/pythonlib codecs.

Since we can get rid of the aliases in Charset.py, wouldn't it be better to leave the package installation to the individual site owners? Some Japanese users seem to prefer JapaneseCodecs to cjkcodecs, and some even prefer a codec that overrides special characters such as full-width Roman numerals.

Barry, I again suggest cancelling this cjkcodecs commit altogether for the 2.1.4 release.

-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
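Tokio's "Syntax error!" is easy to reproduce: names in an `import` statement must be Python identifiers, so `euc-jp` is not a valid module name there. A small check using the built-in `compile()`, which parses the source without importing anything (so cjkcodecs need not be installed); the corrected line uses the underscore spelling Barry suggests in his reply.

```python
def parses(source, filename='japanese.py'):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, filename, 'exec')
    except SyntaxError:
        return False
    return True


# The shim as originally posted: dashes are not allowed in module names.
ORIGINAL = 'from cjkcodecs import euc-jp, iso-2022-jp, shift_jis\n'
# The same shim with dashes changed to underscores.
FIXED = 'from cjkcodecs import euc_jp, iso_2022_jp, shift_jis\n'

if __name__ == '__main__':
    print('original parses:', parses(ORIGINAL))
    print('fixed parses:', parses(FIXED))
```

Parsing succeeding says nothing about whether the modules exist; that only shows up when the import actually runs.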
On Mon, 2003-12-29 at 20:45, Tokio Kikuchi wrote:
-----japanese.py
from cjkcodecs import euc-jp, iso-2022-jp, shift_jis
This will not do. (Syntax error!)
I noticed that. ;) Change the dashes to underscores.
My fault is that I had installed both JapaneseCodecs and cjkcodecs separately in the Python site-packages area. It looks like Mailman looked at the site-packages codecs before the mailman/pythonlib codecs.
Hmm, it shouldn't. Mailman /should/ be set up to look in pythonlib first.
Since we can get rid of the aliases in Charset.py, wouldn't it be better to leave the package installation to the individual site owners?
Perhaps, but 1) I think Mailman should come with batteries included and be easy to install, 2) I don't want to rely on having to install these packages in the system's site-packages directory because that affects all users of Python on that system.
Some Japanese users seem to prefer JapaneseCodecs to cjkcodecs, and some even prefer a codec that overrides special characters such as full-width Roman numerals.
Hmm. I have to defer to you on this. In general though, it's a shame there has to be more than one codec package for Japanese. Also, is JapaneseCodecs still being developed?
Barry, I again suggest cancelling this commit for cjkcodecs altogether in the meantime of releasing 2.1.4.
Looks like we'll have to. The other problem is that I can't make the necessary changes to the email package until the Python 2.3 branch is freed up, and it doesn't look like that will happen in time. I don't want to include an unreleased version of the email package with Mailman 2.1.4. So we'll stick with the status quo for Mailman 2.1.4.

It would really be nice if Python 2.4 included the Asian codecs by default. -Barry
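Barry's expectation that Mailman searches its own pythonlib before site-packages comes down to `sys.path` order: the first directory on the path that contains a module wins. A self-contained sketch using two temporary directories and a hypothetical module name (`codec_pkg_demo`) in place of the real codec packages; how Mailman itself arranges its path is not shown here.

```python
import importlib
import os
import sys
import tempfile


def make_module(directory, name, origin):
    """Write a one-line module that records which directory it came from."""
    with open(os.path.join(directory, name + '.py'), 'w') as f:
        f.write(f'ORIGIN = {origin!r}\n')


def which_copy_wins(name='codec_pkg_demo'):
    """Import name with a private lib dir at the front of sys.path."""
    with tempfile.TemporaryDirectory() as site_dir, \
         tempfile.TemporaryDirectory() as pythonlib_dir:
        make_module(site_dir, name, 'site-packages')
        make_module(pythonlib_dir, name, 'pythonlib')
        saved_path = list(sys.path)
        try:
            # Simulate a system-wide install already on the path ...
            sys.path.append(site_dir)
            # ... and a private lib dir inserted at the front.
            sys.path.insert(0, pythonlib_dir)
            importlib.invalidate_caches()
            sys.modules.pop(name, None)
            module = importlib.import_module(name)
            return module.ORIGIN
        finally:
            # Restore global state so the demo leaves no trace.
            sys.path[:] = saved_path
            sys.modules.pop(name, None)


if __name__ == '__main__':
    print('imported from:', which_copy_wins())
```

If a site-packages copy wins in practice, the usual suspects are a private directory appended rather than inserted at index 0, or the module having been imported before the path was adjusted.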
On Mon, 2003-12-29 at 23:00, Barry Warsaw wrote:
Looks like we'll have to. The other problem is that I can't make the necessary changes to the email package until the Python 2.3 branch is freed up, and it doesn't look like that will happen in time. I don't want to include an unreleased version of the email package with Mailman 2.1.4.
Besides, my patch for Charset.py breaks Python's test suite. I'm not yet sure what the right way to fix this is. http://sourceforge.net/tracker/index.php?func=detail&aid=852347&group_id=5470&atid=105470 -Barry
I think this patch will fix it.

@@ -221,6 +220,8 @@
         # it.
         henc, benc, conv = CHARSETS.get(self.input_charset,
                                         (SHORTEST, BASE64, None))
+        if not conv:
+            conv = self.input_charset
         # Set the attributes, allowing the arguments to override the default.
         self.header_encoding = henc
         self.body_encoding = benc
@@ -230,7 +231,7 @@
         self.input_codec = CODEC_MAP.get(self.input_charset,
                                          self.input_charset)
         self.output_codec = CODEC_MAP.get(self.output_charset,
-                                          self.input_codec)
+                                          self.output_charset)
 
     def __str__(self):
         return self.input_charset.lower()

Sorry for folding.

Barry Warsaw wrote:
On Mon, 2003-12-29 at 23:00, Barry Warsaw wrote:
Looks like we'll have to. The other problem is that I can't make the necessary changes to the email package until the Python 2.3 branch is freed up, and it doesn't look like that will happen in time. I don't want to include an unreleased version of the email package with Mailman 2.1.4.
Besides, my patch for Charset.py breaks Python's test suite. I'm not yet sure what the right way to fix this is.
http://sourceforge.net/tracker/index.php?func=detail&aid=852347&group_id=5470&atid=105470
-Barry
_______________________________________________ Mailman-i18n mailing list Posts: Mailman-i18n@python.org Unsubscribe: http://mail.python.org/mailman/options/mailman-i18n/tkikuchi%40is.kochi-u.ac...
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
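One plausible reading of why the Charset.py change broke the test suite, and how Tokio's patch fixes it, can be modeled in a few lines. The tables and constants below are hypothetical stand-ins, not the email package's real data; only the fallback logic mirrors the diff. With the Japanese entries removed from `CODEC_MAP`, the old fallback `CODEC_MAP.get(output_charset, input_codec)` quietly reuses the input codec (euc-jp) where iso-2022-jp output is wanted. Changing the fallback to `output_charset` alone would yield `None` whenever a charset has no conversion target, which is why the patch also defaults `conv` to the input charset.

```python
# Hypothetical encoding constants and tables (illustration only).
SHORTEST, BASE64 = 'shortest', 'base64'

# charset -> (header encoding, body encoding, output conversion)
CHARSETS = {
    'euc-jp': (BASE64, None, 'iso-2022-jp'),
    'iso-2022-jp': (BASE64, None, None),
}
ALIASES = {}
CODEC_MAP = {}  # Japanese entries removed, as proposed in the thread


def resolve(input_charset, patched):
    """Return (output_charset, input_codec, output_codec)."""
    _henc, _benc, conv = CHARSETS.get(input_charset,
                                      (SHORTEST, BASE64, None))
    if patched and not conv:
        # Patch, part 1: no conversion target means output = input charset.
        conv = input_charset
    output_charset = ALIASES.get(conv, conv)
    input_codec = CODEC_MAP.get(input_charset, input_charset)
    # Patch, part 2: fall back to the output charset's own name instead
    # of silently reusing the input codec.
    fallback = output_charset if patched else input_codec
    output_codec = CODEC_MAP.get(output_charset, fallback)
    return output_charset, input_codec, output_codec


if __name__ == '__main__':
    for cs in ('euc-jp', 'iso-2022-jp'):
        print(cs, 'before:', resolve(cs, patched=False))
        print(cs, 'after: ', resolve(cs, patched=True))
```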
On Tue, 2003-12-30 at 07:12, Tokio Kikuchi wrote:
I think this patch will fix it.
Works for me, thanks. I've updated the tracker item. -Barry
DON'T WANT THE SUBSCRIPTION IS BY LAW
Barry Warsaw wrote: Looks like we'll have to. The other problem is that I can't make the necessary changes to the email package until the Python 2.3 branch is freed up, and it doesn't look like that will happen in time. I don't want to include an unreleased version of the email package with Mailman 2.1.4.
In any case, the 2.3 branch is in feature freeze now (has been for quite some time) so it's not likely that this sort of new functionality is acceptable on the 2.3 branch. Anthony (wearing the harsh release manager hat). -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood.
On Tue, 2003-12-30 at 08:10, Anthony Baxter wrote:
In any case, the 2.3 branch is in feature freeze now (has been for quite some time) so it's not likely that this sort of new functionality is acceptable on the 2.3 branch.
Anthony (wearing the harsh release manager hat).
I know that the branch is currently frozen, waiting for Jack's thaw once the Mac version of 2.3.3 is finished. So there's no way this will make it into the tree before the end of the year, which is my own self-imposed deadline for Mailman 2.1.4. No matter; I've reverted the change in Mailman so we won't be shipping CJKCodecs.

But I do still think this is an appropriate patch for Python 2.3.x, since it really isn't a new feature. This change should be appropriate whether you continue to use the old (and unsupported) Korean and Chinese codecs together with the alternative (and supported) Japanese codec, or whether you decide to use the combined CJKCodecs package. At its heart the patch actually removes unnecessary dependencies on the separate Asian codec packages. Since they all provide aliases, this will make the Charset.py file independent of the codec package being used.

As soon as Jack thaws the release23-maint branch, I think this patch should go in. I intend to apply it to the head for 2.4 now that the last regression has been fixed. -Barry
Barry Warsaw wrote: But I do still think this is an appropriate patch for Python 2.3.x, since it really isn't a new feature. This change should be appropriate whether you continue to use the old (and unsupported) Korean and Chinese codecs, with the alternative (and supported) Japanese codec, or whether you decide to use the combined CJKCodecs package. At its heart the patch actually removes unnecessary dependencies on the separate Asian codec packages. Since they all provide aliases, this will make the Charset.py file independent of the codec package being used.
I guess the deciding thing (for me) is that code written to use Python 2.3.4 (and the new codec work) should work on Python 2.3.x (x<4). I really don't want to see another repeat of the 2.2.2 fiasco (where code written for 2.2.2 wouldn't work on 2.2.1 or 2.2, because of the new True/False objects). I've seen far, far too much code that's had to do

try:
    True, False
except:
    True = 1
    False = 0

Anthony
-- Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.
On Tue, 2003-12-30 at 22:58, Anthony Baxter wrote:
Barry Warsaw wrote: But I do still think this is an appropriate patch for Python 2.3.x, since it really isn't a new feature. This change should be appropriate whether you continue to use the old (and unsupported) Korean and Chinese codecs, with the alternative (and supported) Japanese codec, or whether you decide to use the combined CJKCodecs package. At its heart the patch actually removes unnecessary dependencies on the separate Asian codec packages. Since they all provide aliases, this will make the Charset.py file independent of the codec package being used.
I guess the deciding thing (for me) is that code written to use Python 2.3.4 (and the new codec work) should work on Python 2.3.x (x<4). I really don't want to see another repeat of the 2.2.2 fiasco (where code written for 2.2.2 wouldn't work on 2.2.1 or 2.2, because of the new True/False objects). I've seen far, far too much code that's had to do
try:
    True, False
except:
    True = 1
    False = 0

Since I don't actually use the codecs except in the context of Mailman (and even then I couldn't tell you what all those pretty graphics mean), I think we ultimately have to defer to the experts. But I don't /think/ it's nearly as bad as this. This change is useful even if you are using the older codecs and decide to stick with them. They define the necessary aliases to make this all work, so the dependencies on the japanese and korean package names aren't necessary. -Barry
On Mon, 2003-12-29 at 20:45, Tokio Kikuchi wrote:
Barry, I again suggest cancelling this commit for cjkcodecs altogether in the meantime of releasing 2.1.4.
I've done this now in Mailman's cvs (Release_2_1-maint branch). Please double check. -Barry
Looks OK. I tested some messages without any codecs in site-packages.

-- Tokio

Barry Warsaw wrote:
On Mon, 2003-12-29 at 20:45, Tokio Kikuchi wrote:
Barry, I again suggest cancelling this commit for cjkcodecs altogether in the meantime of releasing 2.1.4.
I've done this now in Mailman's cvs (Release_2_1-maint branch). Please double check.
-Barry
On Tue, 2003-12-30 at 07:15, Tokio Kikuchi wrote:
Looks OK. I tested some messages without any codecs in site-packages.
Great, thanks. I'll put together 2.1.4rc1 later today. -Barry
DON'T WANT THE SUBSCRIPTION IS BY FEDERAL LAW
participants (5)
- Anthony Baxter
- Barry Warsaw
- Hye-Shik Chang
- Pat
- Tokio Kikuchi