Re: [Mailman-checkins] mailman/misc CJKCodecs-1.0.tar.gz, NONE, 1.1.2.1 .cvsignore, 2.2, 2.2.2.1 Makefile.in, 2.33.2.3, 2.33.2.4 paths.py.in, 2.6, 2.6.2.1 JapaneseCodecs-1.4.9.tar.gz, 2.1, NONE KoreanCodecs-2.0.5.tar.gz, 2.1, NONE

Hi Barry,

I made a mistake in this patch. Hye-Shik Chang's cjkcodecs site has moved from SourceForge to http://cjkpython.i18n.org/ and he has updated his codecs to 1.0.2, so please download the new package and change the package description in Makefile.in accordingly.

http://download.berlios.de/cjkpython/cjkcodecs-1.0.2.tar.gz

Sorry for the inconvenience.

bwarsaw@users.sourceforge.net wrote:
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

Sorry again, Barry. We have to keep JapaneseCodecs and KoreanCodecs in the distribution and install them in the pythonlib directory, because the email package designates japanese and korean as the prefixes of the charset codecs. I will have to study cjkcodecs' behavior more (it looks like the Japanese part has an old bug from an earlier distribution of JapaneseCodecs), so please cancel this checkin.

Tokio

Tokio Kikuchi wrote:
-- Tokio Kikuchi tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

[A discussion about replacing JapaneseCodecs and KoreanCodecs in Mailman 2.1.4 with CJKCodecs]
On Mon, 2003-12-29 at 03:26, Tokio Kikuchi wrote:
Oh dang.
The problem is CODEC_MAP in email/Charset.py, right?
Here's a hack for Mailman 2.1.4:
-----japanese.py
from cjkcodecs import euc-jp, iso-2022-jp, shift_jis
-----korean.py
from cjkcodecs import euc-kr, cp949, iso-2022-kr, johab
We add these two files to Mailman's pythonlib, and then the imports in Charset.py should work correctly.
It would be nice if cjkcodecs provided backwards compatibility. Otherwise, we probably want to provide some ourselves in email/Charset.py. I'm not sure there's a better way to do this, but attached is a strawman (untested) patch for email 2.5.5/Python 2.3.4.
It's too late to get this into Python 2.3.3, but if this is acceptable, I can check this in for Python 2.3.4, and cut a new email package tarball for Mailman 2.1.4, forgoing the above hack.
-Barry

On Mon, 2003-12-29 at 08:53, Barry Warsaw wrote:
Amend that. If I understand how all this works correctly, then importing cjkcodecs.aliases provides direct mapping for all the charsets. So since we already have "import cjkcodecs.aliases" in Mailman's paths.py, we could just delete euc-jp, iso-2022-jp, shift_jis, euc-kr, iso-2022-kr, ks_c_5601-1987, and johab from CODEC_MAP and be done with it.
It looks like we didn't need these aliases in CODEC_MAP even with the older codec packages, since they define all the aliases as well.
-Barry
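A minimal sketch of the idea above, for readers following along: if the codec registry already knows the charset names as aliases, a lookup table only needs entries for names that differ from their codec. This is an illustration only. It assumes a modern Python 3, where the CJK codecs ship in the standard library, rather than the cjkcodecs.aliases module discussed here; CODEC_MAP below is an empty stand-in, not the real table from email/Charset.py, and resolve_codec is a hypothetical helper.

```python
import codecs

# Hypothetical stand-in for CODEC_MAP with the CJK entries deleted.
CODEC_MAP = {}

# The charset names Barry proposes removing from CODEC_MAP.
CJK_CHARSETS = ['euc-jp', 'iso-2022-jp', 'shift_jis', 'euc-kr',
                'iso-2022-kr', 'ks_c_5601-1987', 'johab']


def resolve_codec(charset):
    """Use CODEC_MAP if it has an entry, else the charset name itself."""
    name = CODEC_MAP.get(charset, charset)
    # codecs.lookup() raises LookupError if no codec or alias matches.
    return codecs.lookup(name)


for charset in CJK_CHARSETS:
    print(charset, '->', resolve_codec(charset).name)
```

Because the fallback is the charset name itself, an entry whose key and value are identical adds nothing to the table.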

On Mon, Dec 29, 2003 at 09:10:24AM -0500, Barry Warsaw wrote:
That's true, except for ChineseCodecs.
Hye-Shik

On Mon, 2003-12-29 at 09:44, Hye-Shik Chang wrote:
Since we didn't have any prefixes except japanese and korean, I don't think we're in any worse shape for ChineseCodecs. Right?
-Barry

Will this updated patch work?
-Barry

On Mon, Dec 29, 2003 at 08:53:11AM -0500, Barry Warsaw wrote:
I just got a mail from a Japanese user describing problems with CJKCodecs' iso-2022-jp codec. I'm investigating it and I plan to release a new minor revision that fixes the problems soon. BTW, I think the shift-jis and euc-jp codecs of CJKCodecs 1.0.2 are stable and backward-compatible enough.
Oh dang.
The problem is CODEC_MAP in email/Charset.py, right?
There's a bug report by Jason R. Mastaler already: http://www.python.org/sf/852347
and iso_2022_jp_1
Yup. it will. :)
CJKCodecs already has enough compatibility aliases for consumer programs, except for those that use the 'japanese.' or 'korean.' prefix explicitly. It has compatibility aliases for ChineseCodecs as well.
Hye-Shik

On Mon, 2003-12-29 at 09:41, Hye-Shik Chang wrote:
There's a bug report by Jason R. Mastaler already: http://www.python.org/sf/852347
Ah yes, I'd forgotten about that, thanks. I've followed up to that tracker item now.
Cool. So if the Charset.py.diff patch in the tracker above looks good to you, I'll commit that as soon as Python's release23-maint branch freeze is lifted. Then I'll cut email 2.5.5 and add that to Mailman 2.1.4.
Sound good?
-Barry

On Mon, Dec 29, 2003 at 09:57:03AM -0500, Barry Warsaw wrote:
Okay with me. BTW, if aliases with the same key and value are not needed, can't the line below be removed along with the others?
'utf-8': 'utf-8',
Thanks!
Hye-Shik


On Mon, 2003-12-29 at 09:41, Hye-Shik Chang wrote:
Oh yes, please let me know asap when this is ready. This is the last issue I need to clear up before I release Mailman 2.1.4, which /will/ happen before the end of this year. I'd like for that to be ready tomorrow (Tuesday 30-Dec) if possible.
-Barry

On Mon, Dec 29, 2003 at 10:05:14AM -0500, Barry Warsaw wrote:
All the problems with the iso-2022-jp* codecs are fixed and a release candidate for CJKCodecs 1.0.3 is ready. (anyway :-))
http://people.freebsd.org/~perky/cjkcodecs-1.0.3c1.tar.bz2
I'll release 1.0.3 final in a day or two.
Hye-Shik

Hi, All.
This will not do. (Syntax error!)
My mistake was that I had separately installed both JapaneseCodecs and cjkcodecs in the Python site-packages area. It looks like Mailman searched the site-packages codecs before the mailman/pythonlib codecs.
Since we can get rid of the aliases in Charset.py, would it not be better to leave the package installation to the individual site owners?
Some Japanese users seem to prefer JapaneseCodecs to cjkcodecs, and some even prefer a variant that overrides special characters such as full-width Roman numerals.
Barry, I again suggest cancelling this commit for cjkcodecs altogether for the time being, for the 2.1.4 release.
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

On Mon, 2003-12-29 at 20:45, Tokio Kikuchi wrote:
I noticed that. ;) Change the dashes to underscores.
Hmm, it shouldn't. Mailman /should/ be set up to look in pythonlib first.
Since we can get rid of the aliases in Charset.py, would it not be better to leave the package installation to the individual site owners?
Perhaps, but 1) I think Mailman should come with batteries included and be easy to install, 2) I don't want to rely on having to install these packages in the system's site-packages directory because that affects all users of Python on that system.
Hmm. I have to defer to you on this. In general though, it's a shame there has to be more than one codec package for Japanese. Also, is JapaneseCodecs still being developed?
Barry, I again suggest cancelling this commit for cjkcodecs altogether for the time being, for the 2.1.4 release.
Looks like we'll have to. The other problem is that I can't make the necessary changes to the email package until the Python 2.3 branch is freed up, and it doesn't look like that will happen in time. I don't want to include an unreleased version of the email package with Mailman 2.1.4.
So we'll stick with the status quo for Mailman 2.1.4. It would really be nice if Python 2.4 included the Asian codecs by default.
-Barry
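For reference, a minimal sketch of the search-order point made above ("Mailman /should/ be set up to look in pythonlib first"): Python imports the first matching package found on sys.path, so a bundled directory only wins over site-packages if it sits earlier in the list. The directory names and the prepend_to_path helper are hypothetical, and this is not a copy of what Mailman's paths.py actually does.

```python
import sys

# Hypothetical install location; a real site would use its own prefix.
MAILMAN_PYTHONLIB = '/usr/local/mailman/pythonlib'


def prepend_to_path(directory, path=None):
    """Move (or add) `directory` to the front of the import search path."""
    if path is None:
        path = sys.path
    if directory in path:
        path.remove(directory)
    path.insert(0, directory)
    return path


# Demonstrate on a plain list so the real sys.path is left untouched.
search_path = ['/usr/lib/python2.3/site-packages']
prepend_to_path(MAILMAN_PYTHONLIB, search_path)
print(search_path)
```

If a codec package in site-packages is still picked up first, checking the actual order of sys.path at import time is a reasonable first step.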

On Mon, 2003-12-29 at 23:00, Barry Warsaw wrote:
Besides, my patch for Charset.py breaks Python's test suite. I'm not yet sure what the right way to fix this is.
http://sourceforge.net/tracker/index.php?func=detail&aid=852347&group_id=5470&atid=105470
-Barry

I think this patch will fix.

@@ -221,6 +220,8 @@
         # it.
         henc, benc, conv = CHARSETS.get(self.input_charset,
                                         (SHORTEST, BASE64, None))
+        if not conv:
+            conv = self.input_charset
         # Set the attributes, allowing the arguments to override the default.
         self.header_encoding = henc
         self.body_encoding = benc
@@ -230,7 +231,7 @@
         self.input_codec = CODEC_MAP.get(self.input_charset,
                                          self.input_charset)
         self.output_codec = CODEC_MAP.get(self.output_charset,
-                                         self.input_codec)
+                                         self.output_charset)
 
     def __str__(self):
         return self.input_charset.lower()

Sorry for folding.

Barry Warsaw wrote:
-- Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
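A stripped-down sketch of what the patch above appears to change, for illustration only. The class below is a hypothetical miniature, not the real email.Charset.Charset: the CHARSETS table has just two assumed entries, CODEC_MAP is assumed to be empty (CJK entries removed), and alias handling is omitted. One reading of the patch is that, once the CJK entries are gone from CODEC_MAP, the old fallback to input_codec would pick the wrong output codec whenever a charset converts to a different one on output.

```python
# Illustrative constants; the real ones are defined in email/Charset.py.
SHORTEST, BASE64 = 'shortest', 'base64'

# charset -> (header encoding, body encoding, output conversion charset)
CHARSETS = {
    'iso-2022-jp': (BASE64, None, None),
    'euc-jp': (BASE64, None, 'iso-2022-jp'),
}

CODEC_MAP = {}  # assumption: the CJK entries have been deleted


class Charset:
    def __init__(self, input_charset):
        self.input_charset = input_charset.lower()
        henc, benc, conv = CHARSETS.get(self.input_charset,
                                        (SHORTEST, BASE64, None))
        # First hunk: with no conversion defined, output equals input.
        if not conv:
            conv = self.input_charset
        self.header_encoding = henc
        self.body_encoding = benc
        self.output_charset = conv
        self.input_codec = CODEC_MAP.get(self.input_charset,
                                         self.input_charset)
        # Second hunk: fall back to the output charset, not the input codec.
        self.output_codec = CODEC_MAP.get(self.output_charset,
                                          self.output_charset)


cs = Charset('euc-jp')
print(cs.input_codec, cs.output_codec)
```

With the pre-patch fallback, the same call would have reported the input codec as the output codec in this miniature, which is the mismatch the second hunk removes.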

On Tue, 2003-12-30 at 07:12, Tokio Kikuchi wrote:
I think this patch will fix.
Works for me, thanks. I've updated the tracker item.
-Barry

In any case, the 2.3 branch is in feature freeze now (has been for quite some time) so it's not likely that this sort of new functionality is acceptable on the 2.3 branch.
Anthony (wearing the harsh release manager hat).
--
Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.

On Tue, 2003-12-30 at 08:10, Anthony Baxter wrote:
I know that the branch is currently frozen waiting for Jack's thaw once the Mac version of 2.3.3 is finished. So there's no way this will make it into the tree before the end of the year, which is my own self-imposed deadline for Mailman 2.1.4. No matter; I've reverted the change in Mailman so we won't be shipping CJKCodecs.
But I do still think this is an appropriate patch for Python 2.3.x, since it really isn't a new feature. This change should be appropriate whether you continue to use the old (and unsupported) Korean and Chinese codecs, with the alternative (and supported) Japanese codec, or whether you decide to use the combined CJKCodecs package. At its heart the patch actually removes unnecessary dependencies on the separate Asian codec packages. Since they all provide aliases, this will make the Charset.py file independent of the codec package being used.
As soon as Jack thaws the release23-maint branch, I think this patch should go in. I intend to apply it to the head for 2.4 now that the last regression has been fixed.
-Barry

I guess the deciding thing (for me) is that code written to use Python 2.3.4 (and the new codec work) should work on Python 2.3.x (x<4). I really don't want to see another repeat of the 2.2.2 fiasco (where code written for 2.2.2 wouldn't work on 2.2.1 or 2.2, because of the new True/False objects). I've seen far, far too much code that's had to do
    try:
        True, False
    except NameError:
        True = 1
        False = 0
Anthony
--
Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.

On Tue, 2003-12-30 at 22:58, Anthony Baxter wrote:
Since I don't actually use the codecs, except in the context of Mailman and even then I couldn't tell you what all those pretty graphics mean, I think we have to ultimately defer to the experts. But I don't /think/ it's nearly as bad as this.
This change is useful even if you are using the older codecs and decide to stick with them. They define the necessary aliases to make this all work, so the dependencies on the japanese and korean package names aren't necessary.
-Barry

On Mon, 2003-12-29 at 20:45, Tokio Kikuchi wrote:
Barry, I again suggest cancelling this commit for cjkcodecs altogether for the time being, for the 2.1.4 release.
I've done this now in Mailman's cvs (Release_2_1-maint branch). Please double check.
-Barry

Looks OK. I tested some messages without any codecs in site-packages.
-- Tokio
Barry Warsaw wrote:

On Tue, 2003-12-30 at 07:15, Tokio Kikuchi wrote:
Looks OK. I tested some messages without any codecs in site-packages.
Great, thanks. I'll put together 2.1.4rc1 later today.
-Barry

On Sun, 2003-12-28 at 21:32, Tokio Kikuchi wrote:
Will do.
Sorry for the inconvenience.
No problem!
-Barry

participants (4): Anthony Baxter, Barry Warsaw, Hye-Shik Chang, Tokio Kikuchi