[I18n-sig] Japanese commentary on the Pre-PEP (1 of 4)

Atsuo Ishimoto ishimoto@gembook.org
Wed, 21 Feb 2001 02:35:23 +0900

Brian, thanks for your effort in translating our comments.

On Tue, 20 Feb 2001 14:22:43 +0000
Toby Dickenson <mbel44@dial.pipex.net> wrote:

> If those ??? are anything other than ASCII characters, then it doesn't
> work *predictably* today. (assuming the requirement that the file name
> is correct when viewed using the platform's native file browser)
If the filename is illegal for the platform, fopen() may return an error.
Why should we check whether the filename is valid or not? Current Python
doesn't check whether a filename contains illegal characters, such as ':'
on Win32. This is because the platform knows its own job and its own
character set. We don't have to interfere with its work.
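
As a rough illustration in present-day Python (not the Python of this
thread), the sketch below passes a name straight to open() and lets the
platform decide. try_create() is a hypothetical helper, and the
5000-character name is only an assumed example of a name the platform
will refuse.

```python
import os
import tempfile


def try_create(path):
    """Return None if the platform accepts `path`, else the OSError it raised."""
    try:
        # No validation here: the name goes to the platform unchanged.
        with open(path, "w"):
            pass
    except OSError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    # An ordinary name is accepted.
    accepted = try_create(os.path.join(tmp, "plain.txt"))
    # A name far beyond the usual length limits is rejected by the platform.
    rejected = try_create(os.path.join(tmp, "x" * 5000))
```

The script contains no knowledge of what the platform allows; it only
reports the error that the platform itself raised.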

> >Well, we could take care when writing our Python scripts only to use strings
> >in such a way that PyArg_ParseTuple() does not cause an error.
> Sticking with the fopen example; I had assumed it is desirable to get
> an error if a script tries to create a file whose name contains
> Japanese characters, on a filesystem that does not support that.
You can get an error from the platform-dependent fopen(). Python and
extension modules don't have to check this.

> If this is a legacy extension library then a byte string is all it
> expects. You could call this function as
> sample.simple(u"????????".encode('encoding_expected_by_sample_dot_simple'))
> I agree we need to provide a simpler interface to new extensions.

I don't believe this makes people happy, even if the interface is
simplified. It is hard work to remember whether a given function is a
Python script, a legacy extension or a Unicode-aware extension.

> >	/* SJIS??? */
> >#ELSE
> >	/* EUC??? */
> >	
> >	FILE *f = fopen(....)
> >
> >I don't think anyone really wants to write code like this.
> I think those ifdefs could be replaced by one call to PyUnicode_Encode

Maybe. But to encode, you need to know the possible character set of the
incoming Unicode string and its encoding, and specify them explicitly.
A platform-dependent default encoding may eliminate the hard-coded
encoding name, but I'm afraid of the performance penalty for really long
strings.
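
A minimal sketch of the two approaches in present-day Python follows.
The sample string and the helper encode_for_platform() are hypothetical,
and sys.getfilesystemencoding() is only one assumed way to ask for a
platform-dependent default.

```python
import sys

# Hypothetical sample text: three kanji, written as escapes so the source
# file itself stays ASCII.
TEXT = "\u65e5\u672c\u8a9e"

# Hard-coded encoding names: the caller has to know which one the
# receiving code expects, and the resulting bytes differ.
sjis_bytes = TEXT.encode("shift_jis")
euc_bytes = TEXT.encode("euc_jp")


def encode_for_platform(text):
    """Encode with the platform's filesystem encoding instead of a fixed name."""
    return text.encode(sys.getfilesystemencoding())
```

The helper removes the encoding name from the calling code, but the
whole string is still converted on every call.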

> As a European native-English speaker, I don't think this is true so
> long as we preserve the ASCII default encoding. An application that
> stores latin-1 data in a mix of unicode and plain strings will quickly
> trigger an exception (as soon as a unicode string mixes with a plain
> string containing a non-ASCII byte).
This means a lot of existing extension modules would have to be updated.
It is hard for me to believe this is a good idea.
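
For reference, the exception described in the quoted text can be
reproduced in present-day Python by decoding explicitly with ASCII,
which is roughly what the implicit conversion did at the time. mix() is
a hypothetical helper, and the byte strings are assumed sample data.

```python
def mix(text, raw, encoding="ascii"):
    """Join a Unicode string with a byte string by decoding the bytes first."""
    return text + raw.decode(encoding)


# Pure ASCII bytes mix without trouble.
plain = mix("abc", b"def")

# A single non-ASCII byte (0xE9, "e acute" in Latin-1) triggers the error.
try:
    mix("abc", b"caf\xe9")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Naming the real encoding makes the same data work.
latin1 = mix("abc", b"caf\xe9", "latin-1")
```

Code that passes non-ASCII byte strings around in this way is the code
that would have to be updated.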

> A useful counterexample may be Mark Hammond's extensions for
> supporting win32 and com. They have always included explicit support
> for automatic encoding of unicode parameters on platforms where win32
> uses 8-bit strings, and automatic decoding of plain strings when used
> with COM, which is always unicode.

win32com works fine because COM is a Unicode world. But Python has to
live in Unicode-hostile land, I believe.

I hope you can read my English....

Atsuo Ishimoto