[I18n-sig] Japanese commentary on the Pre-PEP (2 of 4)

Tamito Kajiyama kajiyama@pseudo.grad.sccs.chukyo-u.ac.jp
Wed, 21 Feb 2001 14:30:15 +0900 (JST)

Brian, thank you for the great translation! 

Paul Prescod wrote:
| It is certainly too early for Python to abandon the one-byte centric
| view of the world. It is NOT too early to start putting into place a
| transition plan to the future world that we will all be forced to live
| in. Part of that transition is teaching people that literal strings may
| one day allow characters greater than 128 (perhaps directly, perhaps
| through an escape mechanism).

I agree.

| > The present implementation of strings in Python, where a string represents
| > a sequence of bytes, is one feature that makes Python easy for Japanese
| > developers to use.  
| If Japanese programmers understand the difference between a byte and a
| character (which they must!), why would they be opposed to making that
| distinction explicit in code?

They are not opposed to the distinction, I believe.  In fact,
Python 2.0 already makes this distinction, since it has separate
byte string and Unicode string data types.  I think these two
distinct data types are both necessary and sufficient.
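To make the two types concrete, here is a minimal sketch in modern
Python 3, where `bytes` and `str` play roughly the roles of Python
2.0's byte string and `unicode` types.  The sample text and the
choice of the EUC-JP codec (`euc_jp`) are illustrative assumptions.

```python
# Python 3 spelling (an assumption for illustration): bytes ~ Python 2.0's
# byte string, str ~ Python 2.0's unicode type.
text = "日本語"                  # Unicode string: 3 characters
data = text.encode("euc_jp")    # byte string: the same text in EUC-JP

print(type(text).__name__, len(text))  # prints "str 3"
print(type(data).__name__, len(data))  # prints "bytes 6"

# Converting between the two types is explicit and always names an encoding.
assert data.decode("euc_jp") == text
```

Each type keeps its own meaning of "length" and "index"; only an
explicit encode/decode step moves text from one to the other.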

Guido van Rossum wrote:
| Maybe because, like me, they're thinking in historical terms where
| 'char' is just another word for byte?

Paul Prescod wrote:
| I still assert that the interpretation will not change. If you have no
| encoding declaration then the only rational choice is to treat each byte
| as a character. Therefore the indexes would work exactly as they do
| today.

As Guido pointed out, Japanese programmers think of 'char' in
Python (and C) as just another word for 'byte'.  Therefore,
treating each byte as a character is not rational, at least in
Japanese text processing, where a typical Japanese character
occupies two bytes in encodings such as EUC-JP and Shift_JIS.
I'm quite sure that tons of existing programs will break if the
semantics of the byte string and the Unicode string are swapped.
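The kind of breakage meant here can be sketched as follows, again in
Python 3 with a made-up sample string.  An offset computed on the
EUC-JP byte string counts bytes, so it points somewhere else when
reused on the decoded Unicode string, and slicing the bytes in the
middle of a character leaves data that no longer decodes.

```python
text = "日本語テキスト"              # sample text (an assumption): 7 characters
data = text.encode("euc_jp")        # 14 bytes: each of these characters takes 2

# Byte-oriented code: find() on the byte string returns a *byte* offset.
pos = data.find("テ".encode("euc_jp"))
assert pos == 6
assert data[pos:].decode("euc_jp") == "テキスト"

# Reusing that offset on the Unicode string lands on a different character,
# because str indexes count characters, not bytes.
assert text.find("テ") == 3
assert text[pos:] == "ト"

# Slicing the byte string mid-character leaves an undecodable fragment.
try:
    data[:1].decode("euc_jp")
except UnicodeDecodeError:
    print("data[:1] holds only half of a character")
```

Existing code that stores byte offsets like `pos` is the sort of
program that would change behavior if its strings silently switched
from byte semantics to character semantics.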


KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>