[I18n-sig] Pre-PEP: Proposed Python Character Model

Paul Prescod paulp@ActiveState.com
Wed, 07 Feb 2001 18:40:29 -0800


"Martin v. Loewis" wrote:
> 
> ...
> 
>   public String(byte[] ascii, int hibyte); // in class java.lang.String
> 
> It would use the ascii array, and fill it with hibyte in-between;
> hibyte was typically 0. The documentation now says
>
> # Deprecated. This method does not properly convert bytes into
> # characters. 

That's right. This function could generate invalid Unicode. That's
totally different from what I'm proposing!
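
To make the problem concrete, here is a rough Python sketch of what that
deprecated constructor does, as I read the Java documentation: each byte
becomes the low 8 bits of a 16-bit value and hibyte supplies the high 8
bits. The name string_from_hibyte is mine and purely illustrative; it is
not part of any library, and this is not a statement of what the Pre-PEP
specifies.

```python
def string_from_hibyte(data: bytes, hibyte: int) -> str:
    """Illustrative stand-in for the deprecated String(byte[], int) constructor.

    Each byte becomes the low 8 bits of a 16-bit code unit; hibyte supplies
    the high 8 bits. No character encoding is consulted.
    """
    return "".join(chr(((hibyte & 0xFF) << 8) | b) for b in data)


# With hibyte == 0 the result matches a Latin-1 decode of the same bytes.
assert string_from_hibyte(b"Hello", 0) == b"Hello".decode("latin-1")

# With hibyte in 0xD8..0xDF every result is an unpaired surrogate,
# which cannot be encoded as UTF-8.
broken = string_from_hibyte(b"A", 0xD8)  # code point U+D841
try:
    broken.encode("utf-8")
except UnicodeEncodeError:
    print("not valid Unicode text")
```

The hibyte == 0 case is harmless; the trouble is that the caller can pick
any other high byte, including one that lands in the surrogate range.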

> ...
> It just works for the English programmer by coincidence; that
> programmer should really tell apart text and byte strings in source as
> well.

Are you really saying that if you were writing a Python book you would
say that the appropriate way to write a "Hello World" program is:

print _("Hello World")

Please give some thought to usability! I love Python because it is
syntactically clean and semantically simple. I can show people Python
code and they immediately understand it.

If you are right, then Python is a scripting language that truly has a
simpler syntax for "byte strings" than it does for "character strings".
If that's so then there is something seriously broken in the language
and we need to figure out how to fix it.

 Paul Prescod