[Python-Dev] Unicode

Paul Prescod paul@prescod.net
Tue, 16 May 2000 12:58:33 -0500


"Martin v. Loewis" wrote:
> 
> ...
>
> I think the problem you try to see is not real. My guideline for using
> Unicode in Python 1.6 will be that people should be very careful to
> *not* mix byte strings and Unicode strings. 

I think that as soon as we are adding admonitions to documentation that
things "probably don't behave as you expect, so be careful", we have
failed. Sometimes failure is unavoidable (e.g. floats do not act
rationally -- deal with it). But let's not pretend that failure is
success.

> If you are processing text
> data, obtained from a narrow-string source, you'll always have to make
> an explicit decision what the encoding is.

Are Python literals a "narrow string source"? It seems blatantly clear
to me that the "encoding" of Python literals should be determined at
compile time, not runtime. Byte arrays from a file are different. 

> If you use Unicode text *a lot*, you may find the need to combine them
> with plain byte text in a more convenient way. 

Unfortunately there will be many people with no interest in Unicode
who will be dealing with it merely because that is the way APIs are
going: XML APIs, Windows APIs, Tk, DCOM, SOAP, WebDAV, even some X/Unix
APIs. Unicode is the new ASCII.

I want to get a (Unicode) string from an XML document or SOAP request,
compare it to a string literal and never think about Unicode.
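
To make that wish concrete, here is a minimal sketch in present-day
Python 3 (not 1.6), where literals and parser output are both Unicode
strings. The document, the element name and the variable names are made
up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element name and text are invented.
doc = ET.fromstring("<greeting>hello</greeting>")

# The parser hands back element text as str, which is Unicode in Python 3.
text = doc.text

# Comparing against a plain string literal needs no explicit encoding step.
matches = (text == "hello")
print(matches)
```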

> ...
> why does
> 
> >>> [a,b,c] = (1,2,3)
> 
> work, and
> 
> >>> [1,2]+(3,4)
> ...
> 
> does not?

I dunno. If there is no good reason then it is a bug that should be
fixed. The __radd__ operator on lists should iterate over its argument
as a sequence.
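
For reference, a small sketch of how the two cases behave in
present-day Python 3. I am not claiming anything here about 1.6, and the
variable names are only for illustration.

```python
# Unpacking accepts any iterable on the right-hand side.
[a, b, c] = (1, 2, 3)

# Concatenating a list and a tuple with + raises TypeError.
try:
    [1, 2] + (3, 4)
    concat_failed = False
except TypeError:
    concat_failed = True

# In-place addition on a list does iterate over its argument.
items = [1, 2]
items += (3, 4)

print(a, b, c, concat_failed, items)
```

The in-place form is roughly the "iterate over the argument" behavior
argued for above, while plain + stays strict.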

As Fredrik points out, though, this situation is not as dangerous as
auto-conversions because

 a) the restriction could be loosened later without breaking code

 b) the operation always fails. It never does the wrong thing silently
and it never succeeds for only some inputs.
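
By contrast, here is a sketch of what "succeeds for some inputs" can
look like. It assumes an auto-conversion that decodes byte strings as
ASCII, which is only one possible choice of implicit encoding, and it
simulates that conversion in Python 3 with a hypothetical helper,
implicit_concat.

```python
def implicit_concat(text, raw):
    """Hypothetical helper: join str and bytes by assuming ASCII bytes."""
    return text + raw.decode("ascii")

# Pure-ASCII bytes: the implicit conversion goes through.
ok = implicit_concat("caf", b"e")

# Non-ASCII bytes: the same operation now fails, at runtime,
# depending only on the data that happened to arrive.
try:
    implicit_concat("caf", b"\xe9")
    failed = False
except UnicodeDecodeError:
    failed = True

print(ok, failed)
```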

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Hardly anything more unwelcome can befall a scientific writer than 
having the foundations of his edifice shaken after the work is 
finished.  I have been placed in this position by a letter from 
Mr. Bertrand Russell..." 
 - Frege, Appendix of Basic Laws of Arithmetic (of Russell's Paradox)