Unicode program representation

François Pinard pinard at iro.umontreal.ca
Mon Apr 3 09:15:19 EDT 2000


"Neil Hodgson" <neilh at hare.net.au> wrote:

>    But then, what role do you see the u" form having?

First, beware that these things are as new to me as they are to you.  The
people who went to the trouble and lengths of getting these ideas adopted,
surely against many competing directions and trends (I know how heavy these
games can get sometimes), are in a better position than I am to reply.

Of course, u'' allows \uHHHH sequences to represent Unicode characters
outside Latin-1.  But if we put this aside, I would guess that u'' is,
for strings, a bit like what the `L' suffix is for integers: that is, it
forces a wider internal representation, which then propagates through
coercion as various string operators are applied.
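
As a rough illustration of the two points above (the \uHHHH escape, and
the wide type winning when literals are mixed), here is a minimal sketch.
The names `euro' and `mixed' are only illustrative, and it is written so
that it should also run on later Python versions, where every string is
already wide and the u prefix is merely accepted for compatibility.

```python
# Sketch only: `euro` and `mixed` are hypothetical names for illustration.
euro = u'\u20ac'              # EURO SIGN, a character outside Latin-1

# The escape denotes one character, whose code point is the hex value.
assert len(euro) == 1
assert ord(euro) == 0x20AC

# Mixing a plain literal with a u'' literal yields the wide type,
# much as mixing a plain integer with a long one yields a long.
mixed = 'price: ' + euro
assert type(mixed) is type(u'')
print(repr(mixed))
```

Comparing against type(u'') rather than naming the type keeps the sketch
independent of what the wide type happens to be called.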

One could do without the usual integers in Python (using only long ones),
but usual integers have a shorter in-memory representation and are much
faster to operate upon.  Likewise, one could do without narrow strings
(please do not call them ASCII strings, as ASCII is strictly 7-bit) and
use only wide strings (calling them Unicode strings is OK :-), but narrow
strings have a shorter in-memory representation and are slightly faster
to operate upon.
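
The difference in memory footprint can be observed, at least roughly, with
the sketch below.  It assumes CPython and sys.getsizeof(), which is more
recent than this message; the exact numbers depend on the interpreter
version and build, so only the ordering of the two sizes is meaningful.
The variable names are illustrative.

```python
import sys

# Sketch only, assuming CPython; absolute sizes vary between versions.
narrow = 'a' * 1000           # every character fits in a single byte
wide = u'\u20ac' * 1000       # every character needs more than one byte

narrow_size = sys.getsizeof(narrow)
wide_size = sys.getsizeof(wide)

# Same number of characters, but the wide string occupies more memory.
print(narrow_size, wide_size)
```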

I much like the way Python is offering Unicode.  Just write as you always
did.  At the points where you really need Unicode strings in your code, just
use them: conversions and all such things are going to be fairly automatic
(unless you do not like UTF-8, in which case some work is needed to state
your preference).  Until `sre' definitely replaces `re', you'll have to
specify which one you want, which forces you as a programmer to keep better
track of narrow vs wide string types: but this is only temporary, as the
goal seems to be that the programmer should not worry about these things.
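
Stating an encoding preference could look roughly like the sketch below,
which names the codec explicitly when converting a wide string to bytes
and back.  This is only one way of doing it, assumed here for the sake of
illustration, and the variable names are hypothetical.

```python
# Sketch only: explicit conversions with a named codec.
e_acute = u'\u00e9'                     # LATIN SMALL LETTER E WITH ACUTE

as_utf8 = e_acute.encode('utf-8')       # two bytes in UTF-8
as_latin1 = e_acute.encode('latin-1')   # one byte in Latin-1

assert as_utf8 == b'\xc3\xa9'
assert as_latin1 == b'\xe9'

# Decoding with the matching codec gives back the same wide string.
assert as_utf8.decode('utf-8') == e_acute
assert as_latin1.decode('latin-1') == e_acute
```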

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard





