New Features in Python 1.6

Lloyd Zusman ljz at asfast.com
Wed Apr 5 00:35:44 CEST 2000


nospam.newton at gmx.li (Philip 'Yes, that's my address' Newton) writes:

> On Tue, 4 Apr 2000 14:30:06 -0400, "Terry Reedy" <tjreedy at udel.edu>
> wrote:
> 
> >> Python strings can now be stored as Unicode strings.  To make it easier
> >> to type Unicode strings, the single-quote character defaults to creating
> >> a Unicode string, while the double-quote character defaults to ASCII
> >> strings.
> >
> >' = 1 byte/char, " = 2 bytes/ char is more straightforward.
> 
> Yes, but this depends on what you mean by "Unicode". You have to
> represent those Unicode characters in bytes, somehow, and how many bytes
> you need depends on the format. Also, full Unicode is 32-bit (not just
> the 16 bits of the Basic Multilingual Plane). UTF-16 is two bytes (but
> can be four for surrogates), UTF-8 is a *variable* number of bytes (1 to
> 6, I believe), etc. It's not always 2 bytes per Unicode character.

Well, then, a good solution would be to precede each character by the
right combination of single and double quotes to signal how many bytes
it takes up, as summarized in this handy table:

  # of bytes    Preceding character sequence(s)

      1         '
      2         ''  "
      3         '''  '"  "'
      4         ''''  "''  '"'  ''"  ""
      5         '''''  "'''  '"''  ''"'  '''"  ""'  "'"  '""
      6         ''''''  "''''  '"'''  ''"''  '''"'  ''''"
                ""''  "'"'  "''"  '"'"  ''""  '""'  """

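For anyone who would rather not type that table by hand, here is a
rough sketch that generates it.  It assumes my reading of the scheme,
namely that each ' stands for one byte and each " for two; the function
name quote_prefixes is made up for this example.

```python
def quote_prefixes(n_bytes):
    """List every run of ' (counted as 1) and " (counted as 2) that sums to n_bytes."""
    if n_bytes == 0:
        return [""]  # exactly one way to announce zero bytes: say nothing
    results = []
    if n_bytes >= 1:
        # Start with a single quote, then cover the remaining n_bytes - 1.
        results += ["'" + rest for rest in quote_prefixes(n_bytes - 1)]
    if n_bytes >= 2:
        # Start with a double quote, then cover the remaining n_bytes - 2.
        results += ['"' + rest for rest in quote_prefixes(n_bytes - 2)]
    return results


# Print one table row per byte count, 1 through 6.
for n in range(1, 7):
    print(n, "  ".join(quote_prefixes(n)))
```

The recursion is plain and unmemoized, which is fine for six bytes and
would only become a problem long after the notation itself had.
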
This could result in fairly long string representations, but that
problem could be solved by using my new algorithm to compress all of
these UTF-whatever representations as one bit (see my earlier post for
details).  Then, using . (dot) and - (dash) for a zero-bit and a
one-bit, respectively, all UTF-whatever strings could then be easily
represented as either . or - in our Python programs.  For example:

  # old program
  print ""''H"'e'"'''l'l""'o'' ''""w"""o'"r'''''l'"'d"!

  # new program
  print -


-- 
 Lloyd Zusman
 ljz at asfast.com


