[I18n-sig] Re: [Python-Dev] Unicode compromise?

Guido van Rossum guido@python.org
Tue, 02 May 2000 16:47:30 -0400

> I could live with this compromise as long as we document that a future
> version may use the "character is a character" model. I just don't want
> people to start depending on a catchable exception being thrown because
> that would stop us from ever unifying unmarked literal strings and
> Unicode strings.

Agreed (as I've said before).

> --
> Are there any steps we could take to make a future divorce of strings
> and byte arrays easier? What if we added a binary_read() function that
> returns some form of byte array. The byte array type could
> be just like today's string type except that its type object would be
> distinct, it wouldn't have as many string-ish methods and it wouldn't
> have any auto-conversion to Unicode at all.

You can do this now with the array module, although clumsily:

  >>> import array
  >>> f = open("/core", "rb")
  >>> a = array.array('B', [0]) * 1000
  >>> f.readinto(a)

Or if you wanted to read raw Unicode (UTF-16):

  >>> a = array.array('H', [0]) * 1000
  >>> f.readinto(a)
  >>> u = unicode(a, "utf-16")

There are some performance issues, e.g. you have to initialize the
buffer somehow and that seems a bit wasteful.
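
For readers on a current interpreter, here is a minimal sketch of the same
readinto() pattern, assuming Python 3. The io.BytesIO objects and their
contents are made-up stand-ins for a real file opened with "rb", and the
decode step uses bytes.decode() in place of the unicode() call shown above.

```python
import array
import io

# Stand-in for a binary file; the bytes are invented for illustration.
f = io.BytesIO(b"\x01\x02\x03\x04\x05")

# Preallocate a zero-filled buffer of unsigned bytes, as in the message.
a = array.array('B', [0]) * 1000

# readinto() fills the buffer in place and returns the number of bytes read.
n = f.readinto(a)
data = a[:n]  # keep only the items that were actually filled

# The UTF-16 variant: read into a buffer of 16-bit items, then decode
# only the bytes that were read (assumption: Python 3 API, not 1.6).
g = io.BytesIO("hi".encode("utf-16"))
b = array.array('H', [0]) * 1000
m = g.readinto(b)
text = b.tobytes()[:m].decode("utf-16")
```

The up-front zero fill is still there in this version, which is the
wasteful initialization the paragraph above refers to.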

> People could start to transition code that reads non-ASCII data to the
> new function. We could put big warning labels on read() to state that it
> might not always be able to read data that is not in some small set of
> recognized encodings (probably UTF-8 and UTF-16).
> Or perhaps binary_open(). Or perhaps both.
> I do not suggest just using the text/binary flag on the existing open
> function because we cannot immediately change its behavior without
> breaking code.

A new method makes most sense -- there are definitely situations where
you want to read in text mode for a while and then switch to binary
mode (e.g. HTTP).
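
The HTTP case can be illustrated with a small sketch, assuming Python 3.
It is not the new method discussed here, only one way the text-then-binary
pattern looks on a single stream. The response bytes are hypothetical, and
picking Content-Length by position is a shortcut for brevity.

```python
import io

# Hypothetical HTTP-style response: text headers, a blank line, raw bytes.
raw = b"HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\n\x00\xff\x10\x80"
f = io.BytesIO(raw)  # stand-in for a socket file opened in binary mode

# "Text mode" phase: read line by line and decode each header line.
headers = []
while True:
    line = f.readline()
    if line in (b"\r\n", b"\n", b""):
        break  # blank line (or EOF) ends the header section
    headers.append(line.decode("ascii").rstrip("\r\n"))

# "Binary mode" phase: read the body as raw bytes, with no decoding.
length = int(headers[1].split(":", 1)[1])
body = f.read(length)
```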

I'd like to put this off until after Python 1.6 -- but it deserves

--Guido van Rossum (home page: http://www.python.org/~guido/)