[Python-Dev] bytes / unicode

Guido van Rossum guido at python.org
Tue Jun 22 18:17:31 CEST 2010


[Just addressing one little issue here; generally I'm just happy that
we're discussing this issue in such detail from so many points of
view.]

On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
>[...] Would urljoin(b_base, b_subdir) => bytes and
> urljoin(u_base, u_subdir) => unicode be acceptable though?  (I think, given
> other options, I'd rather see two separate functions, though.  It seems more
> discoverable and less prone to taking bad input some of the time to have two
> functions that clearly only take one type of data apiece.)

Hm. I'd rather see a single function (it would be "polymorphic" in my
earlier terminology). After all, a large number of string method calls
(and some other utility function calls) already look the same
regardless of whether they are handling bytes or text (as long as it's
uniform). If the building blocks are all polymorphic it's easier to
create additional polymorphic functions.
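
As a small illustration of that point, here is a minimal sketch of a function that is polymorphic simply because it is built from methods that str and bytes share. The name normalize_token is hypothetical and not from any library; the sketch assumes the caller passes either str or bytes consistently.

```python
def normalize_token(x):
    # Hypothetical helper: strip() and lower() exist on both str and bytes,
    # and neither call needs a literal argument, so the same body serves
    # both types and the result has the same type as the input.
    return x.strip().lower()


text_result = normalize_token('  Foo ')
bytes_result = normalize_token(b'  Foo ')
```

Because no literal appears in the body, nothing here commits the function to text or to bytes.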

FWIW, there are two problems with polymorphic functions, though they
can be overcome:

(1) Literals.

If you write something like x.split('&') you are implicitly assuming x
is text. I don't see a very clean way to overcome this; you'll have to
implement some kind of type check e.g.

    x.split('&') if isinstance(x, str) else x.split(b'&')

A handy helper function can be written:

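    def literal_as(constant, variable):
        # Return constant as str or bytes, matching the type of variable.
        if isinstance(variable, str):
            return constant
        else:
            # Assumes the literal can be encoded as UTF-8.
            return constant.encode('utf-8')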

So now you can write x.split(literal_as('&', x)).
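
A self-contained sketch of how the helper might be used inside a polymorphic function follows. The function name split_query is hypothetical, chosen only for illustration, and the helper is repeated so the block runs on its own.

```python
def literal_as(constant, variable):
    # Return constant as str or bytes, matching the type of variable.
    if isinstance(variable, str):
        return constant
    # Assumes the literal can be encoded as UTF-8.
    return constant.encode('utf-8')


def split_query(query):
    # Hypothetical polymorphic function: the separator literal is converted
    # to the type of the argument before it is passed to split().
    return query.split(literal_as('&', query))


print(split_query('a=1&b=2'))    # ['a=1', 'b=2']
print(split_query(b'a=1&b=2'))   # [b'a=1', b'b=2']
```

The type check happens once, inside literal_as, so the body of split_query stays free of isinstance branches.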

(2) Data sources.

These can be functions that produce new data from non-string data,
e.g. str(<int>), or that read it from a named file, etc. An example is read()
vs. write(): it's easy to create a (hypothetical) polymorphic stream
object that accepts both f.write('booh') and f.write(b'booh'); but you
need some other hack to make read() return something that matches a
desired return type. I don't have a generic suggestion for a solution;
for streams in particular, the existing distinction between binary and
text streams works, of course, but there are other situations where
this doesn't generalize (I think some XML interfaces have this
awkwardness in their API for converting a tree to a string).
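
To make the read()/write() asymmetry concrete, here is a rough sketch of such a hypothetical stream. The class PolyStream and the kind parameter on read() are assumptions made up for this illustration, not an existing API; the sketch stores bytes internally and assumes a single fixed encoding.

```python
import io


class PolyStream:
    """Hypothetical polymorphic stream; not a real standard-library class."""

    def __init__(self, encoding='utf-8'):
        self._buf = io.BytesIO()
        self._encoding = encoding

    def write(self, data):
        # Accepting both types is easy: the argument itself tells us
        # whether it needs encoding.
        if isinstance(data, str):
            data = data.encode(self._encoding)
        return self._buf.write(data)

    def read(self, kind=bytes):
        # read() has no argument to inspect, so the caller has to state
        # the desired return type explicitly (the "other hack").
        raw = self._buf.getvalue()
        return raw.decode(self._encoding) if kind is str else raw


f = PolyStream()
f.write('booh')
f.write(b'booh')
as_text = f.read(str)
as_bytes = f.read()
```

For simplicity read() returns everything written so far rather than tracking a read position. An explicit type argument is only one possible workaround; the existing split into binary and text streams is the other approach mentioned above.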

-- 
--Guido van Rossum (python.org/~guido)

