[Python-Dev] Need a way to test for 8-bit-or-unicode-string

Guido van Rossum guido@python.org
Fri, 05 Oct 2001 10:36:25 -0400


I'm finding more and more the need to test whether an object is a
string, including both 8-bit and Unicode strings.  Current practice
seems to be:

    if type(x) in (str, unicode):

or (more verbose but more b/w compatible):

    if type(x) in (types.StringType, types.UnicodeType):

or (using a variable recently added to types -- I'm not sure this was
a good idea):

    if type(x) in types.StringTypes:

all of which break if type(x) is a *subclass* of str or unicode.  The
alternative:

    if isinstance(x, str) or isinstance(x, unicode):

is apparently too much typing.
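
A minimal sketch of the subclass problem described above. MyStr is a hypothetical subclass invented for illustration, and the try/except alias is an assumption so the snippet also runs on interpreters that have no separate unicode type:

```python
# Assumption: where no separate `unicode` type exists, alias it to `str`
# so the sketch still runs.
try:
    unicode
except NameError:
    unicode = str


class MyStr(str):
    """Hypothetical str subclass, used only to show the difference."""


def exact_type_check(x):
    # Compares the exact type, so instances of subclasses are rejected.
    return type(x) in (str, unicode)


def isinstance_check(x):
    # Follows the inheritance chain, so instances of subclasses are accepted.
    return isinstance(x, str) or isinstance(x, unicode)


s = MyStr("hello")
print(exact_type_check(s))   # False
print(isinstance_check(s))   # True
```

The exact-type test rejects the subclass instance even though it behaves like a string everywhere else; the isinstance() form accepts it.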

Some alternatives that have been proposed already:

- Create a common base class of str and unicode, which should be an
  abstract class.  This is the most OO solution, but I can't think of
  a good name; abstractstring is too long, AbstractString or String
  are uncommon naming conventions for built-in types, 'string' would
  almost work except that it's already the name of a very common
  module.[*]

- Make str a subclass of unicode (or vice versa).  This can't be done
  because subclassing requires implementation inheritance, in
  particular the instance structure layout must overlap.  Also, this
  would make it hard to check for just one of str or unicode.

- Create a new service function, IsString(x) or isString(x) or
  isstring(x), that's a shortcut for "isinstance(x, str) or
  isinstance(x, unicode)".  The question then becomes where to put
  this: as a builtin, in types.py, or somewhere else...
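
The third alternative might look roughly like the sketch below. The name isstring is just one of the spellings floated above, where it would live is still the open question, and the try/except alias is an assumption so the snippet runs on interpreters without a separate unicode type:

```python
# Assumption: where no separate `unicode` type exists, alias it to `str`
# so the sketch still runs.
try:
    unicode
except NameError:
    unicode = str


def isstring(x):
    """Return True if x is an 8-bit or Unicode string, or a subclass of either."""
    return isinstance(x, str) or isinstance(x, unicode)


class Path(str):
    """Hypothetical str subclass, to show that subclasses are accepted."""


print(isstring("spam"))          # True
print(isstring(Path("/tmp")))    # True
print(isstring(["s", "p"]))      # False
```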

Preferences please?

--Guido van Rossum (home page: http://www.python.org/~guido/)

[*] For a while I toyed with the idea of calling the abstract base
class 'string', and hacking import so that sys.modules['string'] is
the string class.  The abstract base class should then have methods
that invoke the concrete implementations, so that string.split(s)
would be the same as s.split().  This would be compatible with
previous uses of the string module!  string.letters etc. could then be
class variables.  Unfortunately this broke down when I realized that
the signature of string.join() is wrong for the string module: the
string.join function is string.join(sequence, stringobject) while the
signature of the string method is join(stringobject, sequence).  So
much for that idea... :-)  (BTW this shows to me again that the method
signature is right and the function signature is wrong.  But even my
time machine isn't powerful enough to fix this.)  (Hm, it could be
saved by making string.join() accept the arguments in either order.
Gross. :-)
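
A rough sketch of the argument-order mismatch described in the footnote. Both function names are hypothetical stand-ins: module_style_join only mimics the argument order of the string module's join function, and either_order_join is one guess at what the "accept either order" hack could look like:

```python
def module_style_join(words, sep=" "):
    """Mimic the module function's order: sequence first, separator second."""
    return sep.join(words)


def either_order_join(a, b):
    """Hypothetical hack: accept (sequence, separator) or (separator, sequence).

    If only the first argument is a string, treat it as the separator.
    Otherwise fall back to the module order.  When both arguments are
    strings the call is ambiguous, since a string is also a sequence.
    """
    if isinstance(a, str) and not isinstance(b, str):
        return a.join(b)
    return b.join(a)


words = ["a", "b", "c"]
print(module_style_join(words, "-"))   # a-b-c  (sequence, separator)
print("-".join(words))                 # a-b-c  (separator, sequence)
print(either_order_join(words, "-"))   # a-b-c
print(either_order_join("-", words))   # a-b-c
```

The both-strings case is where guessing by type stops working, which is a large part of why the either-order idea is unappealing.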