[Python-Dev] Need a way to test for 8-bit-or-unicode-string

M.-A. Lemburg mal@lemburg.com
Fri, 05 Oct 2001 17:54:05 +0200


Guido van Rossum wrote:
> 
> I'm finding more and more the need to test whether an object is a
> string, including both 8-bit and Unicode strings.  Current practice
> seems to be:
> 
>     if type(x) in (str, unicode):
> 
> or (more verbose but more b/w compatible):
> 
>     if type(x) in (types.StringType, types.UnicodeType):
> 
> or (using a variable recently added to types -- I'm not sure this was
> a good idea):
> 
>     if type(x) in types.StringTypes:
> 
> all of which break if type(x) is a *subclass* of str or unicode.  The
> alternative:
> 
>     if isinstance(x, str) or isinstance(x, unicode):
> 
> is apparently too much typing.
> 
> Some alternatives that have been proposed already:
> 
> - Create a common base class of str and unicode, which should be an
>   abstract class.  This is the most OO solution, but I can't think of
>   a good name; abstractstring is too long, AbstractString or String
>   are uncommon naming conventions for built-in types, 'string' would
>   almost work except that it's already the name of a very common
>   module.[*]

+1. This would be nice, and the same could be done for sequences, file-like
objects and other common object categories that are currently defined only
by their interface.

About the naming: how about numberclass, stringclass, sequenceclass,
fileclass?

Then you could write:

    if isinstance(obj, stringclass): ...

which looks OK and is not too much typing.

The advantage of this approach is that it can be extended to other
types and classes as well (much like you can currently do with the
Python exceptions).
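
As a rough sketch of the isinstance(obj, stringclass) idea, here is how it
could look in a modern Python. Everything in it is an assumption made for
illustration: stringclass is only the name proposed above, bytes and str
stand in for the 8-bit and Unicode string types, and the abc registration
mechanism did not exist when this was written.

```python
from abc import ABC


class stringclass(ABC):
    """Hypothetical abstract base class for all string-like types."""


# Assumption: bytes and str stand in for the 8-bit and Unicode string types.
stringclass.register(bytes)
stringclass.register(str)


class Path(str):
    """A str subclass; an exact type(x) comparison would not match it."""


def describe(obj):
    # One isinstance() call covers both string types and their subclasses.
    if isinstance(obj, stringclass):
        return "string"
    return "other"
```

Unlike the type(x) in (...) idiom, describe(Path("/tmp")) still reports a
string, because isinstance() follows the subclass relationship.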

With the new type logic in place, how hard would it be to make
the existing built-in types subclasses of these base types?
(Also: is there a run-time penalty for this?)
 
> - Make str a subclass of unicode (or vice versa).  This can't be done
>   because subclassing requires implementation inheritance, in
>   particular the instance structure layout must overlap.  Also, this
>   would make it hard to check for either str or unicode.

-0. This would be hard to get right because the two objects use
very different struct layouts. It could be an option in the long run, though.
 
> - Create a new service function, IsString(x) or isString(x) or
>   isstring(x), that's a shortcut for "isinstance(x, str) or
>   isinstance(x, unicode)".  The question then becomes where to put
>   this: as a builtin, in types.py, or somewhere else...

-1. This mechanism cannot be extended, e.g. by UserStrings.
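
To illustrate the extensibility point, here is a small sketch in modern
Python. It makes several assumptions: isstring() and stringclass are
hypothetical names taken from this thread, bytes and str stand in for the
8-bit and Unicode types, and collections.UserString plays the role of
UserString. A helper with a hard-coded list of types rejects the wrapper
class, while a base class lets the wrapper opt in.

```python
from abc import ABC
from collections import UserString


def isstring(x):
    # Hypothetical helper: the set of accepted types is fixed in the body.
    return isinstance(x, str) or isinstance(x, bytes)


class stringclass(ABC):
    """Hypothetical common base class that other types can join."""


stringclass.register(str)
stringclass.register(bytes)
# A wrapper type opts in without anyone editing a helper function.
stringclass.register(UserString)

wrapped = UserString("abc")
helper_result = isstring(wrapped)               # the helper cannot see it
base_result = isinstance(wrapped, stringclass)  # the base class can
```

The helper could of course be patched to know about UserString, but every
new string-like type would need the same patch.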
 
> Preferences please?
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> [*] For a while I toyed with the idea of calling the abstract base
> class 'string', and hacking import so that sys.modules['string'] is
> the string class.  The abstract base class should then have methods
> that invoke the concrete implementations, so that string.split(s)
> would be the same as s.split().  This would be compatible with
> previous uses of the string module!  string.letters etc. could then be
> class variables.  Unfortunately this broke down when I realized that
> the signature of string.join() is wrong for the string module: the
> string.join function is string.join(sequence, stringobject) while the
> signature of the string method is join(stringobject, sequence).  So
> much for that idea... :-)  (BTW this shows to me again that the method
> signature is right and the function signature is wrong.  But even my
> time machine isn't powerful enough to fix this.)  (Hm, it could be
> saved by making string.join() accept the arguments in either order.
> Gross. :-)

Indeed. :-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/