[Python-Dev] reflections on basestring -- and other abstractbasetypes

Alex Martelli aleaxit at yahoo.com
Mon Nov 3 11:02:54 EST 2003


On Monday 03 November 2003 02:55 pm, Michael Chermside wrote:
> Alex muses on basestring:
> > 2. If we do want to encourage such typetest idioms, it might be a good
> > idea to provide some other such abstract basetypes for the purpose.
>
>        [...]
>
> >   If there was an abstract basetype, say "baseinteger", from which int
> > and long derived,
>
> Great idea... I think there should be a single type from which all built-in
> integer-like types inherit, and which user-designed types can inherit
> if they want to behave like integers. I think that type should be called
> "int". Once the int/long distinction is completely gone, this will be

Unfortunately, unless int is made an abstract type, that doesn't help at
all to "type-flag" user-coded types (be they C-coded or Python-coded):
they want to tell "whoever it may concern" that they're intended to be
usable as integers, without uselessly carrying around an instance of int
for the purpose (and, if C-coded, contorting their own layout for that).

Abstract basetypes such as basestring are useful only to "flag" types as
(intending to conform to) some concept: they don't carry implementation.
Specifically, basestring has no other use except supporting isinstance
(and, I guess, issubclass in some cases:-).  Concrete types such as int
carry more baggage (and provide more uses).

I'm not sure whether it makes sense to have basestring in Python, but
I assume it must -- it's a recent addition, not "legacy", so why would it
have been accepted if it made no sense?  So, a user-coded type can
flag itself as intending to be stringlike, if it wishes, without carrying any
baggage due to that.  Why is intlike so drastically different?
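
A minimal sketch of what such a "flag only" basetype could look like for
integer-likes.  The names BaseInteger, Mod7 and is_intlike are hypothetical
(nothing of the sort exists in Python); the point is only that the marker
carries no state and no behavior, so inheriting from it costs nothing.

```python
# Hypothetical marker basetype: no state, no behavior, only a flag.
class BaseInteger(object):
    __slots__ = ()


class Mod7(BaseInteger):
    """Illustrative user-coded integer-like type (arithmetic modulo 7)."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value % 7

    def __add__(self, other):
        return Mod7(self.value + int(other))

    def __int__(self):
        return self.value


def is_intlike(x):
    # Built-in int is listed explicitly, because in this sketch it does
    # not inherit from the hypothetical marker.
    return isinstance(x, (int, BaseInteger))
```

Here Mod7 is recognized by is_intlike without subclassing int, so it does
not inherit int's instance layout.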

> quite clean, the only confusion now is that the int/long distinction isn't
> yet completely hidden.
>
> > 4. Furthermore, providing "basenumber" would let user-coded classes
> > "flag" in a simple and direct way "I'm emulating numbers".
>
> Okay, that sounds like it might be useful, at least to those people who
> work with weird varieties of numbers. But I can't think how. Normally,

By allowing a simple test for "is X supposed to be a number", just like
isinstance(X, basestring) allows an equally simple test for "is X supposed
to be a string".  For example, such tests as imaplib.py's
    isinstance(date_time, (int, float))
(I'm not sure why long is omitted here) would simplify to
    isinstance(date_time, basenumber)

There aren't many such checks in the standard library, because overall
it doesn't do much with numbers (while it does work a lot with strings).

But, the categories of use cases aren't very different: either one is
asserting that X is-a [something], a la "assert isinstance(X,...", or one
is checking whether X is-a [something] (i.e. X is allowed to be either
a "something", or not, and there is different behavior in either case).
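
The two kinds of use can be sketched as follows.  BaseNumber, NUMBER_TYPES,
scale and describe are hypothetical names chosen for illustration; the
built-in types are listed next to the marker because, in this sketch, they
do not inherit from it.

```python
class BaseNumber(object):
    """Hypothetical marker basetype; not part of Python."""
    __slots__ = ()


# Assumed set of "number" types for this sketch.
NUMBER_TYPES = (int, float, complex, BaseNumber)


def scale(values, factor):
    # Kind 1: assert that factor is-a number; anything else is a caller bug.
    assert isinstance(factor, NUMBER_TYPES), "factor must be a number"
    return [v * factor for v in values]


def describe(x):
    # Kind 2: x may or may not be a number, with different behavior each way.
    if isinstance(x, NUMBER_TYPES):
        return "number"
    return "other"
```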

> I figure that if you overload addition, multiplication, subtraction, and
> perhaps a few other such operators, then you're trying to emulate numbers
> (that or you're abusing operator overloading, and I have no real sympathy

All these operators are defined, in various branches of maths, for things
that are very different from "a number".  Surely you're not claiming that
Numeric is "abusing operator overloading" by allowing users to code
a+b, a*b, a-b etc where a and b are multi-dimensional arrays?  The
ability to use such notation, which is fully natural in the application areas
those users come from, is important to many users.
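
As a toy illustration (a hypothetical Vector class, not Numeric's actual
API), here is a non-number type for which a+b, a*b and a-b are perfectly
natural.  Note that, assuming scalars are broadcast across the items, even
an expression such as a + 0 succeeds, so support for arithmetic operators
alone does not tell a scalar number apart from an array.

```python
class Vector(object):
    """Toy stand-in for an array type; hypothetical, not Numeric's API."""

    def __init__(self, items):
        self.items = list(items)

    def _pairs(self, other):
        # Pair up two vectors, or broadcast a scalar across all items.
        if isinstance(other, Vector):
            return zip(self.items, other.items)
        return [(x, other) for x in self.items]

    def __add__(self, other):
        return Vector(a + b for a, b in self._pairs(other))

    def __sub__(self, other):
        return Vector(a - b for a, b in self._pairs(other))

    def __mul__(self, other):
        return Vector(a * b for a, b in self._pairs(other))


a = Vector([1, 2, 3])
b = Vector([10, 20, 30])
total = a + b      # elementwise sum
shifted = a + 0    # succeeds, although a is not a scalar number
```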

> for you). What use cases do you have for "basenumber" (I don't mean
> examples of classes that would inherit from basenumber, I mean examples
> where that inheritance would make a difference)?

Let me offer just a couple of use cases, one per kind.  For example,

def __mul__(self, other):
    if isinstance(other, self.KnownNumberTypes):
        return self.__class__([ x*other for x in self.items ])
    else:
        # etc etc, various other multiplication cases

Right now, that attribute (a class attribute, actually), KnownNumberTypes,
starts out "knowing" about int, long, float, gmpy.mpz, etc., and may
require user customization (e.g. by subclassing) if any other "kind of
(scalar) number" needs to be supported; besides, the isinstance check must
walk linearly down the tuple of known number types each time.  (I originally
had quite a different test structure:
    try: other + 0
    except TypeError:  # other is not a number
        # various other multiplication cases
    else:
        # other is a number, so...
        return self.__class__([ x*other for x in self.items ])
but the performance for typical benchmarks improved with the isinstance
test, so, reluctantly, that's what I changed to).  If an abstract basetype
'basenumber' caught many useful cases, I'd put it right at the start of
the KnownNumberTypes tuple, omit all subclasses thereof from it, get
better performance, AND be able to document very simply what the user
must do to ensure his own custom type is known to me as "a number".
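
A runnable sketch of that arrangement, assuming a hypothetical BaseNumber
marker.  Seq and Money are made-up names, and returning NotImplemented
merely stands in for the other multiplication cases, which are out of scope
here.

```python
class BaseNumber(object):
    """Hypothetical marker basetype; not part of Python."""
    __slots__ = ()


class Money(BaseNumber):
    """Illustrative user-coded scalar that opts in via the marker."""

    def __init__(self, cents):
        self.cents = cents

    def __rmul__(self, other):
        return Money(other * self.cents)


class Seq(object):
    # Marker first, then the built-in types that do not inherit from it.
    KnownNumberTypes = (BaseNumber, int, float)

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        if isinstance(other, self.KnownNumberTypes):
            # Scalar case: scale every item.
            return self.__class__([x * other for x in self.items])
        # Placeholder for the various other multiplication cases.
        return NotImplemented
```

The documentation for users then shrinks to one line: inherit from the
marker, and Seq will treat your type as a number.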

That's a case where I need to accept both numbers and non-numbers
and do different things.  As for "checking it's a number" I find it quite
OK to do it by trying X+0 and letting the exception, if any, propagate --
just as "checking if it's a string" could proceed by doing X+''.  But maybe
I'm just old-fashioned in this acceptance -- particularly if one thinks of
C-coded extensions, checking for a basetype might be far handier.  E.g.,
in Python/bltinmodule.c, function builtin_sum uses C-coded typechecking
to single out strings as an error case:

		/* reject string values for 'start' parameter */
		if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
			PyErr_SetString(PyExc_TypeError,
				"sum() can't sum strings [use ''.join(seq) instea

[etc].  Now, what builtin_sum really "wants" to do is to accept numbers,
only -- it's _documented_ as being meant for "numbers": it uses +, NOT
+=, so its performance on sequences, matrix and array-ish things, etc, 
is not going to be good.  But -- it can't easily _test_ whether something 
"is a number".  If we had a PyBaseNumber_Type to use here, it would
be smooth, easy, and fast to check for it.
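
A Python-level analogue of that check, as a sketch only: this is not how
builtin_sum is written, and BaseNumber, NUMBER_TYPES and number_sum are
hypothetical names standing in for the PyBaseNumber_Type idea.

```python
class BaseNumber(object):
    """Hypothetical marker basetype; not part of Python."""
    __slots__ = ()


# Assumed set of "number" types for this sketch.
NUMBER_TYPES = (int, float, complex, BaseNumber)


def number_sum(seq, start=0):
    """Sum numbers only; a sketch, not the real built-in sum()."""
    if not isinstance(start, NUMBER_TYPES):
        raise TypeError("number_sum() start must be a number")
    result = start
    for item in seq:
        if not isinstance(item, NUMBER_TYPES):
            raise TypeError("number_sum() can only sum numbers")
        result = result + item  # uses +, not +=
    return result
```

With such a test, strings stop being a special case: lists, arrays and
anything else not flagged as a number are rejected the same way.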


> >  IF a user class could flag itself as "numeroid" by inheriting
> > basenumber, THEN the "accidental commutativity" COULD be easily removed
> > at least for such classes.
>
> Okay, that's one use case. Any others? 'cause I'm coming up blank.

I see a few other cases in the standard library which want to treat "numbers"
in some specific way different from other types (often forgetting longs:-), 
e.g. Lib/plat-mac/plistlib.py has one.  In gmpy, I would often like some 
operations to be able to accept "a number", perhaps by letting it try to 
transform itself into a float as a worst case (so complex numbers would fail 
there), but I definitely do NOT want to accept non-number objects which 
"happen to be able to return a value from float(x)", such as strings.  In all
such cases of wanting to check if something "is a number", an abstract
basetype might be handy, smooth, fast.
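
A sketch of that "accept a number, convert via float as a worst case"
pattern.  BaseNumber, Ratio and as_float are hypothetical names, and this
is not gmpy's actual behavior or API.

```python
class BaseNumber(object):
    """Hypothetical marker basetype; not part of Python."""
    __slots__ = ()


class Ratio(BaseNumber):
    """Illustrative user-coded number that knows how to become a float."""

    def __init__(self, num, den):
        self.num, self.den = num, den

    def __float__(self):
        return self.num / float(self.den)


def as_float(x):
    """Convert x to float, but only if x is flagged as a number."""
    if not isinstance(x, (int, float, complex, BaseNumber)):
        raise TypeError("expected a number, got %s" % type(x).__name__)
    # Complex numbers pass the check but fail here, since float()
    # refuses them.
    return float(x)
```

A string such as "1.5" is rejected by as_float even though float("1.5")
would happily return a value.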


> > ...does anybody see any problem if, in 2.4, we take away the ability to
> > multiply inherit from basestring AND also from another builtin type which
> > does not in turn inherit from basestring...?
>
> I do! I personally wouldn't try to create the class "perlnum" which
> inherits from basestring and also basenumber and which tries to magically
> know which is desired and convert back and forth on demand. But I'm
> sure *someone* out there is just dying to write such a class. Why
> prevent them? Not that I'd ever USE such a monstrosity, but just don't
> see the ADVANTAGE in providing the programmer with a straightjacket by
> typechecking them (at the language level) to prevent uses outside of
> those envisioned by the language implementers. It sounds decidedly
> non-pythonic to me.

How would it be different from saying that if something is a mapping it
cannot also be a sequence (and vice versa) and trying to distinguish between
the two cases (and, currently, failing for user-coded types because there IS
no way to reliably flag them one way or another)?  The purpose of the
hypothetical abstract basetypes is to let the user optionally flag types in
an unambiguous way.  Types that aren't flagged would presumably keep
muddling through like today, for backwards compatibility.  But allowing the
use of multiple basetypes seems only to introduce ambiguity again, and it
seems to me that it would have no added value, while providing (at least) a
warning for it would help prevent user mistakes.
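
The "at least a warning" option could be sketched like this.  The marker
classes, MARKERS and check_markers are hypothetical, and an explicit helper
function is only one of several places where such a check might live.

```python
import warnings


class BaseString(object):
    """Hypothetical stand-in for the string marker basetype."""


class BaseNumber(object):
    """Hypothetical marker basetype; not part of Python."""


MARKERS = (BaseString, BaseNumber)


def check_markers(cls):
    """Warn if cls is flagged by more than one marker basetype."""
    found = [m.__name__ for m in MARKERS if issubclass(cls, m)]
    if len(found) > 1:
        warnings.warn("%s is flagged as %s; the flags are ambiguous"
                      % (cls.__name__, " and ".join(found)))
    return cls


class PerlNum(BaseString, BaseNumber):
    """The doubly flagged case discussed above."""


class PlainNum(BaseNumber):
    """An unambiguous, singly flagged type."""
```

Calling check_markers(PerlNum) issues the warning, while
check_markers(PlainNum) stays silent; neither class is forbidden.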


Alex



