[Python-Dev] basenumber redux

Tue Jan 17 03:18:10 CET 2006

On Jan 16, 2006, at 2:01 PM, Martin v. Löwis wrote:

> Alex Martelli wrote:
>> I can't find a PEP describing this restriction of basestring, and I
>> don't see why a coder who needs to implement another kind of
>> character string shouldn't subclass basestring, so that those
>> instances pass an isinstance test on basestring which is quite likely
>> to be found e.g. in the standard library.
>
> People could do that, but they would be on their own. basestring
> could be interpreted as "immutable sequence of characterish things",
> but it was really meant to be "union of str and unicode". There
> is no PEP because it was introduced by BDFL pronouncement.

Unfortunately the lack of a PEP leaves this a bit underdocumented.

> That it is not a generic base class for stringish types can be
> seen by looking at UserString.UserString, which doesn't inherit
> from basestring.

Raymond Hettinger's reason for not wanting to add basestring as a  
base for UserString was: UserString is a legacy class, since today  
people can inherit from str directly, and it cannot be changed from a  
classic class to a new-style one without breaking backwards  
compatibility, which for a legacy class would be a big booboo.   
Nothing was said about "different design intent for basestring", as I  
recall (that discussion comes up among the few hits for [basestring  
site:python.org] if you want to check the details).

> For most practical purposes, those two definitions actually
> define the same thing - there simply aren't any stringish data
> types in praxis: you always have Unicode, and if you don't,
> you have bytes.

But not necessarily in one big blob that's consecutive (==compact) in  
memory.  mmap instances are "almost" strings and could easily be made  
into a closer match, at least for the immutable variants, for  
example; other implementations such as SGI STL's "Rope" also come to  
mind.

In the context of a current struggle (a different and long story)  
between Python builds with 2-bytes Unicode and ones with 4-bytes  
Unicode, I've sometimes found myself dreaming of a string type that's  
GUARANTEED to be 2-bytes, say, and against which extension modules  
could be written that don't need recompilation to move among such  
different builds, for example.  It's not (yet) hurting enough to make  
me hunker down and write such an extension (presumably mostly by  
copy-paste-edit from Python's own sources;-), but if somebody did it would  
sure be nice if they could have that type "assert it's a string" by  
inheriting from basestring, no?

>> Implementing different kinds of numbers is more likely than
>> implementing different kinds of strings, of course.
>
> Right. That's why a PEP is needed here, but not for basestring.

OK, I've mailed requesting a number.

>
>> A third argument against it is asymmetry: why should I use completely
>> different approaches to check if x is "some kind of string", vs
>> checking if x is "some kind of number"?
>
> I guess that's for practicality which beats purity. People often
> support interfaces that either accept both an individual string
> and a list of strings, and they need the test in that case.
> It would be most natural to look for whether it is a sequence;
> unfortunately, strings are also sequences.

Sure, isinstance-tests with basestring are a fast and handy way to  
typetest that.

But so would similar tests with basenumber be.
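
For the string case, a minimal sketch of the test in question. The helper name as_string_list is hypothetical, and the fallback to str is an assumption so that the snippet also runs on Pythons that have no basestring:

```python
try:
    string_types = basestring   # covers both str and unicode
except NameError:
    string_types = str          # assumption: interpreter without basestring


def as_string_list(arg):
    """Accept one string or an iterable of strings; always return a list."""
    # Strings are sequences too, so "is it a sequence?" cannot tell the
    # two cases apart; the isinstance test on the base type can.
    if isinstance(arg, string_types):
        return [arg]
    return list(arg)
```

A caller can then pass either "spam" or ["spam", "eggs"] and get a list back in both cases.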

>
>> isinstance with a tuple of number types, where the tuple did not
>> include Decimal (because when I developed and tested that module,
>> Decimal wasn't around yet).
>
> As I suggested in a different message: Why are you doing that
> in the first place?

Because isinstance is faster and handier than testing with try/except  
around (say) "x+0".

As to why I want to distinguish numbers from non-numbers, let me  
quote from a message I sent in 2003 (one of the few you'll find by  
searching for [basestring site:python.org] as I have repeatedly  
recommended, but apparently there's no way to avoid just massively  
copying and pasting...):

"""
def __mul__(self, other):
     if isinstance(other, self.KnownNumberTypes):
         return self.__class__([ x*other for x in self.items ])
     else:
         # etc etc, various other multiplication cases

right now, that (class, actually) attribute KnownNumberTypes starts out
"knowing" about int, long, float, gmpy.mpz, etc, and may require user
customization (e.g by subclassing) if any other "kind of (scalar) number"
needs to be supported; besides, the isinstance check must walk linearly
down the tuple of known number types each time.  (I originally had
quite a different test structure:
     try: other + 0
     except TypeError:  # other is not a number
         # various other multiplication cases
     else:
         # other is a number, so...
         return self.__class__([ x*other for x in self.items ])
but the performance for typical benchmarks improved with the isinstance
test, so, reluctantly, that's what I changed to).  If an abstract basetype
'basenumber' caught many useful cases, I'd put it right at the start of
the KnownNumberTypes tuple, omit all subclasses thereof from it, get
better performance, AND be able to document very simply what the user
must do to ensure his own custom type is known to me as "a number".
"""
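
A minimal sketch of how that would look, assuming a hypothetical basenumber marker class (defined here only for illustration; no such builtin exists). The Vector and Fixed classes are likewise made-up names, and the builtin types still have to be listed explicitly because they do not inherit from the marker:

```python
class basenumber(object):
    """Hypothetical abstract marker base; not an existing builtin."""


class Fixed(basenumber):
    """Illustrative user-defined scalar that declares itself a number."""

    def __init__(self, value):
        self.value = value

    def __rmul__(self, other):
        # Handles <builtin number> * Fixed
        return Fixed(other * self.value)


class Vector(object):
    # The marker comes first; builtins that cannot inherit from it follow.
    KnownNumberTypes = (basenumber, int, float)

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        if isinstance(other, self.KnownNumberTypes):
            return self.__class__([x * other for x in self.items])
        # Stand-in for "various other multiplication cases"
        return NotImplemented
```

With this arrangement the author of Fixed only has to inherit from the marker; nobody needs to customize KnownNumberTypes.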

Other use cases, still quoted from the very same message:

"""
in Python/bltinmodule.c, function builtin_sum uses C-coded typechecking
to single out strings as an error case:

		/* reject string values for 'start' parameter */
		if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
			PyErr_SetString(PyExc_TypeError,
				"sum() can't sum strings [use ''.join(seq) instea

[etc].  Now, what builtin_sum really "wants" to do is to accept numbers,
only -- it's _documented_ as being meant for "numbers": it uses +, NOT
+=, so its performance on sequences, matrix and array-ish things, etc,
is not going to be good.  But -- it can't easily _test_ whether something
"is a number".  If we had a PyBaseNumber_Type to use here, it would
be smooth, easy, and fast to check for it.
"""
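
The effect of that check is visible from Python itself. A small sketch (the variable names are illustrative): strings are rejected outright, while other non-numbers such as lists are accepted and summed with repeated +:

```python
# Strings are singled out by the explicit type check and rejected.
message = None
try:
    sum(["a", "b"], "")
except TypeError as exc:
    message = str(exc)

# Lists are not numbers either, yet they pass: each + builds a new list,
# which is the poor-performance case described above.
flattened = sum([[1], [2, 3]], [])
```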

and yet another, which is the directly gmpy-related one:

"""
I see a few other cases in the standard library which want to treat
"numbers" in some specific way different from other types (often
forgetting longs:-), e.g. Lib/plat-mac/plistlib.py has one.  In gmpy, I
would often like some operations to be able to accept "a number",
perhaps by letting it try to transform itself into a float as a worst
case (so complex numbers would fail there), but I definitely do NOT
want to accept non-number objects which "happen to be able to return a
value from float(x)", such as strings.  In all such cases of wanting to
check if something "is a number", an abstract basetype might be handy,
smooth, fast.
"""

>
>> I have the same issue in the C-coded extension gmpy: I want (e.g.) a
>> gmpy.mpq to be able to be constructed by passing any number as the
>> argument, but I have no good way to say "what's a number", so I use
>> rather dirty tricks -- in particular, I've had to tweak things in a
>> weird direction in the latest gmpy to accommodate Python 2.4
>> (specifically Decimal).
>
> Not sure what a gmpy.mpq is, but I would expect that can only work

A fast rational number type, see http://gmpy.sourceforge.net for  
details (gmpy wraps LGPL'd library GMP, and gets a lot of speed and  
functionality thereby).

> if the parameter belongs to some algebraic ring homomorphic
> with the real numbers, or some such. Are complex numbers also numbers?
> Is it meaningful to construct gmpy.mpqs out of them? What about
> Z/nZ?

If I could easily detect "this is a number" about an argument x, I'd  
then ask x to change itself into a float, so complex would be easily  
rejected (while decimals would mostly work fine, although a bit  
slowly without some specialcasing, due to the Stern-Brocot-tree  
algorithm I use to build gmpy.mpq's from floats).  I can't JUST ask x  
to "make itself into a float" (without checking for x's "being a  
number") because that would wrongfully succeed for many cases such as  
strings.
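
A minimal sketch of that two-step check, assuming a hypothetical basenumber marker (defined here only for illustration) and a made-up helper name number_to_float. The builtin types are listed explicitly because they do not inherit from the marker:

```python
class basenumber(object):
    """Hypothetical marker base, as proposed; not an existing builtin."""


# Assumption: builtins cannot inherit from the marker, so list them too.
NUMBER_TYPES = (basenumber, int, float, complex)


def number_to_float(x):
    """Convert x to float only if it is known to be a number."""
    if not isinstance(x, NUMBER_TYPES):
        raise TypeError("not a number: %r" % (x,))
    # complex passes the test above but is rejected here by float() itself
    return float(x)
```

Plain float("3.5") succeeds, which is exactly the case to avoid; number_to_float("3.5") raises TypeError at the first step, and number_to_float(1j) raises it at the second.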

>> If I do write the PEP, should it be just about basenumber, or should
>> it include baseinteger as well?
>
> I think it should only do the case you care about. If others have
> other use cases, they might get integrated, or they might have to write
> another PEP.

Good idea.  I'll check around -- if anybody feels strongly about  
baseinteger they may choose to co-author with me (and the PEP will  
propose both), otherwise we'll hassle on basenumber only;-)

Alex