[Python-Dev] basenumber redux
Alex Martelli
aleaxit at gmail.com
Tue Jan 17 03:18:10 CET 2006
On Jan 16, 2006, at 2:01 PM, Martin v. Löwis wrote:
> Alex Martelli wrote:
>> I can't find a PEP describing this restriction of basestring, and I
>> don't see why a coder who needs to implement another kind of
>> character string shouldn't subclass basestring, so that those
>> instances pass an isinstance test on basestring which is quite likely
>> to be found e.g. in the standard library.
>
> People could do that, but they would be on their own. basestring
> could be interpreted as "immutable sequence of characterish things",
> but it was really meant to be "union of str and unicode". There
> is no PEP because it was introduced by BDFL pronouncement.
Unfortunately the lack of a PEP leaves this a bit underdocumented.
> That it is not a generic base class for stringish types can be
> seen by looking at UserString.UserString, which doesn't inherit
> from basestring.
Raymond Hettinger's reason for not wanting to add basestring as a
base for UserString was: UserString is a legacy class, since today
people can inherit from str directly, and it cannot be changed from a
classic class to a new-style one without breaking backwards
compatibility, which for a legacy class would be a big booboo.
Nothing was said about "different design intent for basestring", as I
recall (that discussion comes up among the few hits for [basestring
site:python.org] if you want to check the details).
> For most practical purposes, those two definitions actually
> define the same thing - there simply aren't any stringish data
> types in praxis: you always have Unicode, and if you don't,
> you have bytes.
But not necessarily in one big blob that's consecutive (==compact) in
memory. mmap instances are "almost" strings and could easily be made
into a closer match, at least for the immutable variants, for
example; other implementations such as SGI STL's "Rope" also come to
mind.
In the context of a current struggle (a different and long story)
between Python builds with 2-byte Unicode and ones with 4-byte
Unicode, I've sometimes found myself dreaming of a string type that's
GUARANTEED to be 2-byte, say, and against which extension modules
could be written that don't need recompilation to move among such
different builds, for example. It's not (yet) hurting enough to make
me hunker down and write such an extension (presumably mostly by
copy-paste-edit from Python's own sources;-), but if somebody did, it
would sure be nice if they could have that type "assert it's a string"
by inheriting from basestring, no?
>> Implementing different kinds of numbers is more likely than
>> implementing different kinds of strings, of course.
>
> Right. That's why a PEP is needed here, but not for basestring.
OK, I've mailed requesting a number.
>
>> A third argument against it is asymmetry: why should I use completely
>> different approaches to check if x is "some kind of string", vs
>> checking if x is "some kind of number"?
>
> I guess that's for practicality which beats purity. People often
> support interfaces that either accept both an individual string
> and a list of strings, and they need the test in that case.
> It would be most natural to look for whether it is a sequence;
> unfortunately, strings are also sequences.
Sure, isinstance-tests with basestring are a fast and handy way to
typetest that.
But so would similar tests with basenumber be.
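A minimal sketch of the "one string or a list of strings" test discussed above. The helper name as_list_of_strings and the string_types shim are hypothetical, introduced only so the snippet runs on both Python 2 (where basestring exists) and Python 3 (where it does not):

```python
try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3: basestring is gone, str is the text type


def as_list_of_strings(arg):
    """Normalize 'a string or an iterable of strings' to a list of strings."""
    # A string is itself a sequence, so it has to be singled out first;
    # otherwise list("abc") would split it into single characters.
    if isinstance(arg, string_types):
        return [arg]
    return list(arg)
```

Without the isinstance test, a lone "abc" would be treated as the three-item sequence 'a', 'b', 'c', which is the ambiguity the test exists to resolve.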
>
>> isinstance with a tuple of number types, where the tuple did not
>> include Decimal (because when I developed and tested that module,
>> Decimal wasn't around yet).
>
> As I suggested in a different message: Why are you doing that
> in the first place?
Because isinstance is faster and handier than testing with try/except
around (say) "x+0".
As to why I want to distinguish numbers from non-numbers, let me
quote from a message I sent in 2003 (one of the few you'll find by
searching for [basestring site:python.org] as I have repeatedly
recommended, but apparently there's no way to avoid just massively
copying and pasting...):
"""
def __mul__(self, other):
    if isinstance(other, self.KnownNumberTypes):
        return self.__class__([ x*other for x in self.items ])
    else:
        # etc etc, various other multiplication cases
right now, that (class, actually) attribute KnownNumberTypes starts out
"knowing" about int, long, float, gmpy.mpz, etc, and may require user
customization (e.g. by subclassing) if any other "kind of (scalar) number"
needs to be supported; besides, the isinstance check must walk linearly
down the tuple of known number types each time. (I originally had
quite a different test structure:
    try: other + 0
    except TypeError: # other is not a number
        # various other multiplication cases
    else:
        # other is a number, so...
        return self.__class__([ x*other for x in self.items ])
but the performance for typical benchmarks improved with the isinstance
test, so, reluctantly, that's what I changed to). If an abstract basetype
'basenumber' caught many useful cases, I'd put it right at the start of
the KnownNumberTypes tuple, omit all subclasses thereof from it, get
better performance, AND be able to document very simply what the user
must do to ensure his own custom type is known to me as "a number".
"""
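A runnable sketch of the KnownNumberTypes pattern from the quoted message. The class names Scaled and DecimalAware are hypothetical stand-ins, and the tuple is an assumption adapted for Python 3 (the original also listed long and gmpy.mpz):

```python
from decimal import Decimal
from fractions import Fraction


class Scaled(object):
    """Hypothetical container standing in for the class in the quoted message."""

    # Assumption: an illustrative tuple, not the original one.
    KnownNumberTypes = (int, float, Fraction)

    def __init__(self, items):
        self.items = list(items)

    def __mul__(self, other):
        # isinstance walks the tuple linearly until one entry matches.
        if isinstance(other, self.KnownNumberTypes):
            return self.__class__([x * other for x in self.items])
        # Stand-in for the "various other multiplication cases".
        return NotImplemented


class DecimalAware(Scaled):
    """User customization by subclassing: teach the class about Decimal."""

    KnownNumberTypes = Scaled.KnownNumberTypes + (Decimal,)
```

With this layout, Scaled([1, 2]) * Decimal("1.5") raises TypeError because Decimal is not in the tuple, while DecimalAware([1, 2]) * Decimal("1.5") succeeds. A common base type would remove the need for that per-type customization.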
Other use cases, still quoted from the very same message:
"""
in Python/bltinmodule.c, function builtin_sum uses C-coded typechecking
to single out strings as an error case:
    /* reject string values for 'start' parameter */
    if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
        PyErr_SetString(PyExc_TypeError,
            "sum() can't sum strings [use ''.join(seq) instea
[etc]. Now, what builtin_sum really "wants" to do is to accept numbers,
only -- it's _documented_ as being meant for "numbers": it uses +, NOT
+=, so its performance on sequences, matrix and array-ish things, etc,
is not going to be good. But -- it can't easily _test_ whether something
"is a number". If we had a PyBaseNumber_Type to use here, it would
be smooth, easy, and fast to check for it.
"""
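The behaviour described in that quote can be observed from Python itself. A small sketch, in which the helper try_sum is hypothetical and only wraps the builtin so the outcome can be inspected:

```python
def try_sum(values, start):
    """Call the builtin sum() and report a TypeError instead of raising it."""
    try:
        return sum(values, start)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Numbers are the documented use case.
numbers_total = try_sum([1, 2, 3], 0)

# A string 'start' value is singled out and rejected.
string_total = try_sum(["a", "b"], "")

# Other non-numbers, such as lists, are accepted (via repeated +),
# because sum() has no cheap way to test "is a number".
list_total = try_sum([[1], [2]], [])
```

Only the string case is rejected; the list case goes through, which illustrates why the check is described as a workaround for the missing "is a number" test.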
and yet another, which is the directly gmpy-related one:
"""
I see a few other cases in the standard library which want to treat
"numbers" in some specific way different from other types (often
forgetting longs:-), e.g. Lib/plat-mac/plistlib.py has one. In gmpy, I
would often like some operations to be able to accept "a number",
perhaps by letting it try to transform itself into a float as a worst
case (so complex numbers would fail there), but I definitely do NOT want
to accept non-number objects which "happen to be able to return a value
from float(x)", such as strings. In all such cases of wanting to check
if something "is a number", an abstract basetype might be handy, smooth,
fast.
"""
>
>> I have the same issue in the C-coded extension gmpy: I want (e.g.) a
>> gmpy.mpq to be able to be constructed by passing any number as the
>> argument, but I have no good way to say "what's a number", so I use
>> rather dirty tricks -- in particular, I've had to tweak things in a
>> weird direction in the latest gmpy to accommodate Python 2.4
>> (specifically Decimal).
>
> Not sure what a gmpy.mpq is, but I would expect that can only work
A fast rational number type, see http://gmpy.sourceforge.net for
details (gmpy wraps LGPL'd library GMP, and gets a lot of speed and
functionality thereby).
> if the parameter belongs to some algebraic ring homomorphic
> with the real numbers, or some such. Are complex numbers also numbers?
> Is it meaningful to construct gmpy.mpqs out of them? What about
> Z/nZ?
If I could easily detect "this is a number" about an argument x, I'd
then ask x to change itself into a float, so complex would be easily
rejected (while decimals would mostly work fine, although a bit
slowly without some specialcasing, due to the Stern-Brocot-tree
algorithm I use to build gmpy.mpq's from floats). I can't JUST ask x
to "make itself into a float" (without checking for x's "being a
number") because that would wrongfully succeed for many cases such as
strings.
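A minimal sketch of the two-step check described above: first "is it a number?", then "can it become a float?". The name as_float and the KNOWN_NUMBER_TYPES tuple are hypothetical; the tuple stands in for the basenumber type that does not exist, and this is not gmpy's actual code:

```python
from decimal import Decimal

# Assumption: an explicit tuple playing the role of the proposed basenumber.
KNOWN_NUMBER_TYPES = (int, float, complex, Decimal)


def as_float(x):
    """Convert x to float, but only if x is recognized as a number."""
    if not isinstance(x, KNOWN_NUMBER_TYPES):
        # float("1.5") would succeed, so strings must be rejected up front.
        raise TypeError("not a number: %r" % (x,))
    # complex passes the isinstance test but float() refuses it,
    # so it is rejected here with a TypeError.
    return float(x)
```

Here as_float(Decimal("0.5")) yields a float, while both as_float("1.5") and as_float(1 + 2j) raise TypeError, the first from the isinstance test and the second from float() itself.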
>> If I do write the PEP, should it be just about basenumber, or should
>> it include baseinteger as well?
>
> I think it should only do the case you care about. If others have
> other use cases, they might get integrated, or they might have to
> write another PEP.
Good idea. I'll check around -- if anybody feels strongly about
baseinteger they may choose to co-author with me (and the PEP will
propose both), otherwise we'll hassle on basenumber only;-)
Alex