New restrain builtin function?

Sun Mar 21 17:05:06 EST 2004

I wrote:
> > I think if I were to write my own version today in pure Python,
> > it'd be invoked as: limit(object, lower, upper), where 
> <object> could be
> > an iterable. I see that as a much more common use case; 
> that is, mapping
> > the same pair of boundaries to multiple items.

and Pierre Rouleau replied:
> Interesting.  I had a simple one written in pure Python that was just 
> checking the type of object to determine if it was a sequence
> Agreed.  I had writte a pure Python one where I was looking for the 
> object type with type(object).
> 
> def restrain(theObject, lowerBound, upperBound):
>      if type(theObject) in [types.ListType, types.TupleType]:
>          return map(lambda val,theMin,theMax: 
> min(max(val,theMin),theMax) , theObject, lowerBound, upperBound)
>      else:
>          return  min(max(theObject, lowerBound), upperBound)
> 
> However,
> 
> 1) I don't like using type(theObject) to check if it is a sequence 
> because that does not allow iterable objects.
> 2) It does not allow the use case you just mentionned where the same 
> boundary is applied to every member set.
> 
> 
> So I wonder:
> 
> 1) what's the easiest way to find if an object is iterable?  
> There is no 
> 'isiterable()' built in.  Would it be to try iterate over it and cath 
> the potential exception or is there a better way?

First, rather than import the "types" module, you can now use
isinstance:

def restrain(theObject, lowerBound, upperBound):
     if isinstance(theObject, (list, tuple)):

...which is much cleaner.

Second, there's the question of "what is an iterable"? Some solutions:

1. Several contexts use type() or isinstance() as above to determine
this; however, you're limited in those situations to list, tuple, and
subclasses of them.

2. Some have recommended a try/except format, like:
    try:
        for a in obj:
            treat_as_instance(a)
    except TypeError:
        treat_as_instance(obj)

    However, this suffers from the fact that "for a in obj" works on
strings, which isn't always the desired behavior.

3. Check for the presence of the __iter__ method:
    if hasattr(obj, '__iter__'):
        treat_as_iterable(obj)
    else:
        treat_as_instance(obj)

    ...which, as we can see from the following, leaves out strings, but
catches classes which "emulate container types":

>>> class C(object):
...   bunch = []
... 	def __iter__(self):
... 		return iter(self.bunch)
... 	
>>> for thing in [], (), C(), '':
... 	if hasattr(thing, '__iter__'):
... 		print thing
... 		
[]
()
<__main__.C object at 0x011599D0>

...whichever route you take, it should be documented, especially the
results of passing a string.

Third, I'd write limit() to take an optional iterable first argument and
keep the upper and lower fixed for each item in an iterable as:

def limit4(obj, lower=None, upper=None):
    if hasattr(obj, '__iter__'):
        product = []
        for item in obj:
            if upper is not None:
                if item > upper:
                    item = upper
            if lower is not None:
                if item < lower:
                    item = lower
            product.append(item)
        return product
    else:
        if upper is not None:
            if obj > upper:
                obj = upper
        if lower is not None:
            if obj < lower:
                obj = lower
        return obj

Hard-coding less-than and greater-than takes 2/3 the time of min() and
max(). It's ugly having the same code block (the actual comparisons) in
there twice, but the performance boost is significant over having
another function call. One *could* write:

    if hasattr(obj, '__iter__'):
        obj = iter(obj)
    product = []
    for item in obj:
        if upper is not None:
            if item > upper:
                item = upper
        if lower is not None:
            if item < lower:
                item = lower
        product.append(item)
    return product    # ?? always return list?

...but then you have to either: a) return a list even if a list were not
passed in, or b) check __iter__ again and unpack the list at the end of
the function. Neither is very appealing.

If you had a dataset with many large iterables, you would probably go
ahead and incur the overhead of an inner function call, with something
like:

def limit3(obj, lower=None, upper=None):
    if upper is None:
        if lower is None:
            limfunc = lambda i: i
        else:
            def _lower(i):
                if i < lower:
                    i = lower
                return i
            limfunc = _lower
    else:
        if lower is None:
            def _upper(i):
                if i > upper:
                    i = upper
                return i
            limfunc = _upper
        else:
            def _both(i):
                if i < lower:
                    i = lower
                if i > upper:
                    i = upper
                return i
            limfunc = _both

    if hasattr(obj, '__iter__'):
        product = []
        for item in obj:
            product.append(limfunc(item))
        return product
    else:
        return limfunc(obj)

...which might squeeze a bit of performance out of the partial
application:

>>> tl = timeit.Timer("limit.limit(data, 1, 5)", "import limit; data =
(0, 3, 6) * 100")
>>> tl.timeit(10000)
7.1556577721471513
>>> t3l = timeit.Timer("limit.limit3(data, 1, 5)", "import limit; data =
(0, 3, 6) * 100")
>>> t3l.timeit(10000)
5.0156802051656086

The next optimization would be determining the performance breakpoint
between limit and limit3, and coding a length check on the first
argument to see which implementation to apply. Then factor the hit of
that length check into your speed calculations, of course. ;) I'm not
much for big-O analysis, as you can see. :D

Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org