question about xrange performance

MRAB google at mrabarnett.plus.com
Fri Apr 17 23:47:52 CEST 2009


~flow wrote:
>> One might wonder why you are even writing code to test for existence
>> "in" a range list, when "blee <= blah < bloo" is obviously going to
>> outperform this kind of code.
>> -- Paul
> 
> the reason is simply the idiomacy and convenience. i use (x)xranges to
> implement unicode blocks and similar things. it is natural to write
> `if cid in latin_1` and so on. i always assumed it would be about the
> fastest and memory-efficient to use xrange for this. i stumbled
> against a first wall, tho, when i realized that `xrange` does not
> accept long integers. so i wrote the class above to get an xrange-like
> object that would accept them. stepping was not of concern---i recall
> few times that i was in need of a third argument to xrange at all. i
> then added iteration when i needed it; shows that iteration over
> xxranges is outperformed by xrange by a factor of over twenty:
> 
> class xxrange( object ):
>   def __iter__( self ):
>     i = self.start
>     while i < self.stop:
>       yield i
>       i += 1
> 
> containment
> xxrange    0.027 CPU seconds
> xrange    90.365 CPU seconds
> set        0.002 CPU seconds
> 
> iteration
> xxrange    0.262 CPU seconds
> xrange     0.019 CPU seconds
> set        0.031 CPU seconds # iterated sorted list; tested as `for x
> in sorted(my_set)`
> 
> however, these numbers also show still an incredible and unexpected
> ratio in favor of xxrange for containment; they also show that a set
> is (with 200'000 integer numbers) 10 times as fast for containment and
> are only 1.6 times slower than xxranges for iteration.
> 
> this means that an optimized implemtation of x/range should choose
> between different implementations depending on size of collection,
> memory available, and purpose if they want to achieve better
> efficiency. the least of changes could be
> 
> class xrange( object ):
>   def __old_contains__:
>     ...
>   def __contains__( self ):
>     if self step != 1:
>       return self.__old_contains__()
>     return ( x == int( x ) ) and self.start <= x < self.stop
> 
> the `( x == int( x ) )` is not easily being done away with if
> emulation of present x/range behavior is desired (x.0 floats are in,
> all others are out).
> 
> frankly speaking, i would like to say that finding out whether xrange
> is a good solution for containment tests will cause some time of
> reading or, of course, dedicated profiling; making Python make that
> choice might be a better idea.
> 
> my guess would be that most xranges are in fact used with step 1, so
> even sorting out that single bottleneck would have noticable effect.
> now that xrange has taken over range in py3k people will get some
> bumps on the transition way anyhow. i for one submit to my fancy
> xxranges for the time.
> 
> cheers and thanks for the explanations!
> 
In that case, I suppose that you need some sort of 'rangeset', a
class which is optimised for sets which include ranges of values.



More information about the Python-list mailing list