question about xrange performance
MRAB
google at mrabarnett.plus.com
Fri Apr 17 17:47:52 EDT 2009
~flow wrote:
>> One might wonder why you are even writing code to test for existence
>> "in" a range list, when "blee <= blah < bloo" is obviously going to
>> outperform this kind of code.
>> -- Paul
>
> the reason is simply the idiomacy and convenience. i use (x)xranges to
> implement unicode blocks and similar things. it is natural to write
> `if cid in latin_1` and so on. i always assumed it would be about the
> fastest and memory-efficient to use xrange for this. i stumbled
> against a first wall, tho, when i realized that `xrange` does not
> accept long integers. so i wrote the class above to get an xrange-like
> object that would accept them. stepping was not of concern---i recall
> few times that i was in need of a third argument to xrange at all. i
> then added iteration when i needed it; shows that iteration over
> xxranges is outperformed by xrange by a factor of over twenty:
>
> class xxrange( object ):
> def __iter__( self ):
> i = self.start
> while i < self.stop:
> yield i
> i += 1
>
> containment
> xxrange 0.027 CPU seconds
> xrange 90.365 CPU seconds
> set 0.002 CPU seconds
>
> iteration
> xxrange 0.262 CPU seconds
> xrange 0.019 CPU seconds
> set 0.031 CPU seconds # iterated sorted list; tested as `for x
> in sorted(my_set)`
>
> however, these numbers also show still an incredible and unexpected
> ratio in favor of xxrange for containment; they also show that a set
> is (with 200'000 integer numbers) 10 times as fast for containment and
> are only 1.6 times slower than xxranges for iteration.
>
> this means that an optimized implemtation of x/range should choose
> between different implementations depending on size of collection,
> memory available, and purpose if they want to achieve better
> efficiency. the least of changes could be
>
> class xrange( object ):
> def __old_contains__:
> ...
> def __contains__( self ):
> if self step != 1:
> return self.__old_contains__()
> return ( x == int( x ) ) and self.start <= x < self.stop
>
> the `( x == int( x ) )` is not easily being done away with if
> emulation of present x/range behavior is desired (x.0 floats are in,
> all others are out).
>
> frankly speaking, i would like to say that finding out whether xrange
> is a good solution for containment tests will cause some time of
> reading or, of course, dedicated profiling; making Python make that
> choice might be a better idea.
>
> my guess would be that most xranges are in fact used with step 1, so
> even sorting out that single bottleneck would have noticable effect.
> now that xrange has taken over range in py3k people will get some
> bumps on the transition way anyhow. i for one submit to my fancy
> xxranges for the time.
>
> cheers and thanks for the explanations!
>
In that case, I suppose that you need some sort of 'rangeset', a
class which is optimised for sets which include ranges of values.
More information about the Python-list
mailing list