question about xrange performance

Fri Apr 17 23:09:18 EDT 2009

On Fri, 17 Apr 2009 13:58:54 -0700, ~flow wrote:

>> One might wonder why you are even writing code to test for existence
>> "in" a range list, when "blee <= blah < bloo" is obviously going to
>> outperform this kind of code.
>> -- Paul
> 
> the reason is simply the idiomacy and convenience. i use (x)xranges to
> implement unicode blocks and similar things. it is natural to write `if
> cid in latin_1` and so on.

[soapbox]
Speaking about idiomacy, it is grammatically incorrect to start sentences 
in English with lower-case letters, and it is rude because it harms the 
reading ability of people reading your posts. If it saves you 0.01ms of 
typing time to avoid pressing the shift key, and 100 people reading your 
post take 0.01ms more mental processing time to comprehend your writing 
because of the lack of a clear sentence break, then the harm you do to 
others is 100 times greater than the saving you make for yourself. You're 
not e.e. cummings, who was a dick anyway, and as a programmer you're 
supposed to understand about precision in language, syntax and grammar.
[end soapbox]

I think testing y in xrange() is a natural thing to do, but as I recall, 
the fast containment test was removed from xrange a few years ago to 
simplify the code, so "in" now falls back to iterating over the range. 
I thought that was a silly decision, because the code was written and 
working and it's not like the xrange object was likely to change, but 
what do I know?

> i always assumed it would be about the
> fastest and memory-efficient to use xrange for this. 

If you don't need to iterate over the actual codepoints, the most 
memory-efficient approach would be to just store the start and end 
positions, as a tuple or possibly even a slice object, and then test 
t[0] <= codepoint < t[1] (or s.start <= codepoint < s.stop for a slice).
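
A minimal sketch of the tuple approach. The names LATIN_1_SUPPLEMENT and 
in_block are made up for illustration and are not from the original 
post; the bounds are the half-open range covering U+0080..U+00FF.

```python
# Hypothetical block definition: a half-open (start, end) pair, so the
# last codepoint in the block is end - 1.
LATIN_1_SUPPLEMENT = (0x0080, 0x0100)   # U+0080 .. U+00FF


def in_block(codepoint, block):
    """Return True if codepoint lies in the half-open range `block`."""
    start, end = block
    # Two integer comparisons; nothing is iterated or allocated.
    return start <= codepoint < end
```

Something like in_block(ord(u'\xe9'), LATIN_1_SUPPLEMENT) then costs the 
same regardless of how large the block is.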

If you do need to iterate over them, perhaps some variant of this would 
suit your needs:

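# Untested
class UnicodeBlock(object):
    def __init__(self, start, end):
        # Half-open range of codepoints: start <= cp < end.
        self.start = start
        self.end = end
        self._current = start
    def __contains__(self, value):
        # Only integers can be members; anything else is not in the block.
        if isinstance(value, (int, long)):
            return self.start <= value < self.end
        return False
    def __iter__(self):
        return self
    def next(self):
        if self._current < self.end:
            # Return the current codepoint, then advance.
            value = self._current
            self._current += 1
            return value
        raise StopIteration
    def reset(self):
        # Rewind so the block can be iterated again.
        self._current = self.start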
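
One possible variation on the class above, offered as a sketch rather 
than a drop-in replacement: if __iter__ is written as a generator, every 
loop starts from the beginning and no reset() is needed. The class name 
CodepointBlock and the use of numbers.Integral (in place of int/long, so 
the same code runs on Python 2.6+ and 3) are my assumptions, not part of 
the code above.

```python
import numbers


class CodepointBlock(object):
    """Half-open range of codepoints: start <= cp < end."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __contains__(self, value):
        # Constant-time test; non-integers are never members.
        if isinstance(value, numbers.Integral):
            return self.start <= value < self.end
        return False

    def __iter__(self):
        # A generator gives each for-loop its own position, so the
        # object can be iterated repeatedly without a reset() call.
        current = self.start
        while current < self.end:
            yield current
            current += 1

    def __len__(self):
        return max(0, self.end - self.start)
```

The trade-off is that the object is no longer its own iterator, so there 
is no way to resume a half-finished loop from the object itself.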

[...]
> the `( x == int( x ) )` is not easily being done away with if emulation
> of present x/range behavior is desired (x.0 floats are in, all others
> are out).

x.0 floats working with xrange is an accident, not a deliberate design 
decision, and has been deprecated in Python 2.6, which means it will 
probably be gone in a few years:

>>> r = xrange(2.0, 5.0)
__main__:1: DeprecationWarning: integer argument expected, got float
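
If you really do want to keep accepting x.0 floats in your own 
containment test, the check can live in one small helper instead of 
relying on xrange. This is only a sketch; the name in_block_lenient and 
its signature are hypothetical.

```python
def in_block_lenient(value, start, end):
    """Half-open range test that also accepts integral floats like 3.0."""
    try:
        as_int = int(value)
    except (TypeError, ValueError, OverflowError):
        # None, NaN, infinity and unparseable strings all end up here.
        return False
    # as_int == value rejects 3.5 (int() truncated it) and '3' (a str).
    return as_int == value and start <= as_int < end
```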

-- 
Steven