"/a" is not "/a" ?

Steven D'Aprano steve at pearwood.info
Fri Mar 6 15:37:51 EST 2009


Gary Herron wrote:

> Emanuele D'Arrigo wrote:
>> Hi everybody,
>>
>> while testing a module today I stumbled on something that I can work
>> around but I don't quite understand.
>>   
> 
> *Do NOT use "is" to compare immutable types.*    **Ever! **

Huh? How am I supposed to compare immutable types for identity then? Your
bizarre instruction would prohibit:

if something is None

which is the recommended way to compare against None, an immutable object.
The standard library has *many* identity tests against None.

I would say, *always* use "is" to compare any type whenever you intend to
compare by *identity* instead of equality. That's what it's for. If you use
it to test for equality, you're doing it wrong. But in the very rare cases
where you care about identity (and you almost never do), "is" is the
correct tool to use.
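The sketch below shows why `is None` is the recommended test. Equality defers to the object's `__eq__` method, which a class can override. Identity cannot be overridden. The class `AlwaysEqual` and the helper `check` are hypothetical and exist only for illustration.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""
    def __eq__(self, other):
        return True
    __hash__ = object.__hash__  # stay hashable despite defining __eq__

def check(obj):
    # Equality calls obj.__eq__, so a misbehaving class can fool it.
    by_equality = obj == None  # deliberately the wrong test, for contrast
    # Identity only asks whether obj is the single None object.
    by_identity = obj is None
    return by_equality, by_identity

print(check(AlwaysEqual()))  # (True, False)
print(check(None))           # (True, True)
```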


> It is an implementation choice (usually driven by efficiency
> considerations) to choose when two strings with the same value are stored
> in memory once or twice.  In order for Python to recognize when a newly
> created string has the same value as an already existing string, and so
> use the already existing value, it would need to search *every* existing
> string whenever a new string is created.

Not at all. It's quite easy, and efficient. Here's a pure Python string
constructor that caches strings.

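class CachedString(str):
    _cache = {}
    def __new__(cls, value):
        # Return the cached instance for this value, creating it on first use.
        try:
            return cls._cache[value]
        except KeyError:
            s = str.__new__(cls, value)
            cls._cache[value] = s
            return s
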
Python even includes a built-in function to do this: intern(). In Python
3.0 it is no longer a built-in, but it is still available as sys.intern().
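A minimal sketch, assuming Python 3, where the function is `sys.intern()`. Two equal strings built at run time need not be the same object. Interning both returns one shared object. Whether the un-interned strings are distinct is itself an implementation detail, so the sketch only prints that result and does not rely on it.

```python
import sys

# Build two equal strings at run time, so the compiler cannot fold them
# into one shared constant.
a = "".join(["/", "a"])
b = "".join(["/", "a"])

print(a == b)  # True: equal values
print(a is b)  # implementation-dependent; often False in CPython

# sys.intern returns the canonical copy of the string, storing this one
# first if no equal string has been interned yet.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: both names now refer to the same object
```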


> Clearly that's not going to be efficient. 

Only if you do it the inefficient way.
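The efficiency point can be made concrete. The sketch below contrasts the approach described in the quoted text, which compares against every stored string, with a dict lookup. The dict hashes the new string and on average inspects a single bucket. Both function names are hypothetical and used only for illustration.

```python
def intern_by_scan(value, pool):
    """Naive interning: compare against every stored string (O(n) per call)."""
    for existing in pool:
        if existing == value:
            return existing
    pool.append(value)
    return value

def intern_by_hash(value, table):
    """Hash-based interning: one dict lookup (O(1) on average per call)."""
    return table.setdefault(value, value)

pool, table = [], {}
first = "".join(["ab", "c"])
second = "".join(["a", "bc"])
# Both strategies hand back the first-seen object for equal strings.
assert intern_by_scan(first, pool) is intern_by_scan(second, pool)
assert intern_by_hash(first, table) is intern_by_hash(second, table)
```

Both functions return the same canonical object. Only the cost per call differs, and that cost is what the dict-based version avoids.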

> However, the C implementation of Python does a limited version 
> of such a thing -- at least with strings of length 1.

No, that's not right. The identity test fails for some strings of length
one.

>>> a = '\n'
>>> b = '\n'
>>> len(a) == len(b) == 1
True
>>> a is b
False


Clearly, Python doesn't intern all strings of length one. What Python
actually interns are strings that look like, or could be, identifiers:

>>> a = 'heresareallylongstringthatisjustmade' \
... 'upofalphanumericcharacterssuitableforidentifiers123_'
>>>  
>>> b = 'heresareallylongstringthatisjustmade' \
... 'upofalphanumericcharacterssuitableforidentifiers123_'
>>> a is b
True
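The sketch below is a rough version of that heuristic. It assumes, based on CPython's compiler and not on any language guarantee, that a string constant is interned when every character is an ASCII letter, digit or underscore. The test is deliberately looser than `str.isidentifier()`, since it also accepts strings such as "123abc" that merely *could be* names. The helper `looks_internable` is hypothetical. By this test neither '\n' nor the subject line's '/a' qualifies. The rule applies to constants in compiled code, not to strings built at run time.

```python
import string

# Characters assumed to count as "name characters" when CPython decides
# whether to intern a string constant (an implementation detail).
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def looks_internable(s):
    """Hypothetical helper: True if every character is a name character."""
    return all(c in NAME_CHARS for c in s)

print(looks_internable("upofalphanumericcharacterssuitableforidentifiers123_"))  # True
print(looks_internable("\n"))  # False
print(looks_internable("/a"))  # False
```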

It also does a similar thing for small integers, currently -5 through 256
in CPython, although this is an implementation detail subject to change.
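The sketch below applies the same caching idea to small integers, assuming CPython's range of -5 to 256. `Num` is a hypothetical stand-in for an immutable integer type (immutable by convention only). CPython preallocates its small ints at start-up, but this class fills its cache lazily. Inside the range, equal values share one object. Outside it, each call creates a fresh object, so only `==` is reliable.

```python
SMALL_MIN, SMALL_MAX = -5, 256  # assumed CPython small-int range

class Num:
    """Hypothetical integer wrapper illustrating a small-value cache."""
    __slots__ = ("value",)
    _small = {}  # value -> shared instance, for values in the cached range

    def __new__(cls, value):
        if SMALL_MIN <= value <= SMALL_MAX:
            cached = cls._small.get(value)
            if cached is None:
                cached = cls._small[value] = cls._make(value)
            return cached
        return cls._make(value)  # outside the range: always a new object

    @classmethod
    def _make(cls, value):
        obj = super().__new__(cls)
        obj.value = value
        return obj

    def __eq__(self, other):
        return isinstance(other, Num) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

    def __repr__(self):
        return f"Num({self.value})"

assert Num(7) is Num(7)            # cached: same object
assert Num(1000) is not Num(1000)  # not cached: distinct objects
assert Num(1000) == Num(1000)      # but still equal
```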


-- 
Steven
