[Tutor] subclassing strings

Kent Johnson kent37 at tds.net
Wed Jan 9 04:49:59 CET 2008


Eric Abrahamsen wrote:
> When I create a string like so:
> 
> x = 'myvalue'
> 
> my understanding is that this is equivalent to:
> 
> x = str('myvalue')
> 
> and that this second form is more fundamental: the first is a  
> shorthand for the second. 

The second does nothing that the first doesn't already do.

'myvalue' is a string:
In [4]: s='myvalue'
In [5]: type(s)
Out[5]: <type 'str'>

So is str('myvalue'):
In [6]: t=str(s)
In [7]: type(t)
Out[7]: <type 'str'>

In fact they are the *same* string - str(s) is the same as s if s is 
already a string:
In [8]: s is t
Out[8]: True
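
The same check can be written as a small script. This is only a sketch: getting the identical object back is a CPython implementation detail, and the subclass name MyStr is made up for illustration. For an instance of a str subclass, str() builds a plain str instead of returning the object itself.

```python
# Hypothetical subclass, used only to show how str() treats non-exact strings.
class MyStr(str):
    pass

s = 'myvalue'
t = str(s)
same_object = t is s      # CPython hands back s itself when s is exactly a str

m = MyStr('myvalue')
plain = str(m)            # a plain str is built from the subclass instance

print(same_object)
print(type(plain).__name__)
print(plain == m)
```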

> What is 'str()' exactly? Is it a class name?

Close; str is a type name. str() is an invocation of the type.
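
A rough illustration of what "an invocation of the type" means: calling str works like calling any other type, such as int or list, and returns an instance. The variable names here are arbitrary.

```python
# str is itself an object whose type is `type`, just like int or list.
kind = type(str)

empty = str()             # calling the type with no argument gives ''
number = str(42)          # calling it with a non-string converts the argument
is_instance = isinstance('myvalue', str)

print(kind is type)
print(repr(empty))
print(repr(number))
print(is_instance)
```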

> If so, is the string value I pass in assigned to an attribute, the way  
> I might create a "self.value =" statement in the __init__ function of  
> a class I made myself? If so, does that interior attribute have a  
> name? I've gone poking in the python lib, but haven't found anything  
> enlightening.

No, not really. At the C level, IIUC there is a structure containing a 
pointer to a byte array, but there is no access to this level of 
internals from Python. For Python, strings are fundamental types like 
integers and floats. The internal representation is not available.
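
One way to see this from the interpreter, as a rough sketch: a plain string has no per-instance __dict__ where a "self.value"-style attribute could live, and you cannot attach one after the fact.

```python
s = 'myvalue'

# A plain str carries no per-instance attribute dictionary.
has_dict = hasattr(s, '__dict__')

# Trying to bolt an attribute onto a plain str fails.
try:
    s.value = 'other'
    attached = True
except AttributeError:
    attached = False

print(has_dict)
print(attached)
```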

I guess you may have a background in C++ where a char array is different 
from an instance of the string class. Python does not have this 
distinction; you don't have access to a bare char array that is not 
wrapped in some class.

> I started out wanting to subclass str so I could add metadata to  
> objects which would otherwise behave exactly like strings. But then I  
> started wondering where the actual value of the string was stored,  
> since I wasn't doing it myself, and whether I'd need to be careful of  
> __repr__ and __str__ so as not to interfere with the basic string  
> functioning of the object. As far as I can tell the object functions  
> normally as a string without my doing anything – where does the string  
> value 'go', and is there any way I might inadvertently step on it by  
> overriding the wrong attribute or method?

No, you can't access the actual byte array from Python and you can't 
damage it.
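
Here is a minimal sketch of the kind of subclass described in the question. The class name TaggedStr and its metadata attribute are invented for illustration and do not come from any library. Because strings are immutable, the value is passed to str.__new__ rather than set in __init__; after that, neither the extra attribute nor an overridden __repr__ touches the string value.

```python
class TaggedStr(str):
    """A str that carries an extra, hypothetical `metadata` dict."""

    def __new__(cls, value, **metadata):
        # str is immutable, so the value is fixed here, not in __init__.
        self = str.__new__(cls, value)
        self.metadata = metadata
        return self

    def __repr__(self):
        # Overriding __repr__ changes only the display, not the value.
        return 'TaggedStr(%s, %r)' % (str.__repr__(self), self.metadata)


name = TaggedStr('myvalue', source='config')

print(name == 'myvalue')              # compares like a normal string
print(name.upper())                   # string methods still work
print(type(name.upper()).__name__)    # the result is a plain str
```

One thing to watch for: methods such as upper() return plain str objects, so the metadata does not follow the result unless you wrap it again.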

You might want to take a look at BeautifulSoup, which subclasses unicode 
to create a page element, and path.py, which subclasses str to add 
file path manipulation operations.
http://www.crummy.com/software/BeautifulSoup/
file://localhost/Users/kent/Desktop/Downloads/Python/path-2.1/index.html

The actual string object implementation is in stringobject.h & .c:
http://svn.python.org/view/python/trunk/Include/stringobject.h?rev=59564&view=markup
http://svn.python.org/view/python/trunk/Objects/stringobject.c?rev=59564&view=markup

Kent

