[Tutor] subtyping builtin type

Steven D'Aprano steve at pearwood.info
Wed Jan 1 01:26:20 CET 2014


On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:
> Hello,
> 
> I don't remember exactly how to do that. As an example:
> 
> class Source (str):
>     __slots__ = ['i', 'n']
>     def __init__ (self, string):
>         self.i = 0                  # current matching index in source
>         self.n = len(string)        # number of ucodes (Unicode code points)
>         #~ str.__init__(self, string)

The easiest way to do that is:

class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0
        self.n = len(self)
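
A quick check of how that behaves, as a minimal sketch (the sample string is only an illustration). It works because str.__new__ has already built the string value by the time __init__ runs, so len(self) is available there:

```python
class Source(str):
    def __init__(self, *args, **kwargs):
        # The string value was already created by str.__new__, so
        # __init__ only has to attach the extra attributes.
        self.i = 0
        self.n = len(self)


src = Source("hello world")   # sample text, purely illustrative
print(src.i, src.n)           # 0 11
print(isinstance(src, str))   # True
```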


As a (premature) memory optimization, you can use __slots__ to reduce 
the amount of memory per instance. But subclassing str is (probably) the 
wrong way to solve this problem. Your design makes Source a kind of string:

issubclass(Source, str) 
=> True

I expect that it should not be. (Obviously I'm making some assumptions 
about the design here.) To decide whether you should use subclassing 
here, ask yourself a few questions:

* Does it make sense to call string methods on Source objects? In 
  Python 3.3, there are over 40 public string methods. If *just one* 
  of them makes no sense for a Source object, then Source should not
  be a subclass of str.

  e.g. source.isnumeric(), source.isidentifier()

* Do you expect to pass Source objects to arbitrary functions which
  expect strings, and have the result be meaningful?

* Does it make sense for Source methods to return plain strings?
  source.upper() returns a str, not a Source object.

* Is a Source thing a kind of string? If so, what's the difference
  between a Source and a str? Why not just use a str?

  If all you want is to decorate a string with a couple of extra
  pieces of information, then a limitation of Python is that you 
  can only do so by subclassing.

* Or does a Source thing *include* a string as a component part of
  it? If that is the case -- and I think it is -- then composition
  is the right approach.
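
The question about methods returning plain strings is easy to check at the interpreter. A small sketch, reusing the str subclass from earlier with a made-up sample string:

```python
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0
        self.n = len(self)


src = Source("abc")           # sample text, purely illustrative
shouted = src.upper()         # inherited str method

print(type(src).__name__)     # Source
print(type(shouted).__name__) # str
print(hasattr(shouted, "i"))  # False
```

The inherited methods know nothing about Source, so the extra attributes are silently lost as soon as you call one of them.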


The difference between has-a and is-a relationships is critical. I 
expect that the right relationship should be:

    a Source object has a string

rather than "is a string". That makes composition a better design than 
inheritance. Here's a lightweight mutable solution, where all three 
attributes are public and free to be varied after initialisation:

class Source:
    def __init__(self, string, i=0, n=None):
        if n is None:
            n = len(string)
        self.i = i
        self.n = n
        self.string = string
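
For illustration, here is how that class might be used. The sample string and the index arithmetic are my own assumptions about how a matcher would consume it, not part of your design:

```python
class Source:
    def __init__(self, string, i=0, n=None):
        if n is None:
            n = len(string)
        self.i = i
        self.n = n
        self.string = string


src = Source("spam and eggs")      # sample text, purely illustrative
src.i += 5                         # pretend we have matched "spam "
print(src.string[src.i:src.n])     # and eggs
print(isinstance(src, str))        # False
```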


An immutable solution is nearly as easy:

from collections import namedtuple

class Source(namedtuple("Source", "string i n")):
    def __new__(cls, string, i=0, n=None):
        if n is None: 
            n = len(string)
        return super(Source, cls).__new__(cls, string, i, n)
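
A short sketch of what "immutable" means in practice here (the sample string is only an illustration). Rebinding a field fails, and _replace, the standard namedtuple method, builds a new object instead:

```python
from collections import namedtuple


class Source(namedtuple("Source", "string i n")):
    def __new__(cls, string, i=0, n=None):
        if n is None:
            n = len(string)
        return super(Source, cls).__new__(cls, string, i, n)


src = Source("hello")              # sample text, purely illustrative

try:
    src.i = 3                      # fields cannot be rebound
    rebound = True
except AttributeError:
    rebound = False

advanced = src._replace(i=3)       # returns a new Source, src is untouched
print(rebound)                     # False
print(src.i, advanced.i)           # 0 3
```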


Here's a version which makes the string attribute read-only, while the i 
and n attributes stay mutable:

class Source:
    def __init__(self, string, i=0, n=None):
        if n is None:
            n = len(string)
        self.i = i
        self.n = n
        self._string = string

    @property
    def string(self):
        return self._string
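
A brief sketch of how that behaves (the sample strings are only an illustration). Note that the protection is by convention: nothing stops a determined caller from rebinding the private _string attribute directly.

```python
class Source:
    def __init__(self, string, i=0, n=None):
        if n is None:
            n = len(string)
        self.i = i
        self.n = n
        self._string = string

    @property
    def string(self):
        return self._string


src = Source("hello")              # sample text, purely illustrative
src.i = 2                          # i and n can still be changed

try:
    src.string = "goodbye"         # the property has no setter
    rebound = True
except AttributeError:
    rebound = False

print(rebound)                     # False
print(src.string, src.i)           # hello 2
```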



-- 
Steven

