Where does str class represent its data?

ChrisEdgemon at gmail.com ChrisEdgemon at gmail.com
Thu Jul 12 23:41:10 CEST 2007


On Jul 11, 9:49 pm, James Stroud <jstr... at mbi.ucla.edu> wrote:
> ChrisEdge... at gmail.com wrote:
> > I'd like to implement a subclass of string that works like this:
>
> > >>> m = MyString('mail')
> > >>> m == 'fail'
> > True
> > >>> m == 'mail'
> > False
> > >>> m in ['fail', 'hail']
> > True
>
> > My best attempt for something like this is:
>
> > class MyString(str):
> >   def __init__(self, seq):
> >     if self == self.clean(seq): pass
> >     else: self = MyString(self.clean(seq))
>
> >   def clean(self, seq):
> >     seq = seq.replace("m", "f")
>
> > but this doesn't work.  Nothing gets changed.
>
> > I understand that I could just remove the clean function from the
> > class and call it every time, but I use this class in several
> > locations, and I think it would be much safer to have it do the
> > cleaning itself.
>
> The "flat is better than nested" philosophy suggests that clean should
> be module level and you should initialize a MyString like such:
>
>    m = MyString(clean(s))
>
> Where clean is
>
>    def clean(astr):
>      return astr.replace('m', 'f')
>
> Although it appears compulsory to call clean each time you instantiate
> MyString, note that you do it anyway when you check in your __init__.
> Here, you are explicit. Such an approach also eliminates the obligation
> to clean the string under conditions where you know it will already be
> clean--such as deserialization.

Initially, I tried simply calling a clean function on a regular
string, without any of this messy subclassing.  However, I would end
up accidentally cleaning it more than once, and transforming the
string was just very messy.  I thought that it would be much easier to
just clean the string once, and then add methods that would give me
the various transformations that I wanted from the cleaned string.
Using __new__ seems to be the solution I was looking for.
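
A minimal sketch of what I have in mind, assuming the cleaning rule is
just the "m" -> "f" replacement from the example above.  The shouted()
method is a made-up placeholder for the kind of transformation I mean,
and the isinstance check is only one possible way to avoid cleaning
twice.  Since str is immutable, the value has to be chosen in __new__;
rebinding self inside __init__ only changes a local name.

```python
class MyString(str):
    """Hypothetical str subclass that cleans its value once, at creation."""

    def __new__(cls, seq):
        # An existing MyString is already clean, so hand it back untouched
        # instead of cleaning it a second time.
        if isinstance(seq, cls):
            return seq
        # str is immutable: the final value must be passed to str.__new__.
        return super(MyString, cls).__new__(cls, cls.clean(seq))

    @staticmethod
    def clean(seq):
        # Placeholder rule taken from the example in this thread.
        return seq.replace("m", "f")

    def shouted(self):
        # Illustrative transformation; note it returns a plain str.
        return self.upper()


m = MyString("mail")
print(m == "fail")            # True
print(m == "mail")            # False
print(m in ["fail", "hail"])  # True
print(MyString(m) is m)       # True
```

Inherited methods such as upper() and replace() return plain str
objects, so anything that should stay a MyString has to be wrapped again
explicitly.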

>
> Also, you don't return anything from clean above, so you assign None to
> self here:
>
>     self = MyString(self.clean(seq))
>
> Additionally, it has been suggested that you use __new__. E.g.:
>
> py> class MyString(str):
> ...   def __new__(cls, astr):
> ...     astr = astr.replace('m', 'f')
> ...     return super(MyString, cls).__new__(cls, astr)
> ...
> py> MyString('mail')
> 'fail'
>
> But this is an abuse of the str class if you intend to populate your
> subclasses with self-modifying methods such as your clean method. In
> this case, you might consider composition, wherein you access an
> instance of str as an attribute of class instances. The Python standard
> library makes this easy with the UserString class and the ability to add
> custom methods to its subclasses:

What constitutes an abuse of the str class?  Is there some performance
penalty that results from subclassing str like this?  (Unfortunately,
my implementation seems to have a pretty large memory footprint: 400 MB
for about 400,000 files.)  Or do you just mean from a philosophical
standpoint?  I guess I don't understand what benefits come from using
UserString instead of just str.
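
On the memory side, one thing that may be worth checking (a rough
sketch, not a measurement of my actual program): an ordinary str
subclass gives every instance its own __dict__, while declaring an empty
__slots__ avoids that.  The class names Plain and Slim are made up for
illustration, and I have not confirmed that this accounts for the
footprint I am seeing.

```python
import sys


class Plain(str):
    # No __slots__: each instance can carry its own attribute dictionary.
    pass


class Slim(str):
    # Empty __slots__: instances have no per-instance __dict__.
    __slots__ = ()


p = Plain("fail")
s = Slim("fail")

print(hasattr(p, "__dict__"))  # True
print(hasattr(s, "__dict__"))  # False

# Sizes vary by interpreter and version; getsizeof does not count the
# separate __dict__ object, so treat these numbers as a rough guide only.
print(sys.getsizeof("fail"), sys.getsizeof(p), sys.getsizeof(s))
```

The trade-off is that a Slim instance cannot have arbitrary attributes
assigned to it.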

Thanks for the help,
Chris

>
> py> from UserString import UserString as UserString
> py> class MyClass(UserString):
> ...   def __init__(self, astr):
> ...     self.data = self.clean(astr)
> ...   def clean(self, astr):
> ...     return astr.replace('m', 'f')
> ...
> py> MyClass('mail')
> 'fail'
> py> type(_)
> <type 'instance'>
>
> This class is much slower than str, but you can always access an
> instance's data attribute directly if you want fast read-only behavior.
>
> py> astr = MyClass('mail').data
> py> astr
> 'fail'
>
> But now you are back to a built-in type, which is actually the
> point--not everything needs to be in a class. This isn't Java.
>
> James
>
> --
> James Stroud
> UCLA-DOE Institute for Genomics and Proteomics
> Box 951570
> Los Angeles, CA 90095
>
> http://www.jamesstroud.com/




