Where does str class represent its data?
ChrisEdgemon at gmail.com
ChrisEdgemon at gmail.com
Thu Jul 12 23:41:10 CEST 2007
On Jul 11, 9:49 pm, James Stroud <jstr... at mbi.ucla.edu> wrote:
> ChrisEdge... at gmail.com wrote:
> > I'd like to implement a subclass of string that works like this:
> >>>>m = MyString('mail')
> >>>>m == 'fail'
> > True
> >>>>m == 'mail'
> > False
> >>>>m in ['fail', hail']
> > True
> > My best attempt for something like this is:
> > class MyString(str):
> > def __init__(self, seq):
> > if self == self.clean(seq): pass
> > else: self = MyString(self.clean(seq))
> > def clean(self, seq):
> > seq = seq.replace("m", "f")
> > but this doesn't work. Nothing gets changed.
> > I understand that I could just remove the clean function from the
> > class and call it every time, but I use this class in several
> > locations, and I think it would be much safer to have it do the
> > cleaning itself.
> The "flat is better than nested" philosophy suggests that clean should
> be module level and you should initialize a MyString like such:
> m = MyString(clean(s))
> Where clean is
> def clean(astr):
> return astr.replace('m', 'f')
> Although it appears compulsory to call clean each time you instantiate
> MyString, note that you do it anyway when you check in your __init__.
> Here, you are explicit. Such an approach also eliminates the obligation
> to clean the string under conditions where you know it will already be
> clean--such as deserialization.
Initially, I tried simply calling a clean function on a regular
string, without any of this messy subclassing. However, I would end
up accidentally cleaning it more than once, and transforming the
string was just very messy. I thought that it would be much easier to
just clean the string once, and then add methods that would give me
the various transformations that I wanted from the cleaned string.
Using __new__ seems to be the solution I was looking for.
> Also, you don't return anything from clean above, so you assign None to
> self here:
> self = MyString(self.clean(seq))
> Additionally, it has been suggested that you use __new__. E.g.:
> py> class MyString(str):
> ... def __new__(cls, astr):
> ... astr = astr.replace('m', 'f')
> ... return super(MyString, cls).__new__(cls, astr)
> py> MyString('mail')
> But this is an abuse of the str class if you intend to populate your
> subclasses with self-modifying methods such as your clean method. In
> this case, you might consider composition, wherein you access an
> instance of str as an attribute of class instances. The python standard
> library make this easy with the UserString class and the ability to add
> custom methods to its subclasses:
What constitutes an abuse of the str class? Is there some performance
decrement that results from subclassing str like this? (Unfortunately
my implementation seems to have a pretty large memory footprint, 400mb
for about 400,000 files.) Or do you just mean from a philsophical
standpoint? I guess I don't understand what benefits come from using
UserString instead of just str.
Thanks for the help,
> py> from UserString import UserString as UserString
> py> class MyClass(UserString):
> ... def __init__(self, astr):
> ... self.data = self.clean(astr)
> ... def clean(self, astr):
> ... return astr.replace('m', 'f')
> py> MyClass('mail')
> py> type(_)
> <type 'instance'>
> This class is much slower than str, but you can always access an
> instance's data attribute directly if you want fast read-only behavior.
> py> astr = MyClass('mail').data
> py> astr
> But now you are back to a built-in type, which is actually the
> point--not everything needs to be in a class. This isn't java.
> James Stroud
> UCLA-DOE Institute for Genomics and Proteomics
> Box 951570
> Los Angeles, CA 90095
More information about the Python-list