[Tutor] subtyping builtin type
spir
denis.spir at gmail.com
Wed Jan 1 14:49:17 CET 2014
On 01/01/2014 01:26 AM, Steven D'Aprano wrote:
> On Tue, Dec 31, 2013 at 03:35:55PM +0100, spir wrote:
>> Hello,
>>
>> I don't remember exactly how to do that. As an example:
>>
>> class Source (str):
>>     __slots__ = ['i', 'n']
>>     def __init__ (self, string):
>>         self.i = 0 # current matching index in source
>>         self.n = len(string) # number of ucodes (Unicode code points)
>>         #~ str.__init__(self, string)
>
> The easiest way to do that is:
>
> class Source(str):
>     def __init__(self, *args, **kwargs):
>         self.i = 0
>         self.n = len(self)
Thank you Steven for your help.
Well, I don't really get everything you say below about possible alternatives,
so I'll give a bit more detail. The only point of Source is to have a string
that stores a current index, somewhat like a file (being read) on the filesystem.
I take the opportunity to add a few features, but would do without Source
altogether if it were not for 'i'.
The reason is: it is for a parsing library, or hand-made parsers. Every matching
func, representing a pattern (or "rule"), advances in the source whenever a match
succeeds, right? Thus, in addition to returning the form (of what was matched), it
must return the new match index:
    return (form, i)
Symmetrically, every match func that uses another (meaning nearly all) receives
this pair. (Less annoyingly, every match func also takes i as input, in addition
to the src str.) (There are also a handful of other annoying points, consequences
of those ones.)
If I have a string that stores its index, all of this mess is gone. It makes for
clean and simple interfaces everywhere. Also (one of the consequences) I can
directly provide match funcs to the user, instead of having to wrap them inside
a func whose only utility is to hide the additional index (in both input & output).
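A minimal sketch of the two interface styles described above, assuming a hypothetical digit-matching pattern. The names match_digits and match_digits_src are illustrative only and do not come from the library; the Source class follows the shape Steven suggested, minus 'n'.

```python
# Style 1: plain str. The index is threaded through every call and
# returned alongside the form.
def match_digits(src, i):
    """Match one or more digits at index i; return (form, new_i) or None."""
    j = i
    while j < len(src) and src[j].isdigit():
        j += 1
    if j == i:
        return None          # no digit at i: the match fails
    return (src[i:j], j)


# Style 2: a str subtype that carries its own current index.
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0           # current matching index in source


def match_digits_src(src):
    """Same pattern, but reads and advances src.i; returns the form or None."""
    j = src.i
    while j < len(src) and src[j].isdigit():
        j += 1
    if j == src.i:
        return None          # failure leaves src.i untouched
    form = src[src.i:j]      # slicing a Source gives a plain str
    src.i = j
    return form
```

In the second style the match func takes only the source and returns only the form, so it can be handed to the user without a wrapper.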
> As a (premature) memory optimization, you can use __slots__ to reduce
> the amount of memory per instance.
You're right! (I did it in fact for 'Form' subtypes, representing match results
which are constantly instantiated, possibly millions of times in a single parse;
but on the way I did it to Source as well, which is stupid ;-)
> But this (probably) is the wrong way
> to solve this problem. Your design makes Source a kind of string:
>
> issubclass(Source, str)
> => True
>
> I expect that it should not be. (Obviously I'm making some assumptions
> about the design here.)
Actually, doesn't matter whether issubclass or isinstance are true. But it must
be a subtype to use string methods (including magic ones like slicing), as you
say below.
> To decide whether you should use subclassing
> here, ask yourself a few questions:
>
> * Does it make sense to call string methods on Source objects? In
> Python 3.3, there are over 40 public string methods. If *just one*
> of them makes no sense for a Source object, then Source should not
> be a subclass of str.
> e.g. source.isnumeric(), source.isidentifier()
Do you really mean "If *just one* of them makes no sense for a Source object,
then Source should not be a subclass of str." ? Or should I understand "If *only
one* of them does make sense for a Source object, then Source should not be a
subclass of str." ?
Also, why? Or rather, why not make it a subtype if I only use one method?
Actually, a handful of them are intensely used (indexing, slicing, the series of
is* [eg isalnum], a few more as the project moves on). This is far enough for me
to make it a subtype.
Also, it fits semantically (conceptually): a src is a str that just happens to
store a current index.
> * Do you expect to pass Source objects to arbitrary functions which
> expect strings, and have the result be meaningful?
No, apart from string methods themselves. It's all internal to the lib.
> * Does it make sense for Source methods to return plain strings?
> source.upper() returns a str, not a Source object.
Doesn't matter (it's parsing). The result Forms, when they hold snippets, hold
plain strings, not Source's, thus all is fine.
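A small sketch of the point under discussion, assuming the simple Source subtype Steven suggested: slices and string methods called on a Source give back plain str objects, while the Source itself keeps its index.

```python
class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0                  # current matching index in source

src = Source("let x = 1")
src.i = 4

snippet = src[4:5]                  # what a Form would hold
upper = src.upper()

print(type(snippet) is str)         # True: a plain str, not a Source
print(type(upper) is str)           # True
print(hasattr(upper, "i"))          # False: the index does not travel along
print(src.i)                        # 4: the original Source is unchanged
```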
> * Is a Source thing a kind of string? If so, what's the difference
> between a Source and a str? Why not just use a str?
see above
> If all you want is to decorate a string with a couple of extra
> pieces of information, then a limitation of Python is that you
> can only do so by subclassing.
That's it. But I don't know of any other solution in other langs, apart from
composition, which in my view is clearly inferior:
* it does not fit semantics (conception)
* it's annoying syntactically (constant attribute access)
> * Or does a Source thing *include* a string as a component part of
> it? If that is the case -- and I think it is -- then composition
> is the right approach.
No, a source is conceptually like a string, not a kind of composite object with
a string among other fields. (Again, think of a file.)
> The difference between has-a and is-a relationships are critical. I
> expect that the right relationship should be:
>
> a Source object has a string
>
> rather than "is a string". That makes composition a better design than
> inheritance. Here's a lightweight mutable solution, where all three
> attributes are public and free to be varied after initialisation:
No, see above.
> class Source:
>     def __init__(self, string, i=0, n=None):
>         if n is None:
>             n = len(string)
>         self.i = i
>         self.n = n
>         self.string = string
Wrong solution for my case.
> An immutable solution is nearly as easy:
>
> from collections import namedtuple
>
> class Source(namedtuple("Source", "string i n")):
>     def __new__(cls, string, i=0, n=None):
>         if n is None:
>             n = len(string)
>         return super(Source, cls).__new__(cls, string, i, n)
An immutable version is fine. But what does this version bring me? A Source's
code-string is immutable already, and 'i' does change.
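To illustrate the objection: with the namedtuple version quoted above, 'i' cannot be assigned in place, so advancing would mean building a new Source at each step. The sketch below uses the standard namedtuple method _replace for that; the variable names are illustrative.

```python
from collections import namedtuple

class Source(namedtuple("Source", "string i n")):
    def __new__(cls, string, i=0, n=None):
        if n is None:
            n = len(string)
        return super(Source, cls).__new__(cls, string, i, n)

src = Source("abc")

try:
    src.i = 1                       # fields of a namedtuple are read-only
    assigned = True
except AttributeError:
    assigned = False

advanced = src._replace(i=1)        # a new Source; src itself is unchanged
```

Every match func would then have to return the new Source, which brings back the (form, i)-style plumbing the subtype was meant to remove.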
> Here's a version which makes the string attribute immutable, and the i
> and n attributes mutable:
>
> class Source:
>     def __init__(self, string, i=0, n=None):
>         if n is None:
>             n = len(string)
>         self.i = i
>         self.n = n
>         self._string = string
>     @property
>     def string(self):
>         return self._string
Again, what is better here than a plain subtyping of type 'str'? (And I dislike
the principle of properties; I want to know whether it's a func call or plain
attr access, on the user side. Bertrand Meyer's "uniform access principle" for
Eiffel is what I dislike most in this lang ;-) [which has otherwise much to offer].)
Seems I have more to learn ;-) great!
Side-note: after reflection, I guess I'll get rid of 'n'. 'n' is used each time I
need in match funcs to check for end-of-source (meaning, in every low-level,
lexical pattern, the ones that actually "eat" portions of source). I defined 'n'
to have it at hand, but now I wonder whether it's not in fact less efficient
than just writing len(src) instead of src.n, everywhere. (Since indeed python
strings hold their length: it's certainly not an actual func call! Python lies ;-)
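One way to settle the question is to measure it. The sketch below assumes the simple Source subtype with both 'i' and 'n' and compares the two spellings with the standard timeit module. In CPython, len(src) is still a call to the builtin, but it reads the length stored in the string object rather than counting, so it runs in constant time. Which spelling is faster on a given interpreter is left for the measurement to show.

```python
import timeit

class Source(str):
    def __init__(self, *args, **kwargs):
        self.i = 0
        self.n = len(self)          # cached length, as in the original design

src = Source("x" * 1000)

# Both spellings give the same end-of-source bound.
same_bound = (len(src) == src.n)

# Time each spelling; absolute numbers depend on machine and interpreter.
t_len = timeit.timeit(lambda: len(src), number=100_000)
t_attr = timeit.timeit(lambda: src.n, number=100_000)
print("len(src): %.4fs   src.n: %.4fs" % (t_len, t_attr))
```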
Denis