[Python-Dev] Subclassing varying length types (What's a PyStructSequence ?)

M.-A. Lemburg mal@lemburg.com
Mon, 10 Dec 2001 11:57:10 +0100

Tim Peters wrote:
> [MAL]
> > Have you tried disabling all free lists and using pymalloc
> > instead ?
> No, but I haven't tried anything -- it's a 2.3 issue.
> > If this pays off, I agree, we should get rid of all of them.
> When I do try it <wink>, it will be slower but more memory-efficient (both
> data and code) than the type-specific free lists, and faster and much more
> memory-efficient than using malloc().

Well, let's do some pybench runs next year and see what the results
look like.
> > ...
> > I would consider moving from 8-bit strings to Unicode an
> > improvement in flexibility.
> Sure.  Moving from one malloc to two is orthogonal.

You know that I know that you knew what I was talking about :-)
> > It also results in better algorithms (== simpler, less error-prone,
> > etc. in this case).
> Unclear what "it" means; assuming it means using two mallocs instead of one
> for a Unicode string object, the 8-bit string algorithms haven't been a
> particular source of bugs.  People mutating strings at the C level has been.

If you ever try to support more than ASCII text in a user program,
you'll find that having to deal with only one encoding saves you
a whole lot of trouble. I won't even start talking about variable
length encodings, encodings with builtin shift state and other
goodies which are a complete nightmare to handle (e.g. various
character properties such as title case, upper/lower mappings,
different ways to encode a single character, collation,...).
> > As I said, it's a tradeoff flexibility vs. memory consumption.
> > Whether it pays off depends on your application environment. It
> > certainly does for companies like Micron and pays off stock-wise
> > for a lot of people... uhm, getting off-topic here :-)
> I've got nothing against Unicode (apart from the larger issue that the whole
> world would obviously be a lot better off if they switched to American
> English <wink>).

I suppose Mandarin would reach a larger share of the world
population ... and they *need* Unicode :-)
> >> Subclassing seems easy enough to me from the Python level; I
> >> don't have time to revisit C-level subclassing here (and I don't
> >> know that it's hackish there either, but do think it's in need of
> >> docs).
> > It is beautifully easy for non-varying-length types. Unfortunately,
> > it happens that some of the basic types which would be attractive
> > for subclassing are varying length types (such as strings and
> > tuples).
> It's easy to subclass from str and tuple in Python -- even to add your own
> instance data.

Yeah, but that's not the point. I want to do this in C...
> > In my case, I'm looking for a way to subclass strings, but I haven't
> > yet found an elegant solution to the problem of adding extra
> > data to the instances.
> It's easy if you're willing to use a dict:

I would be willing to use a dictionary. It's only that even the
dictionary trick doesn't seem to work at C level.
> class STR(str):
>      def __new__(cls, strguts, n):
>          self = str.__new__(cls, strguts)
>          self.n = n
>          return self
> s = STR('abc', 42)
> print s    # abc
> print s.n  # 42
> __slots__ doesn't work here, though.
> I admit I personally don't see much attraction to subclassing from str and
> tuple, apart from adding additional *methods*.  I suppose someone could code
> up two-malloc variants ...
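
The dict-versus-__slots__ point above can be sketched as follows. This is
only an illustration written for current Python 3, which postdates this
thread; tuple is used because it is still a varying-length type in CPython,
and the names Tagged, Slotted and try_slots are hypothetical.

```python
# Python 3 sketch; Tagged, Slotted and try_slots are made-up names.

class Tagged(tuple):
    """Tuple subclass that keeps extra data in the instance __dict__."""

    def __new__(cls, items, n):
        # The tuple contents are fixed at allocation time, so the extra
        # attribute is attached afterwards via the per-instance dict.
        self = super().__new__(cls, items)
        self.n = n
        return self


def try_slots():
    """Return the error message raised when a tuple subclass uses slots."""
    try:
        class Slotted(tuple):
            # Nonempty __slots__ would need fixed-offset storage after the
            # variable-length item array, which CPython refuses to lay out.
            __slots__ = ("n",)
    except TypeError as exc:
        return str(exc)
    return None


t = Tagged((1, 2), 42)
slots_error = try_slots()
```

Here t behaves like the tuple (1, 2) and carries t.n, while slots_error
holds the TypeError message produced at class-creation time.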

If you look at mxURL you'll find an extension type which tries
to play nice with strings -- it would be a good candidate for
a string subtype.

A string type which carries along an encoding attribute would be
another good candidate for a string subtype.

Both need extra attributes/data fields.
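
A minimal Python-level sketch of the second candidate, a string that carries
an encoding attribute. It assumes Python 3 and the instance-dict approach
quoted earlier; it is not the C-level solution asked for in this message,
and EncodedStr and to_bytes are hypothetical names.

```python
# Python 3 sketch; EncodedStr and to_bytes are made-up names.

class EncodedStr(str):
    """str subclass that remembers which encoding its text belongs to."""

    def __new__(cls, text, encoding="utf-8"):
        # str is immutable, so the value is set in __new__; the extra
        # attribute lives in the instance __dict__.
        self = super().__new__(cls, text)
        self.encoding = encoding
        return self

    def to_bytes(self):
        """Encode the text using the encoding stored on the instance."""
        return self.encode(self.encoding)


s = EncodedStr("caf\u00e9", "latin-1")
raw = s.to_bytes()        # encoded with the stored encoding
upper = s.upper()         # inherited str methods return a plain str
```

Note that inherited methods such as upper() return plain str objects, so the
encoding attribute is lost unless those methods are overridden as well.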

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/