Re: [Python-Dev] Subclassing varying length types (What's a PyStructSequence ?)

Dec. 10, 2001

      Tim Peters wrote:
...
[MAL]
...
Have you tried disabling all free lists and using pymalloc
instead ?
No, but I haven't tried anything -- it's a 2.3 issue.
...
If this pays off, I agree, we should get rid of all of them.
When I do try it <wink>, it will be slower but more memory-efficient (both
data and code) than the type-specific free lists, and faster and much more
memory-efficient than using malloc().
Well, let's do some pybench runs next year and see what the results
look like.
...
...
...
I would consider moving from 8-bit strings to Unicode an
improvement in flexibility.
Sure.  Moving from one malloc to two is orthogonal.
You know that I know that you knew what I was talking about :-)
...
...
It also results in better algorithms (== simpler, less error-prone,
etc. in this case).
Unclear what "it" means; assuming it means using two mallocs instead of one
for a Unicode string object, the 8-bit string algorithms haven't been a
particular source of bugs.  People mutating strings at the C level has been.
If you ever try to support more than ASCII text in a user program,
you'll find that having to deal with only one encoding saves you
a whole lot of trouble. I won't even start talking about variable
length encodings, encodings with builtin shift state and other
goodies which are a complete nightmare to handle (e.g. various
character properties such as title case, upper/lower mappings,
different ways to encode a single character, collation,...).
...
...
As I said, it's a tradeoff: flexibility vs. memory consumption.
Whether it pays off depends on your application environment. It
certainly does for companies like Micron and pays off stock-wise
for a lot of people... uhm, getting off-topic here :-)
I've got nothing against Unicode (apart from the larger issue that the whole
world would obviously be a lot better off if they switched to American
English <wink>).
I suppose Mandarin would reach a larger share of the world
population ... and they *need* Unicode :-)
...
...
...
Subclassing seems easy enough to me from the Python level; I
don't have time to revisit C-level subclassing here (and I don't
know that it's hackish there either, but do think it's in need of
docs).
...
It is beautifully easy for non-varying-length types. Unfortunately,
it happens that some of the basic types which would be attractive
for subclassing are varying length types (such as string and
tuples).
It's easy to subclass from str and tuple in Python -- even to add your own
instance data.
Yeah, but that's not the point. I want to do this in C...
...
...
In my case, I'm looking for a way to subclass strings, but I haven't
yet found an elegant solution to the problem of adding extra
data to the instances.
It's easy if you're willing to use a dict:
I would be willing to use a dictionary. It's only that even the
dictionary trick doesn't seem to work at C level.
...
class STR(str):
     def __new__(cls, strguts, n):
         self = str.__new__(cls, strguts)
         self.n = n
         return self
s = STR('abc', 42)
print s    # abc
print s.n  # 42
__slots__ doesn't work here, though.
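
For comparison, here is a minimal sketch of the same dictionary trick applied to tuple, followed by the __slots__ restriction. It assumes a current Python 3 interpreter rather than the 2.2-era one discussed in this thread, and the names TaggedTuple and SlottedTuple are hypothetical, chosen only for illustration.

```python
# Illustration only: TaggedTuple and SlottedTuple are hypothetical names.
class TaggedTuple(tuple):
    """Tuple subclass that keeps extra data in the instance __dict__."""

    def __new__(cls, items, tag):
        # tuple is immutable, so the contents must be set in __new__.
        self = tuple.__new__(cls, items)
        self.tag = tag  # stored in the per-instance __dict__
        return self


t = TaggedTuple((1, 2, 3), "meta")
print(t, t.tag)

# A nonempty __slots__ is rejected when the base type is variable-length,
# because there is no fixed offset at which to place the slot.
try:
    class SlottedTuple(tuple):
        __slots__ = ("tag",)
except TypeError as exc:
    slots_error = str(exc)
    print("TypeError:", slots_error)
else:
    slots_error = None
```

The extra attribute lives in a per-instance dictionary, which is exactly the memory cost that __slots__ would otherwise avoid.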
I admit I personally don't see much attraction to subclassing from str and
tuple, apart from adding additional *methods*.  I suppose someone could code
up two-malloc variants ...
If you look at mxURL you'll find an extension type which tries
to play nice with strings -- it would be a good candidate for
a string subtype.

A string type which carries along an encoding attribute would be
another good candidate for a string subtype.

Both need extra attributes/data fields.
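
As a rough Python-level sketch of the second candidate (not the C-level solution asked for above), a string subtype carrying an encoding could look like the following. It assumes a current Python 3 interpreter; EncodedStr and its to_bytes() method are hypothetical names and not part of any existing package.

```python
# Illustration only: EncodedStr and to_bytes() are hypothetical.
class EncodedStr(str):
    """String subtype that remembers which encoding it should be written in."""

    def __new__(cls, text, encoding="ascii"):
        self = str.__new__(cls, text)
        self.encoding = encoding  # extra data field, kept in the instance __dict__
        return self

    def to_bytes(self):
        # Encode using the encoding carried along with the string.
        return self.encode(self.encoding)


s = EncodedStr("Gr\u00fc\u00dfe", "latin-1")
print(s.encoding, s.to_bytes())

# Inherited str methods return plain str objects, so the attribute is lost
# unless each such method is overridden.
upper = s.upper()
print(type(upper) is str)
```

The last lines show one cost of this design: results of inherited string operations drop back to the base type, so the encoding does not travel with them automatically.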

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
