
Tim Peters wrote:
[MAL]
Have you tried disabling all free lists and using pymalloc instead?
No, but I haven't tried anything -- it's a 2.3 issue.
If this pays off, I agree, we should get rid of all of them.
When I do try it <wink>, it will be slower but more memory-efficient (both data and code) than the type-specific free lists, and faster and much more memory-efficient than using malloc().
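To make the free-list side of that tradeoff concrete, here is a minimal Python model of a type-specific free list. It is a sketch under stated assumptions, not CPython's C implementation: the class name `FreeList` and the cap `max_free` are hypothetical. Released objects are parked on a per-type list and handed back on the next allocation. That makes the fast path cheap, but the parked memory can only ever be reused by that one type.

```python
class FreeList:
    """Hypothetical model of a per-type free list (CPython's are written in C)."""

    def __init__(self, factory, max_free=80):
        self._factory = factory    # stands in for the general allocator
        self._free = []            # parked objects, reusable only by this type
        self._max_free = max_free  # arbitrary cap so the list cannot grow forever
        self.fresh = 0             # allocations that went to the factory
        self.reused = 0            # allocations served from the free list

    def acquire(self):
        if self._free:
            self.reused += 1
            return self._free.pop()
        self.fresh += 1
        return self._factory()

    def release(self, obj):
        if len(self._free) < self._max_free:
            self._free.append(obj)  # keep it around for the next acquire()
        # otherwise drop it and let the general allocator reclaim the memory


pool = FreeList(lambda: bytearray(16))
a = pool.acquire()
pool.release(a)
b = pool.acquire()
print(b is a, pool.fresh, pool.reused)  # True 1 1
```

A shared small-object allocator such as pymalloc instead pools blocks across all types, which is why it can use memory more efficiently at some cost in speed.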
Well, let's do some pybench runs next year and see what the results look like.
... I would consider moving from 8-bit strings to Unicode an improvement in flexibility.
Sure. Moving from one malloc to two is orthogonal.
You know that I know that you knew what I was talking about :-)
It also results in better algorithms (== simpler, less error-prone, etc. in this case).
Unclear what "it" means; assuming it means using two mallocs instead of one for a Unicode string object, the 8-bit string algorithms haven't been a particular source of bugs. People mutating strings at the C level has been.
If you ever try to support more than ASCII text in a user program, you'll find that having to deal with only one encoding saves you a whole lot of trouble. I won't even start talking about variable-length encodings, encodings with built-in shift state and other goodies which are a complete nightmare to handle (e.g. various character properties such as title case, upper/lower mappings, different ways to encode a single character, collation,...).
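Several of these problems can be seen with today's Python 3 standard library. This is a modern illustration, not code from the thread, and it leaves out collation because that depends on locale settings:

```python
import unicodedata

# Variable-length encoding: UTF-8 uses 1 to 4 bytes per character.
for ch in ("a", "\u00e9", "\u20ac", "\U0001d11e"):
    print(repr(ch), len(ch.encode("utf-8")))

# Shift state: ISO-2022-JP switches character sets with escape sequences,
# so the meaning of a byte depends on what came before it.
print("a".encode("iso2022_jp"))       # b'a'
print("\u3042".encode("iso2022_jp"))  # begins with an ESC (0x1b) sequence

# Different ways to encode a single character: precomposed vs. combining.
composed = "\u00e9"
decomposed = "e\u0301"
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Case mappings are not one-to-one, and title case is a separate mapping.
print("\u00df".upper())                      # SS
print("\u01c6".upper() == "\u01c6".title())  # False (U+01C4 vs. U+01C5)
```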
As I said, it's a tradeoff between flexibility and memory consumption. Whether it pays off depends on your application environment. It certainly does for companies like Micron and pays off stock-wise for a lot of people... uhm, getting off-topic here :-)
I've got nothing against Unicode (apart from the larger issue that the whole world would obviously be a lot better off if they switched to American English <wink>).
I suppose Mandarin would reach a larger share in world population ... and they *need* Unicode :-)
Subclassing seems easy enough to me from the Python level; I don't have time to revisit C-level subclassing here (and I don't know that it's hackish there either, but do think it's in need of docs).
It is beautifully easy for non-varying-length types. Unfortunately, it happens that some of the basic types which would be attractive for subclassing are varying-length types (such as strings and tuples).
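The layout problem behind this can be modelled with ctypes. This is a hedged sketch: the struct and field names are hypothetical and much simpler than CPython's real object headers. With two mallocs, the object header has a fixed size, so a subtype can append its own fields at a fixed offset. With one malloc, the characters sit inline at the end of the object, so any field a subtype appends lands at an offset that depends on the string's length.

```python
import ctypes


class TwoMallocStr(ctypes.Structure):
    # Fixed-size header; the characters live in a second, separate block.
    _fields_ = [("refcnt", ctypes.c_ssize_t),
                ("length", ctypes.c_ssize_t),
                ("data", ctypes.c_char_p)]


class TwoMallocSub(TwoMallocStr):
    # ctypes appends subclass fields after the base fields, just as a
    # C-level subtype appends its fields after the fixed-size base header.
    _fields_ = [("encoding", ctypes.c_char_p)]


def one_malloc_sub(n):
    """Layout of a one-malloc string of length n plus one subtype field."""
    class OneMallocSub(ctypes.Structure):
        _fields_ = [("refcnt", ctypes.c_ssize_t),
                    ("length", ctypes.c_ssize_t),
                    ("data", ctypes.c_char * (n + 1)),  # inline chars + NUL
                    ("encoding", ctypes.c_char_p)]      # forced after the data
    return OneMallocSub


# Two mallocs: the extra field has the same offset for every string.
print(TwoMallocSub.encoding.offset == ctypes.sizeof(TwoMallocStr))  # True

# One malloc: the extra field moves with the string length (platform-dependent).
print(one_malloc_sub(3).encoding.offset, one_malloc_sub(10).encoding.offset)
```

CPython's own answer for the instance dictionary of such subtypes has been a negative `tp_dictoffset`, which the type-object documentation describes as an offset counted from the end of the instance rather than the start.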
It's easy to subclass from str and tuple in Python -- even to add your own instance data.
Yeah, but that's not the point. I want to do this in C...
In my case, I'm looking for a way to subclass strings, but I haven't yet found an elegant solution to the problem of adding extra data to the instances.
It's easy if you're willing to use a dict:
I would be willing to use a dictionary. It's only that even the dictionary trick doesn't seem to work at C level.
    class STR(str):
        def __new__(cls, strguts, n):
            self = str.__new__(cls, strguts)
            self.n = n      # stored in the instance __dict__
            return self

    s = STR('abc', 42)
    print(s)    # abc
    print(s.n)  # 42
__slots__ doesn't work here, though.
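Both the dictionary trick and the `__slots__` limitation can be checked from Python. This sketch targets Python 3, and `EncodedBytes` and `nonempty_slots_allowed` are hypothetical names. It gives a bytes subtype an extra `encoding` attribute through the instance `__dict__`, then probes which built-in bases accept a non-empty `__slots__`. In current CPython, fixed-size types such as `list` accept it, while varying-length types such as `tuple` and `bytes` reject it with a `TypeError`. Whether `str` itself is still affected depends on the interpreter version's internal string layout, so the probe leaves it out.

```python
class EncodedBytes(bytes):
    """Hypothetical bytes subtype that remembers its encoding (via __dict__)."""

    def __new__(cls, data, encoding):
        self = bytes.__new__(cls, data)
        self.encoding = encoding  # stored in the per-instance __dict__
        return self

    def text(self):
        return self.decode(self.encoding)


def nonempty_slots_allowed(base):
    """Return True if a subclass of `base` may declare a non-empty __slots__."""
    try:
        type("Probe", (base,), {"__slots__": ("extra",)})
    except TypeError:
        return False
    return True


b = EncodedBytes("caf\u00e9".encode("latin-1"), "latin-1")
print(b.text(), b.encoding)  # café latin-1

for base in (list, tuple, bytes):
    print(base.__name__, nonempty_slots_allowed(base))
```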
I admit I personally don't see much attraction to subclassing from str and tuple, apart from adding additional *methods*. I suppose someone could code up two-malloc variants ...
If you look at mxURL you'll find an extension type which tries to play nice with strings -- it would be a good candidate for a string subtype. A string type which carries along an encoding attribute would be another good candidate for a string subtype. Both need extra attributes/data fields.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/