Gradual migration

Python 3K. It is the repository for our hopes and dreams. We tend to invoke it in three different situations:

1. in delaying discussions of gee-whiz features (e.g. static type checking)
2. in delaying hairy implementation re-factoring that we don't want to undertake right now
3. in delaying painful backwards-compatibility breakage

I think it is somewhat debatable whether we really need to or should do these three things all at once, but that's a separate discussion for another day. (the other experiment may inform our decision)

I want to focus on "3" -- breakage. Rather than delaying painful backwards-compatibility breakage I think we should make it less painful. We should decide where we are going long before we ask our users to move there. I think we should start including alternatives to syntaxes and features that we know are broken. Once people move to the alternatives we can "reassign" or remove the original syntax with much less pain.

In other words, rather than telling people "here's a new version, your code is broken, sorry," we should tell them: "we're going to break code. Here's an alternative syntax that you can use that will be interpreted the same in the old and new versions -- please move to this new syntax as quickly as possible."

I'll outline some examples of the strategy. You may or may not like details of the particular proposals but you might still agree with the strategy. Also, I don't claim that all of the proposals are fully fleshed-out...again, it's the strategy I'm most interested in. I don't even agree with every feature change I describe below -- they are just some I've heard of for Python 3000. In other words

** this is not a design document for Python 3000! **

Separating byte arrays from strings:

1. immediately introduce two new functions:
     binopen("foo").read() -> byte array
     stropen("foo","UTF-8").read() -> u"...."
2. warn about string literals that have embedded non-Unicode characters
3. deprecate extension modules that return "old fashioned" string arrays
4. after a period where all strings have been restricted to Unicode-compatibility, merge the Unicode and string types.
5. deprecate the special Unicode u"" syntax as an imperialist anachronism

Reclaiming the division operator from truncating integer division:

1. immediately introduce new functions: div() that does division as we wish it was.
2. add a warning mode to Python (long overdue)
3. with the warning mode on, old-fashioned division triggers a deprecation warning.
4. after three years as a warning situation we expect all in-use Python scripts to have been upgraded to use the new operations and to explicitly truncate integer division when that is what is wanted.
5. at that point we can re-assign the division operator to be a floating point division if we wish.

Case insensitivity:

1. Warn whenever a module or class has two __dict__ entries that differ only by case
2. Eventually, disallow that form of name-clash altogether
3. After a period, allow case variations to be equivalent.

Unifying types and classes (more vague):

1. Add something like extension class to make type subclassing easier.
2. Informally deprecate modules that do not incorporate it.
3. Replace or fix details of the language and implementation that behave differently for types and classes. (e.g. the type() operator)

-- Paul Prescod - Not encumbered by ActiveState consensus
Simplicity does not precede complexity, but follows it.
  - http://www.cs.yale.edu/homes/perlis-alan/quotes.html
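The first step of the division proposal, an alternative spelling that is "interpreted the same in the old and new versions", can be sketched as below. This is only an illustration: the message names div() but gives no definition, so the body here is an assumption, and intdiv() is a hypothetical helper name for the "explicitly truncate" case. Both are written so that the result does not depend on what "/" does with two integers.

```python
def div(a, b):
    """Hypothetical div(): division that never truncates.

    Converting one operand to float first means the result is the same
    whether "/" on two integers truncates or not.
    """
    return float(a) / b


def intdiv(a, b):
    """Hypothetical helper for code that really wants an integer quotient.

    divmod() rounds the quotient toward negative infinity, which is also
    what the classic integer "/" did for negative operands.
    """
    return divmod(a, b)[0]


if __name__ == "__main__":
    print(div(5, 2))      # fractional result
    print(intdiv(5, 2))   # integer quotient, requested explicitly
```

Code that uses only these two spellings says which behaviour it wants, so reassigning "/" later cannot silently change its results.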

Hi, Paul Prescod wrote on python-dev@python.org: [...]
Reclaiming the division operator from truncating integer division: [...]
Bruce Sherwood and David Scherer suggested an IMHO very clever solution to this particular problem in March/April this year. This thread was first posted to idle-dev and afterwards X-posted to python-dev:

http://www.python.org/pipermail/idle-dev/2000-April/000138.html
http://www.python.org/pipermail/idle-dev/2000-April/000140.html
http://www.python.org/pipermail/python-dev/2000-April/010029.html

Unfortunately I'm not a native English speaker and don't have enough time to make a Python-2.1 PEP from it. Maybe somebody else wants to go for it? I still believe this was a very clever Python enhancement, which should be officially proposed.

Regards, Peter
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

Peter Funk wrote:
Tim's last message on it doesn't sound overwhelmingly positive to me! Also, don't forget that Python is also a command line so "version pragmas" cause additional complexity there. -- Paul Prescod Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

Paul Prescod wrote:
I think this will simply be a consequence of doing a complete rewrite of the interpreter for Py3K. AFAIR, the only truly feasible solution would be doing the rewrite in a widely portable subset of C++ and then managing classes at that level.

Moving to a common and very low-level strategy for classes will allow us to put even the most basic types (strings and numbers) into an inheritance tree. Differences like the ones between Unicode and 8-bit strings would then flesh out as two different subclasses of a generic string type, which again is based on a generic sequence type.

The same could be done for dictionaries: special ones for just string keys, case insensitive lookups, etc. could all be subclasses of a generic mapping class. Ditto for numbers (and division strategies).

By following this principle there won't be all that much breakage, since the old functionality will still be around, only the defaults will have changed.

Add to this pluggable compilers and ceval loops, plus a nice way of configuring the lot on a per-module basis and you're set.

(Ok, it's a fluffy clouds image, but you get the picture ;-)

-- Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
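The idea of special-purpose dictionaries as subclasses of a generic mapping class can be illustrated with a short sketch. It assumes present-day Python, where collections.abc.MutableMapping plays the role of the generic mapping; the class name CaseInsensitiveDict and its storage layout are hypothetical and not something proposed in the thread.

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Mapping whose string keys match regardless of case (illustrative)."""

    def __init__(self, *args, **kwargs):
        # folded key -> (key as last written, value)
        self._data = {}
        self.update(*args, **kwargs)

    @staticmethod
    def _fold(key):
        # Only strings are folded; other hashable keys pass through unchanged.
        return key.casefold() if isinstance(key, str) else key

    def __getitem__(self, key):
        return self._data[self._fold(key)][1]

    def __setitem__(self, key, value):
        self._data[self._fold(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[self._fold(key)]

    def __iter__(self):
        # Iterate over the keys in the spelling they were last stored with.
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)
```

Only the five abstract methods are written by hand; get(), items(), update() and the "in" operator come from the generic base class, which is the kind of sharing the message describes.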

Marc-Andre Lemburg replied:
Good job in channeling me, Marc-Andre! I'm sure that's not exactly how it's going to be, but on the face of it, this sure sounds like a reasonable possible route. Do you want to be the author for PEP-3000? --Guido van Rossum (home page: http://www.python.org/~guido/)

"M.-A. Lemburg" wrote:
I disagree for a variety of reasons:

* implementation language and Python inheritance semantics are almost completely distinct. After all, we have Python implementations in a variety of languages with minor incompatibilities. Python could have a proper type/class merging even if it were written in assembly language. So let's leave implementation language aside.

* there is a hell of a lot we can do to make the type/class split less visible without a major rewrite. For instance, lists could have a __getitem__ method so that they "act like" instances. Instances could return their methods in a dir() so that they "act like" built-in objects. So there is no reason to wait for a complete rewrite to start on this path.

* It may even be the case that we can get from here to complete merged type/class semantics WITHOUT a rewrite. If a mad genius had not written stackless Python I would have sworn up and down that stackless Python would require a Python rewrite. It didn't. If another slightly less mad genius had not integrated ints and longs I would never have believed it possible without another rewrite. Someone needs to do an analysis of what it takes to merge types and classes and *demonstrate* that it requires a rewrite rather than *asserting* that it requires a rewrite.

In other words, let's stop sprinkling the "major rewrite" pixie dust on hard problems and instead decide where we want to go and how we want to get there!
I don't know what you mean by a "low-level strategy" for classes. It's not like we can somehow use C++ vtables without giving up Python's dynamicity. Python's class handling is about as low-level as it can get in my humble opinion.
Of course that's where we want to go but it doesn't avoid the backwards compatibility problems. We can do this today using proxying.
We can easily do this today.
Ditto for numbers (and division strategies).
There's no way I'm going to let you get away with that little sleight of hand. How is inheritance holy water going to allow us to change the semantics of: 5/2 without breaking code?? The same goes for a.b = 5 versus a.B = 5 C++ does not help. Inheritance does not help. Pluggable compilers do not help. We *will* break code. We just need to decide whether to do it in a thoughtful way that gives people a chance to migrate or put off decisions until the last possible moment and then spring it on them with no between-version upgrade path.
When you change defaults you break code. Keeping the old functionality around is barely helpful. Nobody has EVER proposed a change to Python where something that was previously possible is now made impossible. So whatever our strategy, the PROBLEM is changing defaults. The solution is telling people what defaults are changing in what timeline and discouraging them from depending on the old defaults.
Sounds mythical. I'm trying to take it out of the realm of fluffy clouds and bring it into the world that people can plan their businesses and coding strategies around. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

"PP" == Paul Prescod <paul@prescod.net> writes:
PP> * implementation language and Python inheritance semantics
PP> are almost completely distinct. After all, we have Python
PP> implementations in a variety of languages with minor
PP> incompatibilities. Python could have a proper type/class
PP> merging even if it were written in assembly language. So
PP> let's leave implementation language aside.

Jython experience backs this up. It would be incredibly convenient if we could just map Java classes to Python classes, so that for example, we'd have in Java a PyException class that is exceptions.Exception with minimal or no Java wrappings. And Finn nearly did this.

The problem that we ran into with Java is that it allows only single inheritance. So you couldn't create a Python exception that multiply inherited from two or more other Python exceptions. Doesn't happen often, but it does happen, so is that an acceptable tradeoff?

C++ might be better in this particular respect, but there will be other issues, because as soon as you start transparently showing the implementation's classes into Python, you inherit their semantics and restrictions as well.

Just saying that it's tricky.

-Barry
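For readers who have not met the case described above, this is what a Python exception that inherits from two other Python exceptions looks like. The class names are made up for illustration; the point is only that either parent's handler catches the combined exception, which a one-to-one mapping onto single-inheritance classes could not express.

```python
class NetworkError(Exception):
    """Illustrative base exception."""


class TimeoutProblem(Exception):
    """Illustrative, unrelated base exception."""


class NetworkTimeout(NetworkError, TimeoutProblem):
    """Inherits from both parents, so either handler catches it."""


def handle(exc):
    """Raise exc and report which except clause picked it up."""
    try:
        raise exc
    except TimeoutProblem:
        return "timeout handler"
    except NetworkError:
        return "network handler"


if __name__ == "__main__":
    print(handle(NetworkTimeout("slow link")))
    print(handle(NetworkError("no route")))
```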

Paul Prescod wrote:
Hey, think of it as an opportunity: we can reuse much of C++'s optimizations and the integration of Python and C++ applications will get *much* easier. A rewrite shouldn't scare anyone away -- much of the existing code can be reused since only the Python C APIs of the various types will have to be rewritten, not the logic behind the types. Besides, Py3K will be a project which runs in parallel to the 2.x development (at least that's what I've read on some BeOpen webpage), so there's really not much to worry about wrt breakage, etc. People will be able to test-drive Py3K while using the 2.x series.
Right. I didn't want to say that things cannot be done prior to the rewrite, only that a rewrite will give us many more options than we currently have.
See above.
With "low-level" I meant trying to build Python classes and instances on top of a very thin layer on top of C++ classes, e.g. all slots could be implemented using true C++ methods with additional logic to override them using dynamically bound Python method objects.
Huh ? I was talking about clear design... not ways to avoid b/w compatibility. Merging Unicode and strings will hurt one way or another. This is simply a consequence of using strings as binary containers where Unicode is meant for text only use.
We can do this today using proxying.
Could you explain this ?
No we can't: Python's use of pointer compares to find out which type it is dealing with prevents this.
Just tell Python to use the correct class for what the code was written for (and this could be done by plugging in a 2.0 compiler). The instances of those classes would still work together with other semantics by virtue of exposing the same interface, yet the internals would work differently, e.g. a module using case insensitive lookup would be compiled using case insensitive dictionaries as namespaces.
All true. I was just referring to the possibility of keeping the old semantics around in case some module relies on them. In this ideal world, a simple "define pythonlevel=2.0" would suffice to make the old module work with Py3k.
Hmm, wasn't Py3k meant as sandbox for new experiments ? The 2.x series is for doing business with, IMHO at least. At the current dev speed we'll have plenty of time to get Py3k rock solid. Then we can add all the backward compatibility layers we want to it. If the design is a good one, adding these layers won't pose much of a problem. Why spoil all the fun when we haven't even started thinking about all the possibilities we could use to make Py3k a success ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
We could make integrating C++ and Python easier through CXX or Py_Cpp. Perhaps we should ship one of those with Python 2.1. Anyhow, integrating C++ and Python is not really so hard, in my experience.
I wasn't really addressing the issue of backwards compatibility of extensions -- I was talking about Python programs. Nevertheless, I can't resist: Porting your app to the Python APIs is often the majority of the work in a particular extension. A lot of Python extensions consist only of API glue!
I don't see how that helps. If you can't write programs that work both on the old interpreter and the new one then you need to have a "switch over" day. The whole point of my doctrine is that Python 3K should run all code that the version of Python immediately before it did. The most it can do in terms of breakage is to warn about error messages. In that case it doesn't matter much whether Python 3K is available at the same time as 2.X or emerges from Guido's head fully formed.
Okay, so do you agree with the rule expressed here: http://www.python.org/pipermail/python-dev/2000-October/016785.html
I don't think that there is really interesting magic in a C++ compiler. A vtable is an array of pointers to functions. We've already implemented the equivalent in ANSI C. C++ exceptions, constructors, destructors and smart pointers are a little more interesting from a code maintenance and simplicity point of view. But even then I think that we could get to a C++ implementation through incremental rewrites.
But the whole point of my original article was backwards compatibility!!! I didn't address an implementation strategy for Py3K at all.
The question I wanted to address is how we can *minimize the pain*.
A mode switch solution is fraught with dangers. First there is the issue of the correct default for the mode switch. http://www.python.org/pipermail/python-dev/2000-April/010029.html Second, there are dangers cutting and pasting between modules. Anyhow, even if we allow a mode switch we need to be able to help people to upgrade their modules in a reasonable time. That's what the message I cited advocates.
Asking for a map of where we are going and how we will get there is "spoiling all of the fun?" I'm not sure what you are reacting to -- I didn't advise one implementation strategy or another -- I just said that we should employ a strategy that minimizes sudden and surprising code breakage. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

Paul Prescod wrote:
Ok, you can use SWIG to get most of the work done, but the interface between the C++ object world and the Python object world is one big mess -- all implementations I've seen use shadow objects or proxies to interface from one to the other... with lots of temporary objects used for the linkup. Having Python itself written in C++ we could do *much* better. But that's only a nice side-effect. The real argument here is that we can push the type logic one layer further down. Ideal would be a level of integration such as the one implemented in JPython or Jython.
True, but we're lucky, since we could provide a compatibility layer on top of the new API. BTW, I think I now know what your main concern is: the Python level compatibility. I was talking of what goes on under the hood and still think that Py3K should be used as a chance to simplify the Python backend. As simplification often means generalization, we'll open up new doors for future developments along the way.
Naa, the whole type slot interface is one big mess (sorry, Guido :-). Some slots are packaged, some are not, some are NULL, some are not, there are oodles of sometimes weird dependencies between the slots which are not really documented, etc. etc. The slot design has serious drawbacks and should be replaced by something more reliable, preferably C++ methods. That way, we'll get some more "type" safety into Python and its extensions. Note that porting old extensions won't be all that hard: a class reusing the existing slot functions as methods should suffice in many cases.
Ok, we've been talking about different things here: with "spoiling the fun" I meant putting ropes on possible changes to the C backend. You are talking about the frontend and I agree with you that there should be a clear upgrade path from the 2.x series to Py3K w/r to the Python side of things. So I guess it's time for some PEPs now... the upgrade path PEP and the fluffy clouds PEP. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
...
Having Python itself written in C++ we could do *much* better.
Agree.
I agree that the slot stuff is broken but my solution would be to junk it and use the same mechanism for looking up "type methods" and "instance methods". I can think of two ways to make that perform reasonably: one is method caching and the other is by building interface objects where methods are invoked by index -- basically vtables. But if the same mechanism is going to accelerate Python and C types alike then it can't really use C++ vtables because how do you generate a vtable at runtime for a new Python class? (you could also think of it as a COM interface object)
So I guess it's time for some PEPs now... the upgrade path PEP and the fluffy clouds PEP.
Good timing. I just finished the first draft of the upgrade path PEP. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

"PP" == Paul Prescod <paul@prescod.net> writes:
PP> I agree that the slot stuff is broken but my solution would be
PP> to junk it and use the same mechanism for looking up "type
PP> methods" and "instance methods". I can think of two ways to
PP> make that perform reasonably: one is method caching and the
PP> other is by building interface objects where methods are
PP> invoked by index -- basically vtables. But if the same
PP> mechanism is going to accelerate Python and C types alike then
PP> it can't really use C++ vtables because how do you generate a
PP> vtable at runtime for a new Python class? (you could also
PP> think of it as a COM interface object)

Objective-C! :)

[Paul]
... or a number of others (but SWIG falls very short when it comes to things like C++ references)... [Marc-Andre]
While I agree it's messy, those are not the objectionable qualities. Any refcounted C++ system is going to have proxies (smart pointers / refcounting stack-safe "reference" objects to the real heap-based objects). And while I think we'd be better off with a C++ implementation, I would warn that C++'s notion of inheritance is in conflict with Python's. It won't be "as above, so below" (unless we screw interpreting and go straight to native code). Assuming that the class / type dichotomy actually gets healed, that is.
Nope. If you heal the class / type split, there's really only one underlying type object. Unless you go the other way, and write "native" code (as JPython does). All of the existing C++ interfaces / helpers deal only with type objects (at least, those that I've examined, which is most of them). In fact, only ExtensionClass attempts to deal with the class / type split, and while it's a masterpiece, I think it's a great example of everything to avoid in Py3K. - Gordon

Integrating C++ and Python well is hard in a general library. CXX tries to make objects that look and feel like Python objects. But to do that we have to figure out the details of how Python uses objects. You have no docs on the subject so we read the source code.

Barry
CXX maintainer

[MAL]
Not just to be my usual self <wink>, but I do see a from-scratch rewrite as being less likely as the years go by. There's nothing I know of in Guido's plans that can't be done incrementally instead -- and if he doesn't either, selling a total-rewrite project to an employer is probably impossible.

The most successful projects I've seen and been on *did* rewrite all the code routinely, but one subsystem at a time. This happens when you're tempted to add a hack, realize it wouldn't be needed if an entire area were reworked, and mgmt is bright enough to realize that hacks compound in fatal ways over time. The "ain't broke, don't fix" philosophy is a good guide here, provided you've got a very low threshold for insisting "it's broke" <0.4 wink>.

if-you-would-have-liked-to-do-the-whole-differently-then-by-all-means-*do*-the-whole-thing-differently-that-works-in-c-too-ly y'rs - tim

Tim Peters wrote:
As I mentioned in the posting, the idea was from the "fluffy clouds" area. The rewrite would only involve the core type system and perhaps the core interpreter layout (parser, compiler, etc. all wrapped in classes) -- most of the existing code would be reusable. The idea behind this is somewhat like what you do when starting out a project based on a few simple functions and then reorganizing the code into a class-based approach. There's no need to rewrite all the type internals, just the type interfaces. Python has long reached a level of complexity that is counter-productive when it comes to adding new extension types. Just think of all the problems people have with coercion, the integration of user defined and internal types, the missing links between types and classes, etc. etc. BTW, why all the talk about "employers" ? Much of Python's code base was written without any employer knowing about it ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Absolutely. Whenever possible, we should try to plan for migration in Python 2.x.
It would also help if we could produce automatic translation tools that will convert the old syntax into the new. This desire may restrict our choices however: the translation tools don't have runtime information to go by. It's easy enough to change obsolete syntax into new syntax, but it's hard to decide whether a particular "/" operator should be changed into an integer divide ("//") or left alone.
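The limitation described above is easy to see with a small scanner. The sketch below uses the standard-library ast module from present-day Python (not a tool that existed when this message was written); find_divisions() is a hypothetical name. It can list every "/" in a source string, but nothing in the syntax tree says whether the operands will be integers when the code runs.

```python
import ast


def find_divisions(source):
    """Return sorted (line, column) positions of every "/" and "/=" use.

    Floor division ("//") is a different node type and is not reported.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_binary = isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div)
        is_augmented = (
            isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Div)
        )
        if is_binary or is_augmented:
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


SAMPLE = """\
half = total / 2
ratio = width / height
pages = count // per_page
total /= 3
"""

if __name__ == "__main__":
    for line, column in find_divisions(SAMPLE):
        # Each hit needs a human (or runtime data) to decide "/" vs "//".
        print("line %d, column %d: '/' found" % (line, column))
```

A translator could therefore flag these locations for review, but choosing between "//" and an unchanged "/" for `width / height` requires knowing the operand types, which only running the program reveals.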
I think the proper approach is to start a separate migration process for each of these proposed changes. Each should be paced separately (the pace may depend on how hard to swallow the change is for users as well as how hard it is to implement the new functionality) and for each, a separate PEP in the 3000 series should be started. I can even see that several PEPs will be needed in some cases (e.g. one to describe the new syntax, one to flesh out the implementation, and one to decide on the migration steps). I won't comment on Paul's examples, that's for the various PEP processes. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I agree. As a more concrete extension to my last email, I propose the following doctrine:

"""
No major documented feature should be removed or have changed semantics in Python 3000 or any other new version of Python until users have had a year (preferably MORE!) of upgrade time.

Upgrade time entails the following parts:

1. the released Python version has a new recommended way to accomplish the task in a manner that will remain available in the "breakage version" e.g. a div() function that people can use for a few years while the semantics of "/" are in transition.

2. the mechanism/syntax that will be removed is formally deprecated. The documentation would say, e.g. "You should not use '/' for now. It is changing semantics in the future."

3. the released Python version sports a runtime warning to tell users that the mechanism/syntax is going away.

     "CompatibilityError: Future versions of Python will have different semantics for the '/' operator. Please use div() instead."

The actual "right" amount of upgrade time depends on the extent of the breakage and its ease of detection.
"""

I can PEP this if people agree. I think that the user community would appreciate our effort to promise not to break code suddenly and capriciously.

-- Paul Prescod - Not encumbered by corporate consensus
Simplicity does not precede complexity, but follows it.
  - http://www.cs.yale.edu/homes/perlis-alan/quotes.html
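Part 3 of the doctrine, a runtime warning that the user can ignore, display, or escalate, can be sketched with the warnings module of present-day Python. The names here are assumptions made for illustration: CompatibilityWarning is a hypothetical category (the message above calls it "CompatibilityError"), changing() is a hypothetical decorator, and old_divide() merely stands in for a feature scheduled to change.

```python
import functools
import warnings


class CompatibilityWarning(DeprecationWarning):
    """Hypothetical category for features whose semantics will change."""


def changing(replacement):
    """Decorator: warn on every call that this function is going away."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "%s() will change in a future version; use %s instead"
                % (func.__name__, replacement),
                CompatibilityWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate


@changing("div()")
def old_divide(a, b):
    """Stand-in for the old behaviour that is being phased out."""
    return a // b


if __name__ == "__main__":
    # "Warning mode on": show every occurrence instead of hiding it.
    warnings.simplefilter("always", CompatibilityWarning)
    old_divide(5, 2)
```

Because the warning is a category of its own, a project can also run its test suite with `warnings.simplefilter("error", CompatibilityWarning)` to find every remaining use before the breakage version arrives.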

Go for it. I have little bandwidth to think about this deeply, but what you're proposing here sounds like a good approach. Certainly it will make it easier if I can point to this PEP when I get the next FUD email about "should I bother to learn Python 2.0 when Py3K is going to be all different?"... --Guido van Rossum (home page: http://www.python.org/~guido/)

Hi, Paul Prescod wrote on python-dev@python.org: [...]
Reclaiming the division operator from truncating integer division: [...]
Bruce Sherwood and David Scherer suggested a IMHO very clever solution to this particular problem in March/April this year. This thread was first posted to idle-dev and afterwards X-posted to python-dev: http://www.python.org/pipermail/idle-dev/2000-April/000138.html http://www.python.org/pipermail/idle-dev/2000-April/000140.html http://www.python.org/pipermail/python-dev/2000-April/010029.html Unfortunately I'm no native english speaker and have not enough time to make a Python-2.1 PEP from it. May be somebody else wants to go for it? I still believe, this was a very clever Python enhancement, which should be officially proposed. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

Peter Funk wrote:
Tim's last message on it doesn't sound overwhelmingly positiive to me! Also, don't forget that Python is also a command line so "version pragmas" cause additional complexity there. -- Paul Prescod Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

Paul Prescod wrote:
I think will simply be a consequence of doing a complete rewrite of the interpreter for Py3K. AFAIR, the only truely feasable solution would be doing the rewrite in a widely portable subset of C++ and then managing classes at that level. Moving to a common and very low-level strategy for classes will allows us to put even the most basic types (strings and numbers) into an inheritence tree. Differences like the ones between Unicode and 8-bit strings would then flesh out as two different subclasses of a generic string type which again is based on a generic sequence type. The same could be done for dictionaries: special ones for just string keys, case insensitive lookups, etc. could all be subclasses of a generic mapping class. Dito for numbers (and division strategies). By following this principle there won't be all that much breakage, since the old functionality will still be around, only the defaults will have changed. Add to this pluggable compilers and ceval loops, plus a nice way of configuring the lot on a per-module basis and you're set. (Ok, it's a fluffy clouds image, but you get the picture ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Marc-Andre Lemburg replied:
Good job in channeling me, Marc-Andre! I'm sure that's not exactly how it's going to be, but on the face of it, this sure sounds like a reasonable possible route. Do you want to be the author for PEP-3000? --Guido van Rossum (home page: http://www.python.org/~guido/)

"M.-A. Lemburg" wrote:
I disagree for a variety of reasons: * implementation language and Python inheritance semantics are almost completely distinct. After all, we have Python implementations in a variety of languages with minor incompatibilities. Python could have a proper type/class merging even if it were written in assembly language. So let's leave implementation language aside. * there is a hell of a lot we can do to make the type/class split less visible without a major rewrite. For instance, lists could have a __getitem__ method so that they "act like" instances. Instances could return their methods in a dir() so that they "act like" built-in objects. So there is no reason to wait for a complete rewrite to start on this path. * It may even be the case that we can get from here to complete merged type/class semantics WITHOUT a rewrite. If a mad genious had not written stackless Python I would have sworn up and down that stackless Python would require a Python rewrite. It didn't. If another slightly less mad genious had not integrated ints and longs I would never have believed it possible without another rewrite. Someone needs to do an analysis of what it takes to merge types and classes and *demonstrate* that it requires a rewrite rather than *asserting* that it requires a rewrite. In other words, let's stop sprinkling the "major rewrite" pixie dust on hard problems and instead decide where we want to go and how we want to get there!
I don't know what you mean by a "low-level strategy" for classes. It's not like we can somehow use C++ vtables without giving up Python's dynamicity. Python's class handling is about as low-level as it can get in my humble opinion.
Of course that's where we want to go but it doesn't avoid the backwards compatibility problems. We can do this today using proxying.
We can easily do this today.
Dito for numbers (and division strategies).
There's no way I'm going to let you get away with that little sleight of hand. How is inheritance holy water going to allow us to change the semantics of: 5/2 without breaking code?? The same goes for a.b = 5 versus a.B = 5 C++ does not help. Inheritance does not help. Pluggable compilers do not help. We *will* break code. We just need to decide whether to do it in a thoughtful way that gives people a chance to migrate or put off decisions until the last possible moment and then spring it on them with no between-version upgrade path.
When you change defaults you break code. Keeping the old functionality around is barely helpful. Nobody has EVER proposed a change to Python where something that was previously possible is now made impossible. So whatever our strategy, the PROBLEM is changing defaults. The solution is telling people what defaults are changing in what timeline and discouraging them from depending on the old defaults.
Sounds mythical. I'm trying to take it out of the realm of fluffy clouds and bring it into the world that people can plan their businesses and coding strategies around. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

"PP" == Paul Prescod <paul@prescod.net> writes:
PP> * implementation language and Python inheritance semantics
PP> are almost completely distinct. After all, we have Python
PP> implementations in a variety of languages with minor
PP> incompatibilities. Python could have a proper type/class
PP> merging even if it were written in assembly language. So
PP> let's leave implementation language aside.

Jython experience backs this up. It would be incredibly convenient if we could just map Java classes to Python classes, so that for example, we'd have in Java a PyException class that is exceptions.Exception with minimal or no Java wrappings. And Finn nearly did this.

The problem that we ran into with Java is that it allows only single inheritance. So you couldn't create a Python exception that multiply inherited from two or more other Python exceptions. Doesn't happen often, but it does happen, so is that an acceptable tradeoff?

C++ might be better in this particular respect, but there will be other issues, because as soon as you start transparently showing the implementation's classes into Python, you inherit their semantics and restrictions as well. Just saying that it's tricky.

-Barry

Paul Prescod wrote:
Hey, think of it as opportunity: we can reuse much of C++'s optimizations and the integration of Python and C++ applications will get *much* easier. A rewrite shouldn't scare anyone away -- much of the existing code can be reused since only the Python C APIs of the various types will have to be rewritten, not the logic behind the types. Besides, Py3K will be a project which runs in parallel to the 2.x development (at least that's what I've read on some BeOpen webpage), so there's really not much to worry about wrt breakage, etc. People will be able to test-drive Py3K while using the 2.x series.
Right. I didn't want to say that things cannot be done prior to the rewrite, only that a rewrite will give us many more options than we currently have.
See above.
With "low-level" I meant trying to build Python classes and instances on top of a very thin layer on top of C++ classes, e.g. all slots could be implemented using true C++ methods with additional logic to override them using dynamically bound Python method objects.
Huh ? I was talking about clear design... not ways to avoid b/w compatibility. Merging Unicode and strings will hurt one way or another. This is simply a consequence of using strings as binary containers where Unicode is meant for text only use.
We can do this today using proxying.
Could you explain this ?
No we can't: Python's use of pointer compares to find out which type it is dealing with prevents this.
Just tell Python to use the correct class for what the code was written for (and this could be done by plugging in a 2.0 compiler). The instances of those classes would still work together with other semantics by virtue of exposing the same interface, yet the internals would work differently, e.g. a module using case insensitive lookup would be compiled using case insensitive dictionaries as namespaces.
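The "case insensitive dictionaries as namespaces" idea can be sketched in a few lines. This is only an illustration under stated assumptions: `CaseInsensitiveDict` and `Namespace` are hypothetical names, the code is written for a current Python 3, and nothing here reflects an actual interpreter mechanism.

```python
class CaseInsensitiveDict(dict):
    """Hypothetical namespace dictionary that folds string keys to lower case."""

    @staticmethod
    def _fold(key):
        return key.lower() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        super().__setitem__(self._fold(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._fold(key))

    def __contains__(self, key):
        return super().__contains__(self._fold(key))


class Namespace:
    """Hypothetical object whose attributes live in a CaseInsensitiveDict."""

    def __init__(self):
        # Bypass our own __setattr__ so the backing store is a real attribute.
        object.__setattr__(self, "_names", CaseInsensitiveDict())

    def __setattr__(self, name, value):
        self._names[name] = value

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for user-defined names.
        try:
            return self._names[name]
        except KeyError:
            raise AttributeError(name) from None


a = Namespace()
a.b = 5
a.B = 6   # rebinds the same name instead of creating a second one
```

The last two lines are exactly the `a.b = 5` versus `a.B = 5` case raised earlier in the thread: code that relied on the two names being distinct silently changes meaning under the case-insensitive namespace.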
All true. I was just referring to the possibility of keeping the old semantics around in case some module relies on them. In this ideal world, a simple "define pythonlevel=2.0" would suffice to make the old module work with Py3k.
Hmm, wasn't Py3k meant as sandbox for new experiments ? The 2.x series is for doing business with, IMHO at least. At the current dev speed we'll have plenty of time to get Py3k rock solid. Then we can add all the backward compatibility layers we want to it. If the design is good, adding these layers won't pose much of a problem. Why spoil all the fun when we haven't even started thinking about all the possibilities we could use to make Py3k a success ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
We could make integrating C++ and Python easier through CXX or Py_Cpp. Perhaps we should ship one of those with Python 2.1. Anyhow, integrating C++ and Python is not really so hard, in my experience.
I wasn't really addressing the issue of backwards compatibility of extensions -- I was talking about Python programs. Nevertheless, I can't resist: Porting your app to the Python APIs is often the majority of the work in a particular extension. A lot of Python extensions consist only of API glue!
I don't see how that helps. If you can't write programs that work both on the old interpreter and the new one then you need to have a "switch over" day. The whole point of my doctrine is that Python 3K should run all code that the version of Python immediately before it did. The most it can do in terms of breakage is to warn about error messages. In that case it doesn't matter much whether Python 3K is available at the same time as 2.X or emerges from Guido's head fully formed.
Okay, so do you agree with the rule expressed here: http://www.python.org/pipermail/python-dev/2000-October/016785.html
I don't think that there is really interesting magic in a C++ compiler. A vtable is an array of pointers to functions. We've already implemented the equivalent in ANSI C. C++ exceptions, constructors, destructors and smart pointers are a little more interesting from a code maintenance and simplicity point of view. But even then I think that we could get to a C++ implementation through incremental rewrites.
But the whole point of my original article was backwards compatibility!!! I didn't address an implementation strategy for Py3K at all.
The question I wanted to address is how we can *minimize the pain*.
A mode switch solution is fraught with dangers. First there is the issue of the correct default for the mode switch. http://www.python.org/pipermail/python-dev/2000-April/010029.html Second, there are dangers cutting and pasting between modules. Anyhow, even if we allow a mode switch we need to be able to help people to upgrade their modules in a reasonable time. That's what the message I cited advocates.
Asking for a map of where we are going and how we will get there is "spoiling all of the fun?" I'm not sure what you are reacting to -- I didn't advise one implementation strategy or another -- I just said that we should employ a strategy that minimizes sudden and surprising code breakage. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

Paul Prescod wrote:
Ok, you can use SWIG to get most of the work done, but the interface between the C++ object world and the Python object world is one big mess -- all implementations I've seen use shadow objects or proxies to interface from one to the other... with lots of temporary objects used for the linkup. Having Python itself written in C++ we could do *much* better. But that's only a nice side-effect. The real argument here is that we can push the type logic one layer further down. Ideal would be a level of integration such as the one implemented in JPython or Jython.
True, but we're lucky, since we could provide a compatibility layer on top of the new API. BTW, I think I now know what your main concern is: the Python level compatibility. I was talking of what goes on under the hood and still think that Py3K should be used as a chance to simplify the Python backend. As simplification often means generalization, we'll open up new doors for future developments along the way.
Naa, the whole type slot interface is one big mess (sorry, Guido :-). Some slots are packaged, some are not, some are NULL, some are not, there are oodles of sometimes weird dependencies between the slots which are not really documented, etc. etc. The slot design has serious drawbacks and should be replaced by something more reliable, preferably C++ methods. That way, we'll get some more "type" safety into Python and its extensions. Note that porting old extensions won't be all that hard: a class reusing the existing slot functions as methods should suffice in many cases.
Ok, we've been talking about different things here: with "spoiling the fun" I meant putting ropes on possible changes to the C backend. You are talking about the frontend and I agree with you that there should be a clear upgrade path from the 2.x series to Py3K w/r to the Python side of things. So I guess it's time for some PEPs now... the upgrade path PEP and the fluffy clouds PEP. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
...
Having Python itself written in C++ we could do *much* better.
Agree.
I agree that the slot stuff is broken but my solution would be to junk it and use the same mechanism for looking up "type methods" and "instance methods". I can think of two ways to make that perform reasonably: one is method caching and the other is by building interface objects where methods are invoked by index -- basically vtables. But if the same mechanism is going to accelerate Python and C types alike then it can't really use C++ vtables because how do you generate a vtable at runtime for a new Python class? (you could also think of it as a COM interface object)
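A rough sketch of the second option, interface objects whose methods are invoked by index, might look like the following. Everything here is hypothetical (`Interface`, `SEQUENCE`, `Countdown`, `call`); it is written for a current Python 3 and only shows that such a table can be built at runtime for a built-in type and a Python class alike.

```python
class Interface:
    """Hypothetical interface object: assigns a fixed slot index to each method name."""

    def __init__(self, *method_names):
        self.index = {name: i for i, name in enumerate(method_names)}

    def build_table(self, cls):
        """Build a per-class method table (a 'vtable') at runtime."""
        table = [None] * len(self.index)
        for name, i in self.index.items():
            table[i] = getattr(cls, name)
        return table


SEQUENCE = Interface("__len__", "__getitem__")
LEN = SEQUENCE.index["__len__"]
GETITEM = SEQUENCE.index["__getitem__"]


class Countdown:
    """A class defined in Python that satisfies the same interface as list."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.n - i


# The same mechanism produces a table for a built-in type and a Python class.
list_table = SEQUENCE.build_table(list)
countdown_table = SEQUENCE.build_table(Countdown)


def call(table, slot, obj, *args):
    """Invoke a method by slot index instead of by name lookup."""
    return table[slot](obj, *args)
```

With these definitions, `call(list_table, LEN, [1, 2, 3])` and `call(countdown_table, LEN, Countdown(3))` go through identical dispatch code. A real implementation would also have to invalidate the table when a class is modified, which this sketch ignores.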
So I guess it's time for some PEPs now... the upgrade path PEP and the fluffy clouds PEP.
Good timing. I just finished the first draft of the upgrade path PEP. -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html

"PP" == Paul Prescod <paul@prescod.net> writes:
PP> I agree that the slot stuff is broken but my solution would be
PP> to junk it and use the same mechanism for looking up "type
PP> methods" and "instance methods". I can think of two ways to
PP> make that perform reasonably: one is method caching and the
PP> other is by building interface objects where methods are
PP> invoked by index -- basically vtables. But if the same
PP> mechanism is going to accelerate Python and C types alike then
PP> it can't really use C++ vtables because how do you generate a
PP> vtable at runtime for a new Python class? (you could also
PP> think of it as a COM interface object)

Objective-C! :)

[Paul]
... or a number of others (but SWIG falls very short when it comes to things like C++ references)... [Marc-Andre]
While I agree it's messy, those are not the objectionable qualities. Any refcounted C++ system is going to have proxies (smart pointers / refcounting stack-safe "reference" objects to the real heap-based objects). And while I think we'd be better off with a C++ implementation, I would warn that C++'s notion of inheritance is in conflict with Python's. It won't be "as above, so below" (unless we screw interpreting and go straight to native code). Assuming that the class / type dichotomy actually gets healed, that is.
Nope. If you heal the class / type split, there's really only one underlying type object. Unless you go the other way, and write "native" code (as JPython does). All of the existing C++ interfaces / helpers deal only with type objects (at least, those that I've examined, which is most of them). In fact, only ExtensionClass attempts to deal with the class / type split, and while it's a masterpiece, I think it's a great example of everything to avoid in Py3K. - Gordon

Integrating C++ and Python well is hard in a general library. CXX tries to make objects that look and feel like Python objects. But to do that we have to figure out the details of how python uses objects. You have no docs on the subject so we read the source code. Barry CXX maintainer

[MAL]
Not just to be my usual self <wink>, but I do see a from-scratch rewrite as being less likely as the years go by. There's nothing I know of in Guido's plans that can't be done incrementally instead -- and if he doesn't either, selling a total-rewrite project to an employer is probably impossible. The most successful projects I've seen and been on *did* rewrite all the code routinely, but one subsystem at a time. This happens when you're tempted to add a hack, realize it wouldn't be needed if an entire area were reworked, and mgmt is bright enough to realize that hacks compound in fatal ways over time. The "ain't broke, don't fix" philosophy is a good guide here, provided you've got a very low threshold for insisting "it's broke" <0.4 wink>. if-you-would-have-liked-to-do-the-whole-differently-then-by-all-means-*do*-the-whole-thing-differently-that-works-in-c-too-ly y'rs - tim

Tim Peters wrote:
As I mentioned in the posting, the idea was from the "fluffy clouds" area. The rewrite would only involve the core type system and perhaps the core interpreter layout (parser, compiler, etc. all wrapped in classes) -- most of the existing code would be reusable. The idea behind this is somewhat like what you do when starting out a project based on a few simple functions and then reorganizing the code into a class-based approach. There's no need to rewrite all the type internals, just the type interfaces. Python has long reached a level of complexity that is counter-productive when it comes to adding new extension types. Just think of all the problems people have with coercion, the integration of user defined and internal types, the missing links between types and classes, etc. etc. BTW, why all the talk about "employers" ? Much of Python's code base was written without any employer knowing about it ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Absolutely. Whenever possible, we should try to plan for migration in Python 2.x.
It would also help if we could produce automatic translation tools that will convert the old syntax into the new. This desire may restrict our choices however: the translation tools don't have runtime information to go by. It's easy enough to change obsolete syntax into new syntax, but it's hard to decide whether a particular "/" operator should be changed into an integer divide ("//") or left alone.
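The limitation described here can be shown with a small static scan. This is a sketch, not a proposed tool: it uses the standard `ast` module of a current Python 3, and `classify_divisions` and the sample `SOURCE` are hypothetical names made up for the illustration.

```python
import ast

SOURCE = """
def average(total, count):
    return total / count

half = 5 / 2
"""


def classify_divisions(source):
    """Return (line number, verdict) for every '/' operator found in the source."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
            both_int_literals = all(
                isinstance(operand, ast.Constant) and type(operand.value) is int
                for operand in (node.left, node.right)
            )
            verdict = (
                "integer literals"
                if both_int_literals
                else "unknown at translation time"
            )
            results.append((node.lineno, verdict))
    return sorted(results)


divisions = classify_divisions(SOURCE)
```

The scan finds both operators easily. For `5 / 2` it can see that both operands are integers, but for `total / count` the operand types exist only at runtime, so a translator cannot tell whether the author wanted "//" or "/".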
I think the proper approach is to start a separate migration process for each of these proposed changes. Each should be paced separately (the pace may depend on how hard to swallow the change is for users as well as how hard it is to implement the new functionality) and for each, a separate PEP in the 3000 series should be started. I can even see that several PEPs will be needed in some cases (e.g. one to describe the new syntax, one to flesh out the implementation, and one to decide on the migration steps). I won't comment on Paul's examples, that's for the various PEP processes. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I agree. As a more concrete extension to my last email, I propose the following doctrine:

"""
No major documented feature should be removed or have changed semantics in Python 3000 or any other new version of Python until users have had a year (preferably MORE!) of upgrade time.

Upgrade time entails the following parts:

1. the released Python version has a new recommended way to accomplish the task in a manner that will remain available in the "breakage version" e.g. a div() function that people can use for a few years while the semantics of "/" are in transition.

2. the mechanism/syntax that will be removed is formally deprecated. The documentation would say, e.g. "You should not use '/' for now. It is changing semantics in the future."

3. the released Python version sports a runtime warning to tell users that the mechanism/syntax is going away. "CompatibilityError: Future versions of Python will have different semantics for the '/' operator. Please use div() instead."

The actual "right" amount of upgrade time depends on the extent of the breakage and its ease of detection.
"""

I can PEP this if people agree. I think that the user community would appreciate our effort to promise not to break code suddenly and capriciously.

-- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html
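Parts 1 and 3 of the doctrine can be sketched with the standard `warnings` module. The assumptions are considerable: `div()` and `old_divide()` are hypothetical functions, `old_divide()` merely stands in for the legacy "/" operator, the built-in `DeprecationWarning` category is used in place of the "CompatibilityError" label quoted above, and the code targets a current Python 3.

```python
import warnings


def div(a, b):
    """Hypothetical recommended replacement: always true division."""
    return a / b


def old_divide(a, b):
    """Stand-in for the legacy '/' operator during the transition period."""
    warnings.warn(
        "Future versions of Python will have different semantics for "
        "the '/' operator. Please use div() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    if isinstance(a, int) and isinstance(b, int):
        return a // b   # old behaviour: integer operands give an integer result
    return a / b


# Existing code keeps working, but the user is told what is going to change.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = old_divide(5, 2)

# Migrated code uses the new spelling and gets the new semantics everywhere.
new = div(5, 2)
```

During the upgrade period both spellings run; only the warning distinguishes them, which is what gives users time to migrate before the default changes.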

Go for it. I have little bandwidth to think about this deeply, but what you're proposing here sounds like a good approach. Certainly it will make it easier if I can point to this PEP when I get the next FUD email about "should I bother to learn Python 2.0 when Py3K is going to be all different?"... --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (9): Barry Scott, barry@wooz.org, Gordon McMillan, Greg Wilson, Guido van Rossum, M.-A. Lemburg, Paul Prescod, pf@artcom-gmbh.de, Tim Peters