What should the focus for 2.6 be?
I've been thinking a bit about a focus for the 2.6 release.

We are now officially starting parallel development of 2.6 and 3.0. I really don't expect that we'll be able to merge easily into the 3.0 branch much longer, so effectively 3.0 will be a fork of 2.5.

I wonder if it would make sense to focus in 2.6 on making porting of 2.6 code to 3.0 easier, rather than trying to introduce new features in 2.6. We've done releases without new language features before; notably, 2.3 didn't add anything new (except making a few __future__ imports redundant) and concentrated on bugfixes, performance, and library additions.

Some projects that could be undertaken in 2.6:

- add warnings when apply() is used
- add warnings when string exceptions or non-BaseException-derived exceptions are used (this is already planned in PEP 352, which has a very specific roll-out plan)
- add warnings when has_key() is used
- add warnings when the result of dict.keys(), .values(), .items() is used for anything other than iterating over it
- add a warning if a class defines has_key() but not __contains__()
- add __contains__ to dbm and gdbm
- add warnings to modules and built-ins that will die in 3.0

Some of these warnings should be suppressed by default, but enabled by a command line option.

We should also do the work on the standard library to avoid the warnings: get rid of apply(), use 'x in d' instead of 'd.has_key(x)', etc. I've recently done some of this work in the 3.0 branch (e.g. dbm/gdbm are fresh in my memory).

Another area that could use a lot of work (and from which 3.0 could also benefit directly) is converting all unit tests to using either unittest.py or doctest.py. There are still at least 40 tests written in the old "compare the output with one we prepared earlier" style.
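As a rough illustration of the test conversion mentioned in the last paragraph, here is a minimal sketch of one check written in both target styles. The function `normalize_key` and the helper names are hypothetical and exist only for this example; the point is that the expected values live next to the test rather than in a separately prepared output file.

```python
import doctest
import unittest


def normalize_key(key):
    """Lower-case a key and strip surrounding whitespace.

    The examples below are doctests: the expected output sits next to
    the call instead of in a separate expected-output file.

    >>> normalize_key("  Spam ")
    'spam'
    >>> normalize_key("EGGS")
    'eggs'
    """
    return key.strip().lower()


class NormalizeKeyTest(unittest.TestCase):
    """The same checks written as explicit unittest assertions."""

    def test_strips_whitespace(self):
        self.assertEqual(normalize_key("  Spam "), "spam")

    def test_lowercases(self):
        self.assertEqual(normalize_key("EGGS"), "eggs")


def run_doctests():
    """Run the docstring examples and return doctest's (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        normalize_key.__doc__,
        {"normalize_key": normalize_key},  # names visible to the examples
        "normalize_key",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


def run_unittests():
    """Run the TestCase without printing and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeKeyTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Either style can be picked per test; doctest tends to suit short, example-like checks, while unittest suits tests that need setup or many assertions.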
Of course, if people are chomping at the bit to implement certain new features (and those features are generally approved of) then I don't want to stop them; but I would recommend that we focus our effort on smoothing the 2.6/3.0 transition rather than looking for new features to add to 2.6.

I am often approached by people who object to this or that feature proposal not because they dislike the proposed feature in particular, but because they believe Python is already large enough, and they worry that if we keep adding features at the current pace, it will soon become too unwieldy, and hence harder to learn or to keep in one's brain. I am very sympathetic to this concern (and you should be too!). This is one of the reasons that so much of the Python 3000 work is about ripping out old stuff and simplifying / unifying things. Dropping two common data types (long and unicode -- of course they will really be merged into their simpler counterparts int and str, but it means much less to learn/remember) is one example. Ripping out classic classes and string exceptions is another. Removing dead library modules is a third.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
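To make the proposed has_key() warning concrete, here is a minimal sketch using the standard warnings module. The class `LegacyMapping` and the helper `count_deprecation_warnings` are hypothetical names invented for this example; this is not the mechanism that was eventually implemented, only one plausible shape for it. The filter call plays the role of the command line switch, in the same way a -W option turns on warnings that are suppressed by default.

```python
import warnings


class LegacyMapping:
    """Tiny mapping that still offers the 2.x-style has_key() (hypothetical)."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # The forward-compatible spelling: `key in mapping`.
        return key in self._data

    def has_key(self, key):
        # Warn, then delegate to __contains__, so old callers keep working.
        warnings.warn(
            "has_key() is deprecated; use 'key in mapping' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return key in self


def count_deprecation_warnings(func):
    """Call func() with all warnings enabled and count DeprecationWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # comparable to enabling via -W
        func()
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
```

Code that already uses `key in mapping` triggers nothing, so the warning count gives a simple measure of how much porting work is left.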
On Aug 20, 2006, at 11:24 AM, Guido van Rossum wrote:
I wonder if it would make sense to focus in 2.6 on making porting of 2.6 code to 3.0 easier, rather than trying to introduce new features in 2.6. We've done releases without new language features before; notably, 2.3 didn't add anything new (except making a few __future__ imports redundant) and concentrated on bugfixes, performance, and library additions.
+1, and there are other benefits to this approach too.

First, the pace of change appears to slow, which addresses another source of complaints. Because instead of a slew of new features every 18 months, we really see that slew only every three years, with a stabilizing and bug fixing release in between. Another benefit is that with a de-emphasis on new features, we can spend more time improving the library and documentation.

-Barry
On 8/20/06, Barry Warsaw wrote:
+1, and there are other benefits to this approach too.
First, the pace of change appears to slow, which addresses another source of complaints. Because instead of a slew of new features every 18 months, we really see that slew only every three years, with a stabilizing and bug fixing release in between. Another benefit is that with a de-emphasis on new features, we can spend more time improving the library and documentation.
I think fixing tests and documentation would be a great thing to focus 2.6 on. Not glamorous, I know, but it is needed.

For tests, I hope to get some decorators and such written that will help classify tests. Also adding a function to denote what module is being tested would be good (to avoid the issue of a dependent import for testing failing and then everyone just thinking the test was skipped). Lastly, testing the C API using ctypes would be really good since it is not thoroughly tested.

As for the docs, they just need a thorough updating. Whether we should come up with some other format for Py3K that carries better semantic information and is easier to read is another question entirely.

-Brett
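As a sketch of what testing the C API through ctypes could look like, the snippet below calls two C API functions directly via `ctypes.pythonapi`. It assumes a current CPython 3 interpreter, where the integer functions are named PyLong_FromLong and PyLong_AsLong (the 2.x names differ); the helper `roundtrip` is a hypothetical name for this example.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
_from_long = ctypes.pythonapi.PyLong_FromLong
_from_long.argtypes = [ctypes.c_long]
_from_long.restype = ctypes.py_object  # result is handed back as a Python object

_as_long = ctypes.pythonapi.PyLong_AsLong
_as_long.argtypes = [ctypes.py_object]
_as_long.restype = ctypes.c_long


def roundtrip(value):
    """Send a C long through PyLong_FromLong, then back through PyLong_AsLong."""
    obj = _from_long(value)
    return obj, _as_long(obj)
```

A test suite built this way can exercise C API entry points from ordinary unittest cases, without compiling a dedicated extension module for each check.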
On Mon, Aug 21, 2006 at 12:24:54PM -0700, Brett Cannon wrote:
As for the docs, they just need a thorough updating.
Michael Hudson was working on a new guide to extending/embedding Python. Incorporating that should be a goal for 2.6 (the document may still need to be finished -- I'm not sure what state it's in). The HOWTOs should be incorporated into the documentation build process; this will mean you need both LaTeX and Docutils to build all the documentation. I already have a Makefile patch for this; perhaps I should just check it into the trunk now. --amk
"A.M. Kuchling"
On Mon, Aug 21, 2006 at 12:24:54PM -0700, Brett Cannon wrote:
As for the docs, they just need a thorough updating.
Michael Hudson was working on a new guide to extending/embedding Python. Incorporating that should be a goal for 2.6 (the document may still need to be finished -- I'm not sure what state it's in).
Oh my word, even I had nearly forgotten about that. I think the latest version is here: http://starship.python.net/crew/mwh/toext/ and source is here: http://codespeak.net/svn/user/mwh/toext/ It's certainly not even half-finished, and I'm not sure how much time or enthusiasm I'm likely to be able to find for it in the medium term. Cheers, mwh -- ZAPHOD: OK, so ten out of ten for style, but minus several million for good thinking, eh? -- The Hitch-Hikers Guide to the Galaxy, Episode 2
On Aug 22, 2006, at 11:26 AM, Michael Hudson wrote:
Oh my word, even I had nearly forgotten about that.
I think the latest version is here:
http://starship.python.net/crew/mwh/toext/
and source is here:
http://codespeak.net/svn/user/mwh/toext/
It's certainly not even half-finished, and I'm not sure how much time or enthusiasm I'm likely to be able to find for it in the medium term.
I'd be willing to help out with this, as well as flesh out some of the more, er, under-documented parts of the C API. I do a /ton/ of embedding/extending work for my Real Job so I think I could justify some work time devoted to improving things here.

I'll try to take a look at MWH's stuff, but should we go ahead and check it into Python's svn at some point?

-Barry
Guido van Rossum wrote:
I've been thinking a bit about a focus for the 2.6 release.
We are now officially starting parallel development of 2.6 and 3.0. I really don't expect that we'll be able to merge easily into the 3.0 branch much longer, so effectively 3.0 will be a fork of 2.5.
I wonder if it would make sense to focus in 2.6 on making porting of 2.6 code to 3.0 easier, rather than trying to introduce new features in 2.6. We've done releases without new language features before; notably, 2.3 didn't add anything new (except making a few __future__ imports redundant) and concentrated on bugfixes, performance, and library additions.
I've been thinking about the transition to unicode strings, and I want to put forward a notion that might allow the transition to be done gradually instead of all at once.

The idea would be to temporarily introduce a new name for 8-bit strings - let's call it "ascii". An "ascii" object would be exactly the same as today's 8-bit strings.

The 'str' builtin symbol would be assigned to 'ascii' by default, but you could assign it to 'unicode' if you wanted to default to wide strings:

    str = ascii    # Selects 8-bit strings by default
    str = unicode  # Selects unicode strings by default

In order to make the transition, what you would do is to temporarily undefine the 'str' symbol from the code base - in other words, remove 'str' from the builtin namespace, and then migrate all of the code -- replacing any library reference to 'str' with a reference to 'ascii' *or* updating that function to deal with unicode strings. Once you get all of the unit tests running again, you can re-introduce 'str', but now you know that since none of the libraries refer to 'str' directly, you can safely change its definition.

All of this could be done while retaining compatibility with existing 3rd party code - as long as 'str = ascii' is defined. So you turn it on to run your Python programs, and turn it off when you want to work on 3.0 migration.

The next step (which would not be backwards compatible) would be to gradually remove 'ascii' from the code base -- wherever that name occurs, it would be a signal that the function needs to be updated to use 'unicode' instead.

Finally, once the last occurrence of 'ascii' is removed, the final step is to do a search and replace of all occurrences of 'unicode' with 'str'.

I know this seems round-about, and is more work than doing it all in one shot. However, I know from past experience that the trickiest part of doing a pervasive change to a code base like this is just keeping track of what parts have been migrated and what parts have not. Many times in the past I've changed the definition of a ubiquitous type by temporarily renaming it, thus vacating the old name so that it can be defined anew, without conflict.

-- Talin
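The bookkeeping part of this proposal, knowing which code still refers to the vacated name, can be sketched with a small scanner. This is only an illustration under assumptions: it uses today's `ast` module, and `find_name_uses` and `SAMPLE` are hypothetical names that do not come from the proposal itself.

```python
import ast


def find_name_uses(source, name):
    """Return sorted line numbers where `name` appears as a bare identifier."""
    tree = ast.parse(source)
    lines = {
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id == name
    }
    return sorted(lines)


# A made-up module in mid-migration: one function still uses 'str',
# the other has already been switched to the temporary name 'ascii'.
SAMPLE = '''
def to_text(value):
    return str(value)

def is_text(value):
    return isinstance(value, ascii)
'''
```

Running the scanner for 'str' lists what has not been touched yet, and running it for 'ascii' lists what still has to be converted to unicode, which matches the two phases described in the message.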
Talin wrote:
I've been thinking about the transition to unicode strings, and I want to put forward a notion that might allow the transition to be done gradually instead of all at once.
The idea would be to temporarily introduce a new name for 8-bit strings - let's call it "ascii". An "ascii" object would be exactly the same as today's 8-bit strings.
The 'str' builtin symbol would be assigned to 'ascii' by default, but you could assign it to 'unicode' if you wanted to default to wide strings:
    str = ascii    # Selects 8-bit strings by default
    str = unicode  # Selects unicode strings by default
This doesn't change the type of string literals. Georg
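Georg's caveat can be demonstrated with a small sketch. It is written for Python 3, where the roles are reversed (literals are unicode and `bytes` is the 8-bit type), so treat it as an analogy to the 2.x situation rather than a reproduction of it; the function name is hypothetical.

```python
def literal_type_after_rebinding():
    """Rebind the name `str` locally, then report the type of a literal."""
    str = bytes  # shadows the builtin name inside this function only
    rebound_name = str
    # The type of a literal is fixed by the compiler, not by whatever
    # object the name `str` happens to refer to at run time.
    literal_type = type("abc")
    return rebound_name, literal_type
```

The name now refers to a different type, but "abc" is still created as the compiler's native string type, which is why a rebinding scheme alone cannot switch literals over.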
I'll keep this in mind -- with the caveat that Georg mentioned.
For the next 96 hours I'm going to be severely limited in bandwidth
due to the physical requirements of the sprint at Google. I'd
appreciate not receiving too much email during this period...
--Guido
Talin wrote:
I've been thinking about the transition to unicode strings, and I want to put forward a notion that might allow the transition to be done gradually instead of all at once.
The idea would be to temporarily introduce a new name for 8-bit strings - let's call it "ascii". An "ascii" object would be exactly the same as today's 8-bit strings.
There are two parts to the unicode conversion; all literals are unicode, and we don't have strings anymore, we have bytes. Without offering the bytes object, people can't really convert their code. String literals can be handled with the -U command line option (and perhaps having the interpreter do the str=unicode assignment during startup).

In any case, as I look at Py3k and the future of Python, in each release, I ask "what are the compelling features that make me want to upgrade?" In each of the 1.5-2.5 series that I've looked at, each has had some compelling feature or another that has basically required that I upgrade, or seriously consider upgrading (bugfixes for stuff that has bitten me, new syntax that I use, significant increases in speed, etc.).

As we approach Py3k, I again ask, "what are the compelling features?" Wholesale breakage of anything that uses ascii strings as text or binary data? A completely changed IO stack (requiring re-learning of everything known about Python IO)? Dictionary .keys(), .values(), and .items() being their .iter*() equivalents (making it just about impossible to optimize for Py3k dictionary behavior now)?

I understand getting rid of the cruft, really I do (you should see some cruft I've been replacing lately). But some of that cruft is useful, or really, some of that cruft has no alternative currently, which will require significant rewrites of user code when Py3k is released. When everyone has to rewrite their code, they are going to ask, "Why don't I just stick with the maintenance 2.x? It's going to be maintained for a few more years yet, and I don't need to rewrite all of my disk IO, strings in dictionary code, etc." I will be right along with them (no offense intended to those currently working towards py3k).

I can code defensively against buffer-saturating DoS attacks with my socket code, but I can't code defensively to handle some (never mind all) of the changes and incompatibilities that Py3k will bring.

Here's my suggestion: every feature, syntax, etc., that is slated for Py3k, let us release bit by bit in the 2.x series. That lets the 2.x series evolve into the 3.x series in a somewhat more natural way than the currently proposed *everything breaks*. If it takes 1, 2, 3, or 10 more releases in the 2.x series to get to all of the 3.x features, great. At least people will have a chance to convert, or at least write correct code for the future.

Say 2.6 gets bytes and special factories (or a special encoding argument) for file/socket to return bytes instead of strings, and only accept bytes objects to .write() methods (unless an encoding on the file, etc., was previously given). Given these bytes objects, it may even make sense to offer the .readinto() method that Alex B has been asking for (which would make 3 built-in objects that could reasonably support readinto: bytes, array, mmap). If the IO library is available for 2.6, toss that in there, or offer it in PyPI as an evolving library.

I would suggest pushing off the dict changes until 2.7 or later, as there are 340+ examples of dict.keys() in the Python 2.5b2 standard library, at least half of which are going to need to be changed to list(dict.keys()) or otherwise. The breakage in user code will likely be at least as substantial.

Those are just examples that come to mind now, but I'm sure there are other changes with similar issues.

- Josiah
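Two of the idioms named in this message can be sketched briefly. The code below assumes Python 3 semantics (dict.keys() returning a live view, and binary streams such as io.BytesIO offering readinto()); the helper names are hypothetical and chosen only for this example.

```python
import io


def snapshot_keys(mapping):
    """Return an independent list of keys, safe to keep while mutating."""
    # list(...) copies the keys, so later changes to the dict do not show up.
    return list(mapping.keys())


def read_into_buffer(stream, size):
    """Fill a preallocated bytearray via readinto(); return the bytes read."""
    buf = bytearray(size)
    count = stream.readinto(buf)  # number of bytes actually written into buf
    return bytes(buf[:count])
```

Wrapping the call in list() is the mechanical change Josiah refers to for code that stores or indexes the keys; code that only iterates can use the result of keys() directly.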
On Mon, 21 Aug 2006 14:21:30 -0700, Josiah Carlson wrote:
Talin wrote:
[snip]
I've been thinking about the transition to unicode strings, and I want to put forward a notion that might allow the transition to be done gradually instead of all at once.
The idea would be to temporarily introduce a new name for 8-bit strings - let's call it "ascii". An "ascii" object would be exactly the same as today's 8-bit strings.
There are two parts to the unicode conversion; all literals are unicode, and we don't have strings anymore, we have bytes. Without offering the bytes object, then people can't really convert their code. String literals can be handled with the -U command line option (and perhaps having the interpreter do the str=unicode assignment during startup).
A third step would ease this transition significantly: a unicode_literals __future__ import.
Here's my suggestion: every feature, syntax, etc., that is slated for Py3k, let us release bit by bit in the 2.x series. That lets the 2.x series evolve into the 3.x series in a somewhat more natural way than the currently proposed *everything breaks*. If it takes 1, 2, 3, or 10 more releases in the 2.x series to get to all of the 3.x features, great. At least people will have a chance to convert, or at least write correct code for the future.
This really seems like the right idea. "Shoot the moon" upgrades are almost always worse than incremental upgrades. The incremental path is better for everyone involved. For developers of Python, it gets more people using and providing feedback on the new features being developed. For developers with Python, it keeps the scope of a particular upgrade more manageable, letting them focus on a much smaller set of changes to be made to their application.

Jean-Paul
participants (9): A.M. Kuchling, Barry Warsaw, Brett Cannon, Georg Brandl, Guido van Rossum, Jean-Paul Calderone, Josiah Carlson, Michael Hudson, Talin