[Python-Dev] What should the focus for 2.6 be?
Talin
talin at acm.org
Mon Aug 21 08:20:34 CEST 2006
Guido van Rossum wrote:
> I've been thinking a bit about a focus for the 2.6 release.
>
> We are now officially starting parallel development of 2.6 and 3.0. I
> really don't expect that we'll be able to merge the easily into the
> 3.0 branch much longer, so effectively 3.0 will be a fork of 2.5.
>
> I wonder if it would make sense to focus in 2.6 on making porting of
> 2.6 code to 3.0 easier, rather than trying to introduce new features
> in 2.6. We've done releases without new language features before;
> notably 2.3 didn't add anything new (except making a few __future__
> imports redundant) and concentrated on bugfixes, performance, and
> library additions.
I've been thinking about the transition to unicode strings, and I want
to put forward a notion that might allow the transition to be done
gradually instead of all at once.
The idea would be to temporarily introduce a new name for 8-bit strings
- let's call it "ascii". An "ascii" object would be exactly the same as
today's 8-bit strings.
The 'str' builtin symbol would be assigned to 'ascii' by default, but
you could assign it to 'unicode' if you wanted to default to wide strings:
    str = ascii    # Selects 8-bit strings by default
    str = unicode  # Selects unicode strings by default
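As a rough illustration of the aliasing, here is a minimal sketch for a
modern interpreter, where 'unicode' no longer exists as a builtin. The
names 'ascii_str', 'unicode_str', 'default_string' and 'make_greeting' are
hypothetical stand-ins invented for this example; they are not part of
any Python release.

```python
# Sketch only: 'ascii_str' and 'unicode_str' are hypothetical stand-ins for
# the proposed 'ascii' and 'unicode' names. On a modern interpreter they map
# to bytes (8-bit) and str (unicode) respectively.
ascii_str = bytes
unicode_str = str


def make_greeting(string_type):
    """Build a greeting using whichever type the default alias points to."""
    text = "hello"
    if string_type is bytes:
        # 8-bit strings need an explicit encoding step.
        return text.encode("ascii")
    return string_type(text)


default_string = ascii_str      # selects 8-bit strings by default
narrow = make_greeting(default_string)

default_string = unicode_str    # selects unicode strings by default
wide = make_greeting(default_string)
```

Code that only ever goes through the alias picks up whichever definition
is currently selected, which is the property the proposal relies on.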
In order to make the transition, what you would do is to temporarily
undefine the 'str' symbol from the code base - in other words, remove
'str' from the builtin namespace, and then migrate all of the code --
replacing any library reference to 'str' with a reference to 'ascii'
*or* updating that function to deal with unicode strings. Once you get
all of the unit tests running again, you can re-introduce 'str', but now
you know that since none of the libraries refer to 'str' directly, you
can safely change its definition.
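One way to keep track of what still refers to 'str' during this step is
to scan the source statically instead of waiting for NameErrors at run
time. This is a swapped-in technique, not something the proposal above
specifies; the helper name 'find_name_references' and the sample code are
made up for illustration.

```python
import ast


def find_name_references(source, name="str"):
    """Return sorted (line, column) pairs where `name` is a bare identifier."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # ast.Name covers plain identifier uses; string literals and
        # attribute names such as obj.str are not reported.
        if isinstance(node, ast.Name) and node.id == name:
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


# Hypothetical library code that has not been migrated yet.
sample = (
    "def describe(value):\n"
    "    if isinstance(value, str):\n"
    "        return 'text: ' + value\n"
    "    return str(value)\n"
)
references = find_name_references(sample)
lines_to_migrate = [line for line, _ in references]
```

Each reported line is a place where 'str' has to become 'ascii' or the
function has to be updated to handle unicode.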
All of this could be done while retaining compatibility with existing
3rd party code - as long as 'str = ascii' is defined. So you turn it on
to run your Python programs, and turn it off when you want to work on
3.0 migration.
The next step (which would not be backwards compatible) would be to
gradually remove 'ascii' from the code base -- wherever that name
occurs, it would be a signal that the function needs to be updated to
use 'unicode' instead.
Finally, once the last occurrence of 'ascii' is removed, the final step
is to do a search and replace of all occurrences of 'unicode' with 'str'.
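A plain textual search and replace would also rewrite the word inside
string literals and comments. A token-based rename avoids that. The
following is only a sketch under that assumption; 'rename_identifier' is
a hypothetical helper, and the sample input is invented.

```python
import io
import tokenize


def rename_identifier(source, old="unicode", new="str"):
    """Rename identifier tokens equal to `old`; strings and comments are kept."""
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    targets = [
        tok for tok in tokens
        if tok.type == tokenize.NAME and tok.string == old
    ]
    # Edit from the last match to the first so that column offsets of
    # earlier matches on the same line remain valid.
    for tok in reversed(targets):
        row, col = tok.start
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


sample = (
    "label = unicode(value)  # unicode text\n"
    "note = 'unicode stays in literals'\n"
)
migrated = rename_identifier(sample)
```

Only the call on the first line is renamed; the comment and the string
literal are left alone.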
I know this seems roundabout, and is more work than doing it all in one
shot. However, I know from past experience that the trickiest part of
doing a pervasive change to a code base like this is just keeping track
of what parts have been migrated and what parts have not. Many times in
the past I've changed the definition of a ubiquitous type by temporarily
renaming it, thus vacating the old name so that it can be defined anew,
without conflict.
-- Talin