[Python-Dev] What should the focus for 2.6 be?

Mon Aug 21 08:20:34 CEST 2006

Guido van Rossum wrote:
> I've been thinking a bit about a focus for the 2.6 release.
> 
> We are now officially starting parallel development of 2.6 and 3.0. I
> really don't expect that we'll be able to merge them easily into the
> 3.0 branch much longer, so effectively 3.0 will be a fork of 2.5.
> 
> I wonder if it would make sense to focus in 2.6 on making porting of
> 2.6 code to 3.0 easier, rather than trying to introduce new features
> in 2.6. We've done releases without new language features before;
> notably, 2.3 didn't add anything new (except making a few __future__
> imports redundant) and concentrated on bugfixes, performance, and
> library additions.

I've been thinking about the transition to unicode strings, and I want 
to put forward a notion that might allow the transition to be done 
gradually instead of all at once.

The idea would be to temporarily introduce a new name for 8-bit strings 
- let's call it "ascii". An "ascii" object would be exactly the same as 
today's 8-bit strings.

The 'str' builtin symbol would be assigned to 'ascii' by default, but 
you could assign it to 'unicode' if you wanted to default to wide strings:

    str = ascii   # Selects 8-bit strings by default
    str = unicode # Selects unicode strings by default
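
The proposal depends on 'str' being an ordinary, rebindable name. A minimal sketch of that selection step, written for today's Python 3 and assuming `bytes` stands in for the proposed 8-bit "ascii" type and `str` for "unicode". The `STRING_TYPES` mapping and `select_default` helper are hypothetical, and a plain dict stands in for the builtin namespace so the real builtins are never modified:

```python
# Hypothetical mapping: in Python 3 terms, bytes stands in for the proposed
# 8-bit "ascii" type and str stands in for "unicode".
STRING_TYPES = {"ascii": bytes, "unicode": str}


def select_default(namespace, kind):
    """Bind namespace['str'] to the string type named by `kind`.

    `namespace` is a plain dict standing in for the builtin namespace,
    so the interpreter's real builtins are never modified.
    """
    namespace["str"] = STRING_TYPES[kind]
    return namespace["str"]


ns = {}
select_default(ns, "ascii")     # like: str = ascii
print(ns["str"] is bytes)       # True
select_default(ns, "unicode")   # like: str = unicode
print(ns["str"] is str)         # True
```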

To make the transition, you would temporarily undefine the 'str' symbol 
throughout the code base (in other words, remove 'str' from the builtin 
namespace) and then migrate all of the code, either replacing each 
library reference to 'str' with a reference to 'ascii' *or* updating the 
function in question to handle unicode strings. Once all of the unit 
tests pass again, you can reintroduce 'str'; because none of the 
libraries refer to 'str' directly any more, you know you can safely 
change its definition.
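
The "vacate the name, then see what breaks" step can be sketched in Python 3 by executing code against a copy of the builtin namespace with 'str' removed. This is an illustration under stated assumptions: `find_unmigrated` and `builtins_without` are hypothetical helpers, modules are supplied as source strings, and only code paths that actually run are checked, which is why the approach relies on a full unit test suite rather than a single import:

```python
import builtins


def builtins_without(name):
    """Return a copy of the builtin namespace with `name` vacated."""
    env = dict(vars(builtins))
    env.pop(name, None)
    return env


def find_unmigrated(sources, name="str"):
    """Execute each module source with `name` removed from its builtins.

    `sources` maps a hypothetical module name to its source text. Returns
    the modules that still reference `name` on a code path that ran.
    """
    unmigrated = []
    for modname, src in sources.items():
        env = {"__builtins__": builtins_without(name), "__name__": modname}
        try:
            exec(compile(src, modname, "exec"), env)
        except NameError as exc:
            # CPython's message contains "name 'str' is not defined".
            if f"name '{name}' is not defined" in str(exc):
                unmigrated.append(modname)
            else:
                raise  # some unrelated undefined name
    return unmigrated


sources = {
    "migrated": "def greet(who):\n    return 'hi ' + who\nmsg = greet('x')\n",
    "legacy": "label = str(42)\n",
}
print(find_unmigrated(sources))  # ['legacy']
```

Only the interpreter's own namespace is left intact here, so the harness itself can keep using `str` while the code under test cannot.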

All of this could be done while retaining compatibility with existing 
3rd party code, as long as 'str = ascii' is defined. So you would enable 
that definition to run your Python programs, and disable it when you 
want to work on the 3.0 migration.

The next step (which would not be backwards compatible) would be to 
gradually remove 'ascii' from the code base -- wherever that name 
occurs, it would be a signal that the function needs to be updated to 
use 'unicode' instead.

Finally, once the last occurrence of 'ascii' has been removed, you would 
search and replace all occurrences of 'unicode' with 'str'.
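
A plain textual search and replace would also rewrite 'unicode' inside string literals and comments. A token-based sketch using Python 3's standard `tokenize` module avoids that; it also refuses to run while any 'ascii' reference remains, mirroring the ordering above. The helpers `name_positions` and `rename_final` are hypothetical, and the sketch does not handle local variables, parameters, or imports that happen to be named `unicode`:

```python
import io
import tokenize


def name_positions(source, name):
    """Yield (row, col) for each NAME token equal to `name`.

    Strings and comments are separate token types, so they are never
    matched; attribute accesses such as obj.unicode are skipped too.
    """
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        after_dot = prev is not None and prev.type == tokenize.OP and prev.string == "."
        if tok.type == tokenize.NAME and tok.string == name and not after_dot:
            yield tok.start
        prev = tok


def rename_final(source, old="unicode", new="str", blocker="ascii"):
    """Rename `old` to `new`, refusing while any `blocker` reference remains."""
    if any(True for _ in name_positions(source, blocker)):
        raise ValueError(f"{blocker!r} is still referenced; migration incomplete")
    # Split exactly as tokenize did so the (row, col) positions line up.
    lines = io.StringIO(source).readlines()
    # Edit right to left so earlier columns on the same line stay valid.
    for row, col in sorted(name_positions(source, old), reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


print(rename_final("x = unicode(y)  # unicode\ns = 'unicode'\n"), end="")
# x = str(y)  # unicode
# s = 'unicode'
```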

I know this seems roundabout, and is more work than doing it all in one 
shot. However, I know from past experience that the trickiest part of 
doing a pervasive change to a code base like this is just keeping track 
of what parts have been migrated and what parts have not. Many times in 
the past I've changed the definition of a ubiquitous type by temporarily 
renaming it, thus vacating the old name so that it can be defined anew, 
without conflict.

-- Talin