[Python-Dev] What should the focus for 2.6 be?

Mon Aug 21 23:21:30 CEST 2006

Talin <talin at acm.org> wrote:
[snip]
> I've been thinking about the transition to unicode strings, and I want 
> to put forward a notion that might allow the transition to be done 
> gradually instead of all at once.
> 
> The idea would be to temporarily introduce a new name for 8-bit strings 
> - let's call it "ascii". An "ascii" object would be exactly the same as 
> today's 8-bit strings.

There are two parts to the unicode conversion: all literals become unicode,
and we no longer have 8-bit strings, we have bytes.  Without a bytes
object, people can't really convert their code.  String literals can be
handled with the -U command line option (and perhaps by having the
interpreter do the str=unicode assignment during startup).

In any case, as I look at Py3k and the future of Python, I ask of each
release, "what are the compelling features that make me want to upgrade?"
Each release in the 1.5-2.5 series that I've looked at has had some
compelling feature or another that basically required me to upgrade, or
at least to seriously consider upgrading (fixes for bugs that have bitten
me, new syntax that I use, significant speed increases, etc.).

As we approach Py3k, I again ask, "what are the compelling features?"
Wholesale breakage of anything that uses ascii strings as text or binary
data?  A completely changed IO stack (requiring us to relearn everything
we know about Python IO)?  Dictionary .keys(), .values(), and .items()
becoming their .iter*() equivalents (making it just about impossible to
optimize for Py3k dictionary behavior now)?

I understand getting rid of the cruft, really I do (you should see some
of the cruft I've been replacing lately).  But some of that cruft is
useful, or rather, some of it currently has no alternative, so removing
it will require significant rewrites of user code when Py3k is released.
When everyone has to rewrite their code, they are going to ask, "Why
don't I just stick with the maintenance 2.x?  It's going to be maintained
for a few more years yet, and I don't need to rewrite all of my disk IO,
string handling, dictionary code, etc."  I will be right along with them
(no offense intended to those currently working towards Py3k).

I can code defensively against buffer-saturating DoS attacks in my
socket code, but I can't code defensively against some (never mind all)
of the changes and incompatibilities that Py3k will bring.

Here's my suggestion: let's release every feature, syntax change, etc.
that is slated for Py3k bit by bit in the 2.x series.  That lets the 2.x
series evolve into the 3.x series in a somewhat more natural way than
the currently proposed *everything breaks*.  If it takes 1, 2, 3, or 10
more releases in the 2.x series to get to all of the 3.x features, great.
At least people will have a chance to convert, or at least to write code
that will be correct in the future.

Say 2.6 gets bytes, plus special factories (or a special encoding
argument) so that files and sockets return bytes instead of strings and
their .write() methods accept only bytes objects (unless an encoding was
previously given for the file, etc.).  Given these bytes objects, it may
even make sense to offer the .readinto() method that Alex B has been
asking for (which would make three built-in objects that could reasonably
support readinto: bytes, array, and mmap).
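For illustration only, here is a minimal sketch of the file behavior described above, written against the io module of today's Python 3 (which postdates this message and is used only because it already behaves this way). A binary stream accepts only bytes in .write(), a text stream with an explicit encoding accepts str and encodes it, and .readinto() fills a preallocated buffer in place. The helper names write_binary, write_text, and read_chunks_into are hypothetical.

```python
import io


def write_binary(payload):
    """Hypothetical helper: write to an in-memory binary stream.

    Binary streams accept only bytes-like objects; a str raises TypeError.
    """
    stream = io.BytesIO()
    stream.write(payload)
    return stream.getvalue()


def write_text(text, encoding="utf-8"):
    """Hypothetical helper: a text stream with an explicit encoding
    accepts str and encodes it to bytes on the way to the raw stream."""
    raw = io.BytesIO()
    # newline="" disables newline translation so the output is deterministic
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()  # push the encoded bytes down into raw
    return raw.getvalue()


def read_chunks_into(data, chunk_size=4):
    """Hypothetical helper: read a binary stream through one reusable buffer."""
    stream = io.BytesIO(data)
    buf = bytearray(chunk_size)  # allocated once, reused for every read
    chunks = []
    while True:
        n = stream.readinto(buf)  # number of bytes actually read; 0 at EOF
        if n == 0:
            break
        chunks.append(bytes(buf[:n]))  # copy out only the filled part
    return chunks
```

The read buffer itself is allocated once; any writable buffer-protocol object, such as an array.array or a writable mmap, can be passed to readinto() in its place.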

If the IO library is available for 2.6, toss it in there, or offer it on
PyPI as an evolving library.

I would suggest pushing off the dict changes until 2.7 or later, as
there are 340+ uses of dict.keys() in the Python 2.5b2 standard
library, at least half of which will need to be changed to
list(dict.keys()) or otherwise rewritten.  The breakage in user code will
likely be at least as substantial.
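To make the list(dict.keys()) point concrete, here is a minimal sketch run under Python 3 semantics, where keys() returns a live view rather than a new list. A loop that deletes entries while iterating over d.keys(), which was safe in 2.x, now raises RuntimeError and needs an explicit snapshot. The function names are hypothetical.

```python
def drop_empty_py2_style(d):
    """Pattern that was safe in 2.x, where keys() returned a new list.

    Under Python 3, keys() is a live view, so deleting an entry while
    iterating raises RuntimeError ("dictionary changed size during iteration").
    """
    for key in d.keys():
        if not d[key]:
            del d[key]
    return d


def drop_empty_portable(d):
    """Iterate over an explicit list() snapshot; works in both 2.x and 3.x."""
    for key in list(d.keys()):
        if not d[key]:
            del d[key]
    return d
```

In 2.x the list() wrapper only adds a redundant copy, since keys() already returns a list there, which is why this change is mechanical but touches so many call sites.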

Those are just the examples that come to mind now, but I'm sure there
are other changes with similar issues.

 - Josiah