[Python-Dev] PEP 414 - Unicode Literals for Python 3

Vinay Sajip vinay_sajip at yahoo.co.uk
Tue Feb 28 17:08:06 CET 2012


Ezio Melotti <ezio.melotti <at> gmail.com> writes:

 
> For every CPython bug that I fix I first apply the patch on 2.7, then on 
> 3.2 and then on 3.3.
> Most of the time I don't even need to change anything while applying the 
> patch to 3.2, sometimes I have to do some trivial fixes.  This is also 
> true for another personal 12kloc project* where I'm using the 
> two-branches approach.

I hear what you say about the personal project, but IMO CPython is atypical (as
far as this discussion is concerned), not least because it's not a pure-Python
project.

> For me, the costs of having two branches are:
>   1) a one-time conversion when the Python3-compatible branch is created 
> (can be done easily with 2to3);

Yes, but how easy it is depends on the project. For example, 2to3 wraps
values() method calls with list(), which is a reasonable thing to do for dicts;
but when it's presented with Django's querysets, which have a values() method
that should not be wrapped, you have to go through and sort things out. I'm not
knocking 2to3, which I think is great. It's just that things go well sometimes,
and less well at other times.
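To make the values() point concrete, here is a minimal sketch. FakeQuerySet is a hypothetical stand-in, not Django's actual QuerySet. It mimics only the relevant property: values() returns another chainable object rather than a list. The last call shows what a mechanical list(...) wrap of values() does to such code.

```python
class FakeQuerySet(object):
    """Hypothetical stand-in for a Django QuerySet; not Django's actual API."""

    def __init__(self, rows):
        self._rows = [dict(r) for r in rows]

    def values(self):
        # Like a queryset, returns another chainable object rather than a list.
        return FakeQuerySet(self._rows)

    def filter(self, **criteria):
        # Keep rows whose fields match every keyword criterion.
        return FakeQuerySet(
            r for r in self._rows
            if all(r.get(k) == v for k, v in criteria.items())
        )

    def __iter__(self):
        return iter(self._rows)


qs = FakeQuerySet([{"name": "a", "n": 1}, {"name": "b", "n": 2}])

# For a real dict, list(d.values()) simply restores the 2.x list result.
dict_vals = list({"x": 1}.values())

# Original code: chaining works because values() returns a queryset-like object.
matches = list(qs.values().filter(n=2))

# After a mechanical list() wrap, the chain breaks:
try:
    list(qs.values()).filter(n=2)
    wrapped_error = None
except AttributeError as exc:
    wrapped_error = exc  # 'list' object has no attribute 'filter'
```

For the dict the wrap is harmless. For the queryset-like object it turns a working chain into an AttributeError, which is the kind of thing you then have to find and undo by hand.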

> With the shared code base approach, the costs are:
>   1) a one-time conversion to "fix" the code base and make it run on 
> both 2.x and 3.x;
>   2) keep using and having to deal with hacks in order to keep it running.

Which hacks do you mean, if you're only interested in 2.6+?
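As an illustration of how little is needed once 2.5 and earlier are dropped, here is a sketch of code that should run unchanged on 2.6+ and 3.x. The functions parse_ratio and greet and the text_type alias are hypothetical. The __future__ imports, "except ... as" syntax and b"" literals are all available from 2.6 onward. u"" literals, the subject of PEP 414, are not valid on 3.2, so the sketch relies on unicode_literals instead.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

import sys

# One small shim that single-codebase projects commonly keep.
if sys.version_info[0] >= 3:
    text_type = str
else:  # only taken on Python 2
    text_type = unicode  # noqa: F821 - the name exists only on 2.x


def parse_ratio(raw):
    """Parse bytes like b"3/4" into a float, or return None if malformed."""
    try:
        num, den = raw.split(b"/")
        return int(num) / int(den)  # true division on 2.x via __future__
    except ValueError as exc:  # "except ... as" is valid from 2.6 on
        print("bad input:", exc, file=sys.stderr)
        return None


def greet(name):
    # "Hello, " is a text (unicode) literal on both 2.x and 3.x.
    return "Hello, " + text_type(name)
```

Apart from the text_type alias, nothing here is version-specific. That is the sense in which a 2.6+ single codebase need not be full of hacks.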
 
> With the first approach, you also have two clean and separate code 
> bases, with no hacks; when you stop using Python 2, you end up with a 
> clean Python 3 branch.
> The one-time conversion also seems easier in the first case.
> 
> (Note: there are also other costs -- e.g. releasing -- that I haven't 
> considered because they don't affect me personally, but I'm not sure 
> they are big enough to make the two-branches approach worse.)

I don't believe there's a one-size-fits-all solution. The two-branch approach is
appealing, and I have no quarrel with it: but I contend that big projects like
Django would be reluctant to switch to 3.x, or would take much longer to do so,
if they had to maintain separate branches. Given the size of their user
community, they have to follow strict release procedures, which smaller projects
(even those running only on 2.x) can afford to be more relaxed about.

You forgot to mention the part that is most time-consuming day to day: making
changes and testing. For the two-branch approach, it's

1. Change on 2.x
2. Test on 2.x
3. Commit on 2.x
4. Merge to 3.x
5. Possibly change on 3.x
6. Test on 3.x
7. Commit on 3.x

where each "test" step, if failures occur, might take you back to a previous
"change" step.

For the single codebase, that's

1. Change
2. Test on 2.x
3. Test on 3.x
4. Commit

This, to me, is the single big advantage of the single codebase approach, and
the productivity improvements outweigh code purity issues which are, in the
grand scheme of things, not all that large.

Another advantage is DRY: you don't have to worry about forgetting to merge some
changes from 2.x to 3.x. Haven't we all been there one time or another? I know I
have, though I try not to make a habit of it ;-)

> After the initial conversion of the code base, the fixes are mostly 
> trivial, so people don't need to write two patches (most of the patches 
> we get for CPython are either against 2.7 or 3.2, and sometimes they 
> even apply clearly to both).

Fixes may be trivial, but new features might not always be so.
 
Regards,

Vinay Sajip


