
Author: brett.cannon
Date: Sun Jan 25 00:06:31 2009
New Revision: 68914

Log:
Thoroughly clean up what impressions I have written so far.

Modified:
   peps/trunk/pep-0374.txt

Modified: peps/trunk/pep-0374.txt
==============================================================================
--- peps/trunk/pep-0374.txt	(original)
+++ peps/trunk/pep-0374.txt	Sun Jan 25 00:06:31 2009
@@ -1344,120 +1344,136 @@
 is likely to shock a veteran CVS user.


-Impressions
-===========
+Tests/Impressions
+=================
+

 As I (Brett Cannon) am left with the task of of making the final
 decision of which/any DVCS to go with and not my co-authors, I felt
-it only fair to write down my impressions as I evaluate the various
-tools so as to be as transparent as possible.
+it only fair to write down what tests I ran and my impressions as I
+evaluate the various tools so as to be as transparent as possible.
+
+
+Barrier to Entry
+----------------

-To begin, I measured the checking out of code as if I was a non-core
-developer. This is important as this is the first impression
-developers have when they decide they wish to contribute a patch to
-Python. Timings were done using the ``time`` command in zsh and
+The amount of time and effort it takes to get a checkout of Python's
+repository is critical. If the difficulty or time is too great then a
+person wishing to contribute to Python may very well give up. That
+cannot be allowed to happen.
+
+I measured the checking out of code as if I was a non-core
+developer. Timings were done using the ``time`` command in zsh and
 space was calculated with ``du -c -h``.

 ======= ================ ==============
 DVCS    Time             Space
 ------- ---------------- --------------
-svn     1:04             139 M
-bzr     2:29:24 or 8:46  275 M or 596 M
-hg      2:30             171 M
-git     2:54             134 M
+svn     1:04             139 M
+bzr 1   2:29:24          275 M
+bzr 2   8:46             596 M
+hg      2:30             171 M
+git     2:54             134 M
 ======= ================ ==============

-The svn measurements are not exactly a 1:1 comparison to the DVCSs.
-For one, svn does not download the entire revision history, and thus
-(should) have the least amount to download. And two, because various
-calculation steps are left up to the server the entire process of
-checking out code (should) be faster.
-
-But the svn measurements should be considered as what developers are
-used to. Thus they act as a reference point for what people tend to
-expect in terms of performance.
-
-Looking at bzr, I have listed two numbers. The first values are for
-running ``bzr branch`` as outlined in the `One-Off Checkout`_
-scenario. When the
-timings came back in hours (I used Launchpad as code.python.org is
-not running the newest version of bzr and I wanted to use its latest
-networking protocol), I decided to try using the steps outlined when
-the experimental bzr branches were first created. That second
-approach is what the second set of values for bzr represent.
-
-While both the hg and git numbers are perfectly acceptable, the bzr
-numbers not necessarily. The raw ``bzr branch`` approach is entirely
-not acceptable as no one wants to wait over two hours to write a
-potentially one line change to some code for the benefit of Python.
-Assuming 8:46 is a reasonable amount of time (I believe it in
-general is, but it is teetering on not), the 596 M space requirement
-could be an issue for some. While we typically view disk space as
-cheap, for some people it might be an issue (e.g. the person who did
-the schedule for PyCon 2008 did it over a connection so badly that
-Google Spreadsheets didn't work for him and he had to submit the
-schedule in another form than the one original used). Once again I
-think the space usage is acceptable, but it is close to being too
-much.
-
-To see if bzr's performance would be acceptable once at least the
-branch was downloaded, I decided to see how long it would take to
-get the change log for a file. I chose the README file as it sees
-regular changes for every release and has a revision history going
-back to 1993 and thus would have a fair number of revisions.
-It should be mentioned that while git had the nicest output thanks to
-its color terminal output, it also took a while to find the
-``--no-pager`` flag in order to get just a stream of text instead of
-having the output sent to the pager.
-
-Overall the numbers were all acceptable:
-
-* bzr: 4.5 seconds
-* hg: 1.1 seconds
-* git: 1.5 seconds
-
-While having bzr be over 3x slower than its nearest neighbor, it
-must be kept in mind that the total performance time is still
-acceptable, regardless of the multiplier.
-
-Because a DVCS keeps its revision history on disk, it also means
-that typically they can be zipped up for direct downloading. At
-least in bzr's case that would solve the performance issue for
-initial checkout if the zip file could be generate constantly. But
-that didn't address the cost of pulling in new revisions when a
-checkout has gone stale. To measure this I decided I would check out
-the repositories back about 700 revisions which represented the
-amount of change made since the beginning of the month and time how
-long they took to update.
-
-For this to happen I first had to remember the URLs for the
-repositories. Instead of simply looking in this PEP, though, I
-decided to try to figure it out from the command-line help for each
-tool or simply guessing. Bzr worked out great with ``bzr info``. Git
-took a little poking around, but I figured out ``git remote show
-origin`` told me what I needed. For hg, though, I couldn't figure it
-out short of running ``hg pull`` and denoting the status information
-during the pull (turns out ``hg paths`` is what I was looking for).
-
-With the repository locations known I then had to perform a checkout
-to a certain revision. Turns out that git will not clone a
-repository to only a specific revision, although from personal
-experience git's pull facility is very fast. Bzr was able to perform
-its update in just over 39 seconds. Hg did its update in just over
-17 seconds. Much like the log test, while the multiplier of slowness
-seems high, in real life terms al DVCSs performed within reason.
-
-In my mind this means that bzr is only an acceptable candidate as
-long as an fairly up-to-date archive of Python's key branches are
-made available for people to download to avoid bzr's very so remote
-branching.
+.. note::
+    The *bzr 1* entry is for
+    following the instructions in the `One-Off Checkout`_ scenario
+    instructions pulling from Launchpad_ in mid-January.
+    The *bzr 2* entry is based on following the instructions
+    for the `experimental Bazaar branches
+    <http://www.python.org/dev/bazaar/>`_ and pulling from
+    http://code.python.org/python/trunk/.
+
+When comparing these numbers to svn, it is important to realize that
+it is not a 1:1 comparison. Svn does not pull down the entire revision
+history like all of the DVCSs do. That means svn can perform an
+initial checkout much faster than the DVCS purely based on the fact
+that it has less information to worry about.
+
+
+Performance of basic information functionality
+----------------------------------------------
+
+To see how the tools did for performing a command that required
+querying the history, the log for the ``README`` file was timed.
+
+==== =====
+DVCS Time
+---- -----
+bzr  4.5 s
+hg   1.1 s
+git  1.5 s
+==== =====
+
+One thing of note during this test was that git took longer than the
+other three tools to figure out how to get the log without it using a
+pager. While the pager use is a nice touch in general, not having it
+automatically turn on took some time (turns out the main ``git``
+command has a ``--no-pager`` flag to disable use of the pager).
+
+
+Figuring out what command to use from built-in help
+----------------------------------------------------
+
+I ended up trying to find out what the command was to see what URL the
+repository was cloned from. To do this I used nothing more than the
+help provided by the tool itself or its man pages.
+
+Bzr was the easiest: ``bzr info``. Running ``bzr help`` didn't show
+what I wanted, but mentioned ``bzr help commands``. That list had the
+command with a description that made sense.
+
+Git was the second easiest. The command ``git help`` didn't show much
+and did not have a way of listing all commands. That is when I viewed
+the man page. Reading through the various commands I discovered ``git
+remote``. The command itself spit out nothing more than ``origin``.
+Trying ``git remote origin`` said it was an error and printed out the
+command usage. That is when I noticed ``git remote show``. Running
+``git remote show origin`` gave me the information I wanted.
+
+For hg, I never found the information I wanted on my own. It turns out
+I wanted ``hg paths``, but that was not obvious from the description
+of "show definition of symbolic path names" as printed by ``hg help``.
+
+
+Updating a checkout
+---------------------
+
+To see how long it takes to update an outdated repository I timed both
+updating a repository 700 commits behind and 50 commits behind (three
+weeks stale and 1 week stale, respectively).
+
+==== =========== ==========
+DVCS 700 commits 50 commits
+---- ----------- ----------
+bzr  39 s        7 s
+hg   17 s        3 s
+git  N/A         4 s
+==== =========== ==========
+
+.. note::
+    Git lacks a value for the *700 commits* scenario as it does
+    not seem to allow checking out a repository at a specific
+    revision.
+
+Git deserves special mention for its output from ``git pull``. It
+not only lists the delta change information for each file but also
+color-codes the information.
+
+
+XXX ... usage on top of svn, filling in `Coordinated Development of a
+New Feature`_ scenario

-XXX ... to be continued


 Chosen DVCS
 ===========

 XXX

+::
+
+    import random
+    print(random.choice(['svn', 'bzr', 'hg', 'git']))


 Transition Plan
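The checkout numbers in the diff above were taken with zsh's ``time`` and ``du -c -h``. The following is a rough Python sketch of the same two measurements, offered as an illustration rather than the method actually used for the PEP. The helper names `time_command`, `disk_usage` and `human` are hypothetical. One caveat: `disk_usage` sums apparent file sizes, while ``du`` counts allocated disk blocks, so the two will not match exactly.

```python
import os
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True)  # raises if the command fails
    return time.perf_counter() - start


def disk_usage(path):
    """Sum the apparent sizes (in bytes) of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not descend into symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symlinks instead of following them
                total += os.path.getsize(full)
    return total


def human(nbytes):
    """Format a byte count with a K/M/G suffix (powers of 1024), roughly like du -h."""
    size = float(nbytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in a real clone command.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f} s")
```

To reproduce a row of the table, one would pass a real clone command, for example `time_command(["hg", "clone", url, "trunk-hg"])` with the repository URL of interest. Then call `human(disk_usage("trunk-hg"))` on the result. The directory name here is only a placeholder.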

Somewhere near the top of the PEP, I think there should be a discussion of reasons not to change at all.

Every time we alter the tool chain, it disrupts developers' lives. When I was working mainly under Windows, I was at one time the most active contributor by a huge margin. Then Martin switched the build to VC7, which caused me to go over a year before I could make another checkin to CPython. By then VC8 was out and VC7 was hard to find. It crippled my development. When we switch tools, *every* developer will have to go through an install, switchover, and learn a new set of commands and checkin practices. This is a huge PITA and it is almost certain that some developers won't bother. For many of the developers that do bother, it will cost them a day of their lives getting switched over and working through the learning curve -- that is a day that could have been spent fixing bugs or doing something non-administrative that actually adds value.

Before we boot SVN, I think we need to be damned sure that there will be HUGE offsetting benefits to a DVCS and be pretty sure that our little community is actually ready for a distributed process.

I'm concerned that over time we're moving towards a process that has a lot more admin than we used to. Changes to PEP 8, rules on indentation, review rules, etc. seem to be driven by the least active developers. Most of the folks who do most of the work rarely ask for more restrictions, more admin, etc. Even little things like automatically rejecting submissions without whitespace normalization add to the admin burden (time spent doing something that doesn't actually improve lives for end-users).

SVN is relatively simple. DVCS systems are much more process oriented and aimed at professional developers. I think we will lose many newbie patches if the person is forced to use an unfamiliar version control system.

I like the quality of the research that you're doing but suspect that the end goal of switching away from SVN may not be worth the disruption and effects on real developers.

My two cents,
Raymond

I'm concerned that over time we're moving towards a process that has a lot more admin than we used to. Changes to PEP 8, rules on indentation, review rules, etc. seem to be driven by the least active developers. Most of the folks who do most of the work rarely ask for more restrictions, more admin, etc. Even little things like automatically rejecting submissions without whitespace normalization add to the admin burden (time spent doing something that doesn't actually improve lives for end-users).
I think this is unfair. I added the whitespace checking, and Benjamin Peterson revised it to be more verbose; Georg Brandl extended it to check the documentation as well. Which of us three do you consider the least active developer? Regards, Martin

On Jan 24, 2009, at 6:40 PM, Raymond Hettinger wrote:
Somewhere near the top of the PEP, I think there should be a discussion of reasons not to change at all.
Every time we alter the tool chain, it disrupts developers' lives. When I was working mainly under Windows, I was at one time the most active contributor by a huge margin. Then Martin switched the build to VC7, which caused me to go over a year before I could make another checkin to CPython. By then VC8 was out and VC7 was hard to find. It crippled my development. When we switch tools, *every* developer will have to go through an install, switchover, and learn a new set of commands and checkin practices. This is a huge PITA and it is almost certain that some developers won't bother. For many of the developers that do bother, it will cost them a day of their lives getting switched over and working through the learning curve -- that is a day that could have been spent fixing bugs or doing something non-administrative that actually adds value.
Before we boot SVN, I think we need to be damned sure that there will be HUGE offsetting benefits to a DVCS and be pretty sure that our little community is actually ready for a distributed process.
I'm concerned that over time we're moving towards a process that has a lot more admin than we used to. Changes to PEP 8, rules on indentation, review rules, etc. seem to be driven by the least active developers. Most of the folks who do most of the work rarely ask for more restrictions, more admin, etc. Even little things like automatically rejecting submissions without whitespace normalization add to the admin burden (time spent doing something that doesn't actually improve lives for end-users).
SVN is relatively simple. DVCS systems are much more process oriented and aimed at professional developers. I think we will lose many newbie patches if the person is forced to use an unfamiliar version control system.
I like the quality of the research that you're doing but suspect that the end goal of switching away from SVN may not be worth the disruption and effects on real developers.
I think it's proper and worthwhile to consider development process changes that a DVCS would afford or require, and the impact on contributions both from current core developers and future ones. Let's be sure to examine all aspects though.

I happen to think that a DVCS will be only slightly more painful for core developers, but that pain will be more than offset by providing non-core developers near first-class status in the development ecosystem. Non-core developers do not have that now, and there are many times more potential contributors than current contributors.

So yes, let's definitely think about the collateral changes that moving to a DVCS will entail. But let's be sure to look at it from both sides.

Besides, certain developments like support for the svn wire protocol in bzr would make the WFC (we fear change :) argument moot.

Barry

On Sat, Jan 24, 2009 at 3:40 PM, Raymond Hettinger <python@rcn.com> wrote:
Somewhere near the top of the PEP, I think there should be a discussion of reasons not to change at all.
The possibility of rejection is an implicit part of every PEP. PEP 374 contains a long rationale section. If you disagree, fine, but I don't think you need to criticize the PEP's form.
Every time we alter the tool chain, it disrupts developers' lives. When I was working mainly under Windows, I was at one time the most active contributor by a huge margin. Then Martin switched the build to VC7, which caused me to go over a year before I could make another checkin to CPython. By then VC8 was out and VC7 was hard to find. It crippled my development. When we switch tools, *every* developer will have to go through an install, switchover, and learn a new set of commands and checkin practices.
You forget that many developers are *already* using a DVCS, either for other projects, or using some kind of bridge. Otherwise the PEP wouldn't have gotten so much traction already!
This is a huge PITA and it is almost certain that some developers won't bother.
The same can be said for any change, including Python 3000 (or 2.6 for that matter). You seem to be seeing it as exclusively a PITA with no payback. That seems rather short-sighted. DVCS can have huge benefits for cooperative development. (Maybe the PEP could be extended with some examples? I believe Mozilla uses Mercurial for example. And you know the Linux kernel uses Git.)
For many of the developers that do bother, it will cost them a day of their lives getting switched over and working through the learning curve -- that is a day that could have been spent fixing bugs or doing something non-administrative that actually adds value.
Oh come on. I installed Mercurial (from source!) on two different machines and started being productive in half an hour (including the research to find where to download it, and how to configure the username). Learning all the ins and outs will take longer, for sure -- but you get the time back by having a much better way for managing tentative patches or multiple branches, for example.
Before we boot SVN, I think we need to be damned sure that there will be HUGE offsetting benefits to a DVCS and be pretty sure that our little community is actually ready for a distributed process.
"Our little community"? I see Python as one of the major open source projects, with an influence way beyond just the Python language users. There's probably only a dozen larger projects.
I'm concerned that over time we're moving towards a process that has a lot more admin than we used to. Changes to PEP 8, rules on indentation, review rules, etc seem to be driven by the least active developers. Most of the folks who do most of the work rarely ask for more restrictions, more admin, etc. Even little things like automatically rejecting submissions without whitespace normalization add to the admin burden (time spent doing something that doesn't actually improve lives for end-users).
(Some) more admin is the price we pay for our success. I recall the days when there was *no* admin needed and I could do everything myself. Frankly, I do *not* want to go back there.
SVN is relatively simple. DVCS systems are much more process oriented and aimed at professional developers.
I think you have this backwards. Most professionals (people coding for money in a company) are using more centralized VCSes (often because their management wants to know what the engineers are doing). To the contrary, it's the open source hobbyists who've taken up DVCSes, because they facilitate the *distributed* (get it? :-) way of working in such projects.
I think we will lose many newbie patches if the person is forced to use an unfamiliar version control system.
Newbies by definition don't use VCSes; they patch the source code of a two-year-old download in place and send us a new file, marked up with comments indicating where they made changes (and forgetting some other file they also patched earlier). Or if we're lucky they saved the original file and send us a diff -- in a format we cannot read.

I was going to say that the one thing that's easier with svn is that it comes pre-installed on most Linuxes, while hg and bzr don't; but remember that 70% or more of our users are on Windows -- the great equalizer, where they have to install whatever [D]VCS we tell them to use anyway.
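The "if we're lucky" case above, where a contributor kept an unmodified copy of the file, is enough to produce a patch in the standard unified format without any VCS. The following minimal sketch assumes exactly that. The helper `make_patch` and the file name `Lib/example.py` are hypothetical, and the standard library's `difflib.unified_diff` does the actual diffing.

```python
import difflib


def make_patch(original_text, patched_text, filename):
    """Return a unified diff between two versions of one file as a single string."""
    diff = difflib.unified_diff(
        original_text.splitlines(keepends=True),
        patched_text.splitlines(keepends=True),
        fromfile=filename + ".orig",  # the saved, unmodified copy
        tofile=filename,  # the edited copy
    )
    return "".join(diff)  # empty string if the two versions are identical


if __name__ == "__main__":
    before = "def add(a, b):\n    return a - b\n"
    after = "def add(a, b):\n    return a + b\n"
    print(make_patch(before, after, "Lib/example.py"), end="")
```

Run as a script, it prints a single hunk. The hunk has the `--- Lib/example.py.orig` / `+++ Lib/example.py` headers, followed by `@@ -1,2 +1,2 @@` and the changed `return` line.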
I like the quality of the research that you're doing but suspect that the end goal of switching away from SVN may not be worth the disruption and effects on real developers.
My two cents,
Sorry, most of that sounds like FUD to me.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

You've added space/time measurements to the PEP. I think you ought to add another, more subjective measurement for each system. How long did it take you to learn the new commands and ways of working with the system? How long did it take to get it configured and working on your system? What would be the impact on your life as a developer if you had to switch over?

Talking about the Python language itself, we often make the point that developer time is far more important than saving a few clock cycles on a computer. The real impact of a switchover is the learning curve and configuration time multiplied by the number of developers.

Also, how many developers or casual contributors will we lose because they simply aren't willing to bear the transition costs?

Raymond

[in the future, Raymond, can you send these to python-dev? I don't think python-checkins is the right place to have over-arching discussions about stuff]

On Sat, Jan 24, 2009 at 15:47, Raymond Hettinger <python@rcn.com> wrote:
You've added space/time measurements to the PEP. I think you ought to add another, more subjective measurement for each system. How long did it take you to learn the new commands and ways of working with the system? How long did it take to get it configured and working on your system? What would be the impact on your life as a developer if you had to switch over?
Talking about the Python language itself, we often make the point that developer time is far more important than saving a few clock cycles on a computer. The real impact of a switchover is the learning curve and configuration time multiplied by the number of developers.
My impressions are not done yet. If you look at what I checked in earlier today you will notice I discuss how long it took to figure stuff out, etc. I am keeping it simple for now but I will discuss subjective things in more detail in a larger context when I come closer to reaching a conclusion.
Also, how many developers or casual contributors will we lose because they simply aren't willing to bear the transition costs?
That's an unmeasurable number short of seeing how large of an outcry there is against something. And since I have not even made a case for one DVCS over another it isn't worth worrying about this number quite yet. -Brett

[Sorry for the late reply here. I'm offline, on vacation, and not paying a lot of attention to email.]

>> Also, how many developers or casual contributors will we lose because
>> they simply aren't willing to bear the transition costs?

Brett> That's an unmeasurable number short of seeing how large of an
Brett> outcry there is against something. And since I have not even made
Brett> a case for one DVCS over another it isn't worth worrying about
Brett> this number quite yet.

Still, it would seem to be a useful factor for you to estimate when considering which, if any, of the candidate DVCSs will provide the smallest barrier to entry.

Skip

On Jan 24, 2009, at 6:47 PM, Raymond Hettinger wrote:
Also, how many developers or casual contributors will we lose because they simply aren't willing to bear the transition costs?
And how many will we gain with tools that allow them to participate more fully?

Barry

Brett,

I'm offline, just reading through now stale checkin messages, so this may be more evident when viewed in a larger context, but it's not real clear that there's not a typo in the checkout times:

>> ======= ================ ==============
>> DVCS    Time             Space
>> ------- ---------------- --------------
>> -svn     1:04             139 M
>> -bzr     2:29:24 or 8:46  275 M or 596 M
>> -hg      2:30             171 M
>> -git     2:54             134 M
>> +svn     1:04             139 M
>> +bzr 1   2:29:24          275 M
>> +bzr 2   8:46             596 M
>> +hg      2:30             171 M
>> +git     2:54             134 M
>> ======= ================ ==============

If bzr 1 really takes nearly 2.5 hours to check out the code perhaps it's worth noting that fact either explicitly or by using a time format which includes units, e.g., 2h29m24s vs. 2:29:24.

Skip
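Skip's suggestion can be applied mechanically. Here is a minimal sketch that assumes every timing is written as M:SS or H:MM:SS, which holds for all the rows quoted above. `with_units` is a hypothetical helper name.

```python
def with_units(clock: str) -> str:
    """Rewrite an 'M:SS' or 'H:MM:SS' timing as e.g. '8m46s' or '2h29m24s'."""
    parts = [int(part) for part in clock.split(":")]
    if len(parts) == 2:
        hours, (minutes, seconds) = 0, parts
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"expected M:SS or H:MM:SS, got {clock!r}")
    if hours:
        # Pad minutes only when an hour field precedes them.
        return f"{hours}h{minutes:02d}m{seconds:02d}s"
    return f"{minutes}m{seconds:02d}s"


if __name__ == "__main__":
    # Timings copied from the checkout table quoted above.
    for label, clock in [("svn", "1:04"), ("bzr 1", "2:29:24"), ("bzr 2", "8:46")]:
        print(f"{label:<6} {with_units(clock)}")
```

With this helper, the two bzr rows would read 2h29m24s and 8m46s. That makes the hours-long first checkout impossible to misread as minutes.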

On Sat, Feb 14, 2009 at 10:17, <skip@pobox.com> wrote:
Brett,
I'm offline, just reading through now stale checkin messages, so this may be more evident when viewed in a larger context, but it's not real clear that there's not a typo in the checkout times:
This is so stale the worry is no longer an issue. =)

-Brett
participants (8)
- "Martin v. Löwis"
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- brett.cannon
- Guido van Rossum
- Raymond Hettinger
- skip@pobox.com