[Python-checkins] r68914 - peps/trunk/pep-0374.txt

brett.cannon python-checkins at python.org
Sun Jan 25 00:06:33 CET 2009


Author: brett.cannon
Date: Sun Jan 25 00:06:31 2009
New Revision: 68914

Log:
Thoroughly clean up what impressions I have written so far.


Modified:
   peps/trunk/pep-0374.txt

Modified: peps/trunk/pep-0374.txt
==============================================================================
--- peps/trunk/pep-0374.txt	(original)
+++ peps/trunk/pep-0374.txt	Sun Jan 25 00:06:31 2009
@@ -1344,120 +1344,136 @@
 is likely to shock a veteran CVS user.
 
 
-Impressions
-===========
+Tests/Impressions
+=================
+
 As I (Brett Cannon) am left with the task of making the final
 decision of which/any DVCS to go with and not my co-authors, I felt
-it only fair to write down my impressions as I evaluate the various
-tools so as to be as transparent as possible.
+it only fair to write down what tests I ran and my impressions as I
+evaluate the various tools so as to be as transparent as possible.
+
+
+Barrier to Entry
+----------------
 
-To begin, I measured the checking out of code as if I was a non-core
-developer. This is important as this is the first impression
-developers have when they decide they wish to contribute a patch to
-Python. Timings were done using the ``time`` command in zsh and
+The amount of time and effort it takes to get a checkout of Python's
+repository is critical. If the difficulty or time is too great, then a
+person wishing to contribute to Python may very well give up. That
+cannot be allowed to happen.
+
+I measured checking out the code as if I were a non-core
+developer. Timings were done using the ``time`` command in zsh and
 space was calculated with ``du -c -h``.
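
The ``time`` and ``du -c -h`` measurements can be approximated from
Python. The sketch below illustrates the idea under assumptions and is
not the procedure actually used: ``timed_run`` and ``disk_usage`` are
hypothetical helpers, and the clone command and checkout directory you
pass in are up to you.

```python
import os
import subprocess
import sys
import tempfile
import time


def timed_run(argv):
    """Run ``argv`` and return its wall-clock duration in seconds.

    Roughly the "total" figure zsh's ``time`` reports (hypothetical helper).
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def disk_usage(path):
    """Return the total apparent size, in bytes, of regular files under ``path``.

    Similar in spirit to ``du -c``, but du counts allocated disk blocks,
    so its numbers will usually be somewhat larger (hypothetical helper).
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Do not follow symlinks; du does not dereference them by default.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


if __name__ == "__main__":
    # Demo on a trivial command and a throwaway directory; for a real
    # measurement, pass the tool's clone/branch command and checkout path.
    elapsed = timed_run([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f} s")
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.txt"), "wb") as f:
            f.write(b"x" * 1024)
        print(disk_usage(tmp), "bytes")  # 1024 bytes
```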
 
 ======= ================ ==============
 DVCS	Time	         Space	
 ------- ---------------- --------------
-svn	1:04             139 M
-bzr	2:29:24 or 8:46	 275 M or 596 M	
-hg	2:30	         171 M	
-git	2:54	         134 M	
+svn     1:04             139 M
+bzr 1   2:29:24          275 M
+bzr 2   8:46             596 M
+hg      2:30             171 M
+git     2:54             134 M
 ======= ================ ==============
 
-The svn measurements are not exactly a 1:1 comparison to the DVCSs.
-For one, svn does not download the entire revision history, and thus
-(should) have the least amount to download. And two, because various
-calculation steps are left up to the server the entire process of
-checking out code (should) be faster.
-
-But the svn measurements should be considered as what developers are
-used to. Thus they act as a reference point for what people tend to
-expect in terms of performance.
-
-Looking at bzr, I have listed two numbers. The first values are for
-running ``bzr branch`` as outlined in the `One-Off Checkout`_
-scenario. When the
-timings came back in hours (I used Launchpad as code.python.org is
-not running the newest version of bzr and I wanted to use its latest
-networking protocol), I decided to try using the steps outlined when
-the experimental bzr branches were first created. That second
-approach is what the second set of values for bzr represent.
-
-While both the hg and git numbers are perfectly acceptable, the bzr
-numbers not necessarily. The raw ``bzr branch`` approach is entirely
-not acceptable as no one wants to wait over two hours to write a
-potentially one line change to some code for the benefit of Python.
-Assuming 8:46 is a reasonable amount of time (I believe it in
-general is, but it is teetering on not), the 596 M space requirement
-could be an issue for some. While we typically view disk space as
-cheap, for some people it might be an issue (e.g. the person who did
-the schedule for PyCon 2008 did it over a connection so badly that
-Google Spreadsheets didn't work for him and he had to submit the
-schedule in another form than the one original used). Once again I
-think the space usage is acceptable, but it is close to being too
-much.
-
-To see if bzr's performance would be acceptable once at least the
-branch was downloaded, I decided to see how long it would take to
-get the change log for a file. I chose the README file as it sees
-regular changes for every release and has a revision history going
-back to 1993 and thus would have a fair number of revisions.
-It should be mentioned that while git had the nicest output thanks to
-its color terminal output, it also took a while to find the
-``--no-pager`` flag in order to get just a stream of text instead of
-having the output sent to the pager.
-
-Overall the numbers were all acceptable:
-
-* bzr: 4.5 seconds
-* hg: 1.1 seconds
-* git: 1.5 seconds
-
-While having bzr be over 3x slower than its nearest neighbor, it
-must be kept in mind that the total performance time is still
-acceptable, regardless of the multiplier.
-
-Because a DVCS keeps its revision history on disk, it also means
-that typically they can be zipped up for direct downloading. At
-least in bzr's case that would solve the performance issue for
-initial checkout if the zip file could be generate constantly. But
-that didn't address the cost of pulling in new revisions when a
-checkout has gone stale. To measure this I decided I would check out
-the repositories back about 700 revisions which represented the
-amount of change made since the beginning of the month and time how
-long they took to update.
-
-For this to happen I first had to remember the URLs for the
-repositories. Instead of simply looking in this PEP, though, I
-decided to try to figure it out from the command-line help for each
-tool or simply guessing. Bzr worked out great with ``bzr info``. Git
-took a little poking around, but I figured out ``git remote show
-origin`` told me what I needed. For hg, though, I couldn't figure it
-out short of running ``hg pull`` and denoting the status information
-during the pull (turns out ``hg paths`` is what I was looking for).
-
-With the repository locations known I then had to perform a checkout
-to a certain revision. Turns out that git will not clone a
-repository to only a specific revision, although from personal
-experience git's pull facility is very fast. Bzr was able to perform
-its update in just over 39 seconds. Hg did its update in just over
-17 seconds. Much like the log test, while the multiplier of slowness
-seems high, in real life terms al DVCSs performed within reason.
-
-In my mind this means that bzr is only an acceptable candidate as
-long as an fairly up-to-date archive of Python's key branches are
-made available for people to download to avoid bzr's very so remote
-branching.
+.. note::
+    The *bzr 1* entry is for following the instructions in the
+    `One-Off Checkout`_ scenario, pulling from Launchpad_ in
+    mid-January. The *bzr 2* entry is based on following the
+    instructions for the `experimental Bazaar branches
+    <http://www.python.org/dev/bazaar/>`_ and pulling from
+    http://code.python.org/python/trunk/. Times are given as
+    minutes:seconds, except for *bzr 1*, which is
+    hours:minutes:seconds.
+
+When comparing these numbers to svn, it is important to realize that
+it is not a 1:1 comparison. Svn does not pull down the entire revision
+history like all of the DVCSs do. That means svn can perform an
+initial checkout much faster than the DVCSs purely because it has
+less data to download.
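
Because the checkout table mixes m:ss and h:mm:ss values, converting
everything to seconds makes the comparison with svn easier to read. A
minimal sketch, using only the numbers in the table; ``to_seconds`` is
a hypothetical helper name.

```python
def to_seconds(stamp):
    """Convert 'm:ss' or 'h:mm:ss' (as used in the table) to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Checkout times copied from the table above.
checkout_times = {
    "svn": "1:04",
    "bzr 1": "2:29:24",
    "bzr 2": "8:46",
    "hg": "2:30",
    "git": "2:54",
}

baseline = to_seconds(checkout_times["svn"])
for tool, stamp in checkout_times.items():
    secs = to_seconds(stamp)
    # Print each tool's time in seconds and as a multiple of svn's time.
    print(f"{tool:6} {secs:5d} s  {secs / baseline:6.1f}x svn")
```

Run as-is, this puts the *bzr 1* checkout at roughly 140 times the svn
time, *bzr 2* at about 8 times, and hg and git at under 3 times.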
+
+
+Performance of basic information functionality
+----------------------------------------------
+
+To see how the tools performed on a command that requires querying
+the history, I timed getting the log for the ``README`` file, which
+changes with every release and has a revision history going back to
+1993.
+
+====  =====
+DVCS  Time
+----  -----
+bzr   4.5 s
+hg    1.1 s
+git   1.5 s
+====  =====
+
+One thing of note during this test was that it took me longer with
+git than with the other two tools to figure out how to get the log
+without the output being sent to a pager. While using a pager is a
+nice touch in general, having it turn on automatically meant extra
+time spent finding out how to disable it (it turns out the main
+``git`` command has a ``--no-pager`` flag for exactly this).
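
A sketch of how the log timing could be scripted, assuming the three
log commands discussed above and a hypothetical ``time_log`` helper;
tools that are not installed are skipped. Git only starts its pager
when standard output is a terminal, so output redirected by a script
never reaches the pager, but ``--no-pager`` is kept so the same
command also behaves interactively.

```python
import shutil
import subprocess
import time

# One log command per tool; git needs --no-pager to stream plain text
# when run from an interactive terminal.
LOG_COMMANDS = {
    "bzr": ["bzr", "log", "README"],
    "hg": ["hg", "log", "README"],
    "git": ["git", "--no-pager", "log", "README"],
}


def time_log(tool, checkout_dir):
    """Time the log command for ``tool`` inside ``checkout_dir``.

    Returns None if the tool is not installed (hypothetical helper).
    """
    argv = LOG_COMMANDS[tool]
    if shutil.which(argv[0]) is None:
        return None
    start = time.perf_counter()
    # Output is discarded, so terminal rendering time is not included.
    subprocess.run(argv, cwd=checkout_dir, check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```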
+
+
+Figuring out what command to use from built-in help
+----------------------------------------------------
+
+I tried to find the command that shows the URL a repository was
+cloned from. To do this I used nothing more than the help provided
+by the tool itself or its man pages.
+
+Bzr was the easiest: ``bzr info``. Running ``bzr help`` didn't show
+what I wanted, but mentioned ``bzr help commands``. That list had the
+command with a description that made sense.
+
+Git was the second easiest. The command ``git help`` didn't show much
+and did not have a way of listing all commands. That is when I viewed
+the man page. Reading through the various commands I discovered ``git
+remote``. The command itself printed nothing more than ``origin``.
+Trying ``git remote origin`` produced an error along with the command
+usage. That is when I noticed ``git remote show``. Running
+``git remote show origin`` gave me the information I wanted.
+
+For hg, I never found the information I wanted on my own. It turns out
+I wanted ``hg paths``, but that was not obvious from the description
+of "show definition of symbolic path names" as printed by ``hg help``.
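
The three answers can be collected in one place. The sketch below
assumes only the commands named above; ``clone_source`` is a
hypothetical helper that returns each tool's output as text, or
``None`` when the tool is not installed. ``git remote show origin``
queries the remote repository, so it may be slow or fail when offline.

```python
import shutil
import subprocess

# Commands that report where a checkout was cloned/branched from.
SOURCE_COMMANDS = {
    "bzr": ["bzr", "info"],
    "git": ["git", "remote", "show", "origin"],
    "hg": ["hg", "paths"],
}


def clone_source(tool, checkout_dir):
    """Return the tool's description of the checkout's origin.

    Returns None if the tool is not installed (hypothetical helper).
    """
    argv = SOURCE_COMMANDS[tool]
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, cwd=checkout_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout
```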
+
+
+Updating a checkout
+---------------------
+
+To see how long it takes to update an outdated repository, I timed
+updating a repository 700 commits behind and one 50 commits behind
+(three weeks stale and one week stale, respectively).
+
+====  ===========  ==========
+DVCS  700 commits  50 commits
+----  -----------  ----------
+bzr   39 s         7 s
+hg    17 s         3 s
+git   N/A          4 s
+====  ===========  ==========
+
+.. note::
+    Git lacks a value for the *700 commits* scenario as it does
+    not seem to allow cloning a repository only up to a specific
+    revision.
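
One way to reproduce the update test is sketched below. It assumes
``bzr branch -r`` and ``hg clone -r`` to stop a copy at an older
revision, followed by ``bzr pull`` or ``hg pull -u`` to catch up; the
helper names, URL, revision, and directory are hypothetical
placeholders, and git has no stale-clone recipe here for the reason
given in the note.

```python
import subprocess
import time


def stale_clone_cmd(tool, url, rev, dest):
    """Build a command that copies ``url`` into ``dest`` only up to ``rev``."""
    if tool == "bzr":
        return ["bzr", "branch", "-r", rev, url, dest]
    if tool == "hg":
        return ["hg", "clone", "-r", rev, url, dest]
    raise ValueError(f"no stale-clone recipe for {tool!r}")


# Commands that bring an existing copy up to date with its source.
UPDATE_COMMANDS = {
    "bzr": ["bzr", "pull"],
    "hg": ["hg", "pull", "-u"],
    "git": ["git", "pull"],
}


def time_update(tool, dest):
    """Time how long ``tool`` takes to update the copy in ``dest``."""
    start = time.perf_counter()
    subprocess.run(UPDATE_COMMANDS[tool], cwd=dest, check=True)
    return time.perf_counter() - start


# Hypothetical usage (URL and revision are placeholders):
#   subprocess.run(stale_clone_cmd("hg", "<repository URL>", "<old rev>",
#                                  "stale"), check=True)
#   print(time_update("hg", "stale"))
```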
+
+Git deserves special mention for its output from ``git pull``. It
+not only lists the delta change information for each file but also
+color-codes the information.
+
+
+XXX ... usage on top of svn, filling in `Coordinated Development of a
+New Feature`_ scenario
 
-XXX ... to be continued
 
 
 Chosen DVCS
 ===========
 
 XXX
+
+::
+
+    # Placeholder until a decision is made: pick a tool at random.
+    import random
+    print(random.choice(['svn', 'bzr', 'hg', 'git']))
  
 
 Transition Plan

