[Python-checkins] r68914 - peps/trunk/pep-0374.txt
brett.cannon
python-checkins at python.org
Sun Jan 25 00:06:33 CET 2009
Author: brett.cannon
Date: Sun Jan 25 00:06:31 2009
New Revision: 68914
Log:
Thoroughly clean up what impressions I have written so far.
Modified:
peps/trunk/pep-0374.txt
Modified: peps/trunk/pep-0374.txt
==============================================================================
--- peps/trunk/pep-0374.txt (original)
+++ peps/trunk/pep-0374.txt Sun Jan 25 00:06:31 2009
@@ -1344,120 +1344,136 @@
is likely to shock a veteran CVS user.
-Impressions
-===========
+Tests/Impressions
+=================
+
As I (Brett Cannon) am left with the task of making the final
decision of which/any DVCS to go with and not my co-authors, I felt
-it only fair to write down my impressions as I evaluate the various
-tools so as to be as transparent as possible.
+it only fair to write down what tests I ran and my impressions as I
+evaluate the various tools so as to be as transparent as possible.
+
+
+Barrier to Entry
+----------------
-To begin, I measured the checking out of code as if I was a non-core
-developer. This is important as this is the first impression
-developers have when they decide they wish to contribute a patch to
-Python. Timings were done using the ``time`` command in zsh and
+The amount of time and effort it takes to get a checkout of Python's
+repository is critical. If the difficulty or time is too great, then a
+person wishing to contribute to Python may very well give up. That
+cannot be allowed to happen.
+
+I measured checking out the code as if I were a non-core
+developer. Timings were taken with the ``time`` command in zsh and
space was calculated with ``du -c -h``.
 ======= ================ ==============
 DVCS    Time             Space
 ------- ---------------- --------------
-svn     1:04             139 M
-bzr     2:29:24 or 8:46  275 M or 596 M
-hg      2:30             171 M
-git     2:54             134 M
+svn     1:04             139 M
+bzr 1   2:29:24          275 M
+bzr 2   8:46             596 M
+hg      2:30             171 M
+git     2:54             134 M
 ======= ================ ==============
-The svn measurements are not exactly a 1:1 comparison to the DVCSs.
-For one, svn does not download the entire revision history, and thus
-(should) have the least amount to download. And two, because various
-calculation steps are left up to the server the entire process of
-checking out code (should) be faster.
-
-But the svn measurements should be considered as what developers are
-used to. Thus they act as a reference point for what people tend to
-expect in terms of performance.
-
-Looking at bzr, I have listed two numbers. The first values are for
-running ``bzr branch`` as outlined in the `One-Off Checkout`_
-scenario. When the
-timings came back in hours (I used Launchpad as code.python.org is
-not running the newest version of bzr and I wanted to use its latest
-networking protocol), I decided to try using the steps outlined when
-the experimental bzr branches were first created. That second
-approach is what the second set of values for bzr represent.
-
-While both the hg and git numbers are perfectly acceptable, the bzr
-numbers not necessarily. The raw ``bzr branch`` approach is entirely
-not acceptable as no one wants to wait over two hours to write a
-potentially one line change to some code for the benefit of Python.
-Assuming 8:46 is a reasonable amount of time (I believe it in
-general is, but it is teetering on not), the 596 M space requirement
-could be an issue for some. While we typically view disk space as
-cheap, for some people it might be an issue (e.g. the person who did
-the schedule for PyCon 2008 did it over a connection so badly that
-Google Spreadsheets didn't work for him and he had to submit the
-schedule in another form than the one original used). Once again I
-think the space usage is acceptable, but it is close to being too
-much.
-
-To see if bzr's performance would be acceptable once at least the
-branch was downloaded, I decided to see how long it would take to
-get the change log for a file. I chose the README file as it sees
-regular changes for every release and has a revision history going
-back to 1993 and thus would have a fair number of revisions.
-It should be mentioned that while git had the nicest output thanks to
-its color terminal output, it also took a while to find the
-``--no-pager`` flag in order to get just a stream of text instead of
-having the output sent to the pager.
-
-Overall the numbers were all acceptable:
-
-* bzr: 4.5 seconds
-* hg: 1.1 seconds
-* git: 1.5 seconds
-
-While having bzr be over 3x slower than its nearest neighbor, it
-must be kept in mind that the total performance time is still
-acceptable, regardless of the multiplier.
-
-Because a DVCS keeps its revision history on disk, it also means
-that typically they can be zipped up for direct downloading. At
-least in bzr's case that would solve the performance issue for
-initial checkout if the zip file could be generate constantly. But
-that didn't address the cost of pulling in new revisions when a
-checkout has gone stale. To measure this I decided I would check out
-the repositories back about 700 revisions which represented the
-amount of change made since the beginning of the month and time how
-long they took to update.
-
-For this to happen I first had to remember the URLs for the
-repositories. Instead of simply looking in this PEP, though, I
-decided to try to figure it out from the command-line help for each
-tool or simply guessing. Bzr worked out great with ``bzr info``. Git
-took a little poking around, but I figured out ``git remote show
-origin`` told me what I needed. For hg, though, I couldn't figure it
-out short of running ``hg pull`` and denoting the status information
-during the pull (turns out ``hg paths`` is what I was looking for).
-
-With the repository locations known I then had to perform a checkout
-to a certain revision. Turns out that git will not clone a
-repository to only a specific revision, although from personal
-experience git's pull facility is very fast. Bzr was able to perform
-its update in just over 39 seconds. Hg did its update in just over
-17 seconds. Much like the log test, while the multiplier of slowness
-seems high, in real life terms al DVCSs performed within reason.
-
-In my mind this means that bzr is only an acceptable candidate as
-long as an fairly up-to-date archive of Python's key branches are
-made available for people to download to avoid bzr's very so remote
-branching.
+.. note::
 The *bzr 1* entry is for following the `One-Off Checkout`_
 scenario instructions, pulling from Launchpad_ in mid-January.
+ The *bzr 2* entry is based on following the instructions
+ for the `experimental Bazaar branches
+ <http://www.python.org/dev/bazaar/>`_ and pulling from
+ http://code.python.org/python/trunk/.
+
+When comparing these numbers to svn, it is important to realize that
+it is not a 1:1 comparison. Svn does not pull down the entire revision
+history like all of the DVCSs do. That means svn can perform an
+initial checkout much faster than the DVCSs purely because it has
+less information to download.
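The timings and sizes in the table above came from zsh's ``time`` and ``du -c -h``. The following is a rough Python stand-in for those two measurements, offered as a sketch only. It times a command with ``time.perf_counter()`` and totals the file sizes under a directory. The names ``time_command``, ``tree_size`` and ``megabytes`` are hypothetical and are not part of any of the tools discussed here. The size is a sum of apparent file sizes, so it will not exactly match ``du``, which counts allocated disk blocks.

```python
import os
import subprocess
import time

# Hypothetical helpers approximating zsh's ``time`` and ``du -c -h``.


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


def tree_size(root):
    """Return the total apparent size, in bytes, of regular files under root.

    Symlinks are skipped so nothing outside the tree is counted.  ``du``
    counts allocated disk blocks instead, so its totals can differ.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def megabytes(num_bytes):
    """Convert bytes to mebibytes, the 1024-based unit ``du -h`` shows as "M"."""
    return num_bytes / 2**20
```

For example, ``time_command(["hg", "clone", url, "checkout"])`` followed by ``megabytes(tree_size("checkout"))`` would approximate one row of the table. Here ``url`` is a placeholder for whichever repository location is being measured.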
+
+
+Performance of basic information functionality
+----------------------------------------------
+
To see how the tools did on a command that requires querying the
history, I timed getting the log for the ``README`` file.
+
+==== =====
+DVCS Time
+---- -----
bzr  4.5 s
hg   1.1 s
git  1.5 s
+==== =====
+
One thing of note during this test was that it took me longer with git
than with the other two tools to figure out how to get the log without
a pager. While using a pager is a nice touch in general, figuring out
how to keep it from turning on automatically took some time (it turns
out the main ``git`` command has a ``--no-pager`` flag to disable the
pager).
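Below is a minimal sketch of how the log timing could be scripted. It assumes each tool is on ``PATH`` and that the script is pointed at an existing checkout. The ``LOG_COMMANDS`` table and the helper names are hypothetical. The only tool-specific detail taken from this test is that ``--no-pager`` is an option of the main ``git`` command, so it comes before ``log``.

```python
import subprocess
import time

# Hypothetical lookup table: commands that print a file's history
# without paging.  ``--no-pager`` belongs to the main ``git`` command,
# so it precedes the ``log`` subcommand.
LOG_COMMANDS = {
    "bzr": ["bzr", "log"],
    "hg": ["hg", "log"],
    "git": ["git", "--no-pager", "log"],
}


def log_command(tool, path):
    """Return a new argv list that prints the history of path."""
    return LOG_COMMANDS[tool] + [path]


def time_log(tool, path, checkout_dir="."):
    """Time the log command inside checkout_dir, discarding its output."""
    start = time.perf_counter()
    subprocess.run(log_command(tool, path), cwd=checkout_dir, check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

Git only starts its pager when standard output is a terminal. Redirecting the output, as ``time_log`` does, would therefore avoid the pager even without ``--no-pager``. The flag matters mostly when running the command interactively.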
+
+
+Figuring out what command to use from built-in help
+----------------------------------------------------
+
I tried to find the command that shows the URL a repository was cloned
from. To do this I used nothing more than the help provided by the
tool itself or its man pages.
+
+Bzr was the easiest: ``bzr info``. Running ``bzr help`` didn't show
+what I wanted, but mentioned ``bzr help commands``. That list had the
+command with a description that made sense.
+
+Git was the second easiest. The command ``git help`` didn't show much
and did not point to a way of listing all commands. That is when I viewed
the man page. Reading through the various commands, I discovered
``git remote``. Running it on its own printed nothing more than
``origin``. Trying ``git remote origin`` produced an error and printed
the command usage. That is when I noticed ``git remote show``. Running
``git remote show origin`` gave me the information I wanted.
+
+For hg, I never found the information I wanted on my own. It turns out
+I wanted ``hg paths``, but that was not obvious from the description
+of "show definition of symbolic path names" as printed by ``hg help``.
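The three commands found above can be collected into a small lookup table. This is a sketch only. ``REMOTE_INFO_COMMANDS`` and ``show_clone_source`` are hypothetical names. Each command prints more than just the URL, so the output still has to be read by hand.

```python
import subprocess

# Hypothetical lookup table of the commands found above for showing
# where a checkout was cloned from.  Each prints more than the URL.
REMOTE_INFO_COMMANDS = {
    "bzr": ["bzr", "info"],
    "git": ["git", "remote", "show", "origin"],
    "hg": ["hg", "paths"],
}


def show_clone_source(tool, checkout_dir):
    """Run tool's remote-info command in checkout_dir and return its output."""
    result = subprocess.run(
        REMOTE_INFO_COMMANDS[tool],
        cwd=checkout_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

One caveat: ``git remote show origin`` normally contacts the remote to list its branches, so it can be slow or fail when offline. Its ``-n`` option uses cached information instead.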
+
+
+Updating a checkout
+---------------------
+
To see how long it takes to update an outdated repository, I timed
updating both a repository 700 commits behind and one 50 commits behind
(three weeks stale and one week stale, respectively).
+
+==== =========== ==========
+DVCS 700 commits 50 commits
+---- ----------- ----------
bzr  39 s        7 s
hg   17 s        3 s
git  N/A         4 s
+==== =========== ==========
+
+.. note::
+ Git lacks a value for the *700 commits* scenario as it does
+ not seem to allow checking out a repository at a specific
+ revision.
+
+Git deserves special mention for its output from ``git pull``. It
+not only lists the delta change information for each file but also
+color-codes the information.
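The sketch below shows one way the stale checkouts could be reproduced and the update timed. It assumes the caller supplies a repository URL and an older revision identifier. The helper names are hypothetical. The commands (``bzr branch -r``, ``hg clone -r``, ``bzr pull``, ``hg pull -u``) are one plausible recipe, not a record of exactly what was run for the table above. Git is left out, consistent with the note above about cloning at a specific revision.

```python
import subprocess
import time

# Hypothetical helpers: one plausible recipe for a stale checkout and
# its update; not necessarily the exact commands used for the table.


def stale_clone_command(tool, url, rev, dest):
    """Return argv that checks out url only up to revision rev into dest."""
    if tool == "bzr":
        return ["bzr", "branch", "-r", rev, url, dest]
    if tool == "hg":
        return ["hg", "clone", "-r", rev, url, dest]
    # git clone has no option for stopping at an arbitrary revision.
    raise ValueError(f"no stale-checkout recipe for {tool!r}")


def update_command(tool, url):
    """Return argv that pulls the missing revisions from url."""
    if tool == "bzr":
        return ["bzr", "pull", url]
    if tool == "hg":
        # -u also updates the working directory after pulling.
        return ["hg", "pull", "-u", url]
    raise ValueError(f"no update recipe for {tool!r}")


def time_update(tool, url, rev, dest):
    """Make a stale checkout, then return the seconds taken to update it."""
    subprocess.run(stale_clone_command(tool, url, rev, dest), check=True)
    start = time.perf_counter()  # only the update itself is timed
    subprocess.run(update_command(tool, url), cwd=dest, check=True)
    return time.perf_counter() - start
```

Only the update is timed, so the cost of creating the stale checkout does not affect the result.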
+
+
+XXX ... usage on top of svn, filling in `Coordinated Development of a
+New Feature`_ scenario
-XXX ... to be continued
Chosen DVCS
===========
XXX
+::
+
+ import random
+ print(random.choice(['svn', 'bzr', 'hg', 'git']))
Transition Plan