Software Carpentry FAQ
- What is the Software Carpentry project?
The aim of the Software Carpentry project is to make it easier for
programmers in general, and scientific programmers in particular, to
adopt better software development practices. The project will achieve
this by creating tools that are easier to learn and use, and by
documenting those tools and the practices they embody.
- Where does the name come from?
The name is a play on "software engineering", and is meant to indicate
that this project is initially concerned with medium-sized teams (up
to a dozen or two programmers) and medium-term timescales (a year or
- How did the project get started?
The project has its origins in a series of
articles that Greg Wilson organized for the Fall 1996 and Winter
1996 issues of IEEE Computational Science and
Engineering. These articles outlined what their authors thought
computer scientists should teach to physical scientists and
engineers. Most authors recommended numerical methods or the standard
Unix toolset, but Steve McConnell argued that better programming
practices would have the greatest impact on productivity.
As a result of that observation, Greg Wilson, Brent Gorda, and
Steve McConnell put together a 3-day course on software engineering
for scientists and engineers, which they taught several times at the
Los Alamos National Laboratory. Feedback on the course was very
positive, but many participants felt that the tools being
taught---Perl, Make, CVS, and so on---were unnecessarily difficult to
install, learn, and use. They were also frustrated by the scarcity of
examples of design documents, testing plans, and all of the other
things the course was trying to teach them.
- Why Open Source?
There are three reasons why the Software Carpentry project is
following the Open Source model:
- Leveraging existing knowledge.
A closed project can only take advantage of a few minds. As
Linux and other projects have shown, a well-run Open Source
project can harness the experience and insight of thousands of
- Lowering barriers to adoption.
Freely-available tools are more likely to be picked up than
their commercial equivalents. This is particularly true when
the tool in question does something novel (at least from the
point of view of the person adopting it), and in academia (where
budgets are limited).
- Encouraging peer review.
Dan Gezelter's talk
at the first Open Source/Open Science conference discussed how
the scientific tradition of peer review fits with the
philosophy of the Open Source movement. By designing and
building these tools in the open, the Software Carpentry
project will both encourage peer review of the tools
themselves, and demonstrate how this ought to be done for
scientific and commercial software.
- Where does the funding come from?
The funding comes from the U.S. Department of Energy, through the
Advanced Computing Laboratory at Los Alamos National Laboratory. The
project is being administered by Code Sourcery. US$480,000 has been
provided for 2000, and US$380,000 for 2001.
- Why would the Department of Energy fund something like this?
The funding has been provided partly because the DoE would like
scientists and engineers to be more productive, and partly because it
would like to find out whether the Open Source model and community can
meet the special needs of high-performance computational science. The
last few years have seen most manufacturers of special-purpose
supercomputers disappear or be bought out, and the rise of clusters
based on commercial off-the-shelf (COTS) hardware, Linux, MPI, the GNU
compiler toolset, and so on. There is a growing feeling that these
machines could bring scalable supercomputing into the mainstream, but
this will only happen if good tools and practices are accessible
- I'm not a scientist or engineer---what's in it for me?
The things that make many existing Open Source software development
tools difficult to learn and use---obscure syntax, arbitrary or
hard-to-follow behavior, and poor documentation---affect professional
programmers and computer science students just as much as they do
computational scientists and engineers. If the Open Source movement
can build tools that are simple enough to be learned by people who
have problems of their own to solve, and yet powerful enough to
support distributed development of hundreds of thousands of lines of
complex numerical and visualization code, then those tools will
probably also help people who want to build Internet chat rooms and
This project should also be interesting to the general programming
community because it is going to place more emphasis on design and
early feedback than most Open Source projects have to date. Instead of
growing someone's pet project, Software Carpentry is going to
organize---and pay for---a design competition. If this works, it could
be an interesting model for other Open Source projects to adopt.
- I think [tool] is good enough already---why are you re-inventing the wheel?
The short answer to this is Alan Cooper's:
The phrase "computer literate user" really means the person
has been hurt so many times that the scar tissue is thick
enough so he no longer feels the pain.
-- Alan Cooper,
The Inmates are Running the Asylum
The longer answer is that the "accidental complexity" of the standard
Unix command-line toolset is a major barrier to its adoption by people
who are not full-time programmers, or for whom programming is just
something that has to be done in order to do something else. Many
professional programmers---particularly those who enjoy programming
enough to be involved in the Open Source movement---have been using
these tools for so long that they simply don't remember how hard it is
to configure Gnats, or pass variable bindings between recursive calls
And let's face it: if Make or Autoconf were built from scratch today,
they would be written as extensible, embeddable modules in a
high-level scripting language. This would not only make them easier to
use, it would also make them easier to learn, since they would employ
one syntax for all purposes. Microsoft Visual Basic has shown just how
useful it can be to have a single general-purpose "glue" language
capable of binding disparate tools together; the aim of the first half
of this project is to bring those benefits to the Open Source
- What projects are currently under way?
Software Carpentry will start by producing:
- a platform inspection tool similar to Autoconf;
- a build management tool similar to Make;
- an issue tracking system similar to Gnats or Bugzilla; and
- a unit and regression testing harness with the
functionality of XUnit, Expect, and DejaGnu.
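To make the idea of a build tool written as an ordinary scripting-language module more concrete, here is a minimal sketch of Make's core rule: rebuild a target only when it is missing or older than one of its sources. It is an illustration only, not the project's design; the names is_stale, build, and concatenate are hypothetical.

```python
import os


def is_stale(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def build(target, sources, action):
    """Run action(target, sources) only when target is stale.

    Returns True if the action ran, False if target was already up to date.
    """
    if is_stale(target, sources):
        action(target, sources)
        return True
    return False


def concatenate(target, sources):
    """Example action (hypothetical): write all sources, in order, into target."""
    with open(target, "w") as out:
        for src in sources:
            with open(src) as handle:
                out.write(handle.read())
```

Because a rule here is just a function call, a caller could write build("all.txt", ["a.txt", "b.txt"], concatenate) and combine it with loops, conditionals, and other modules in the same language, instead of learning a separate rule syntax.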
- Why were those tools chosen?
These four tools were chosen as initial targets for several
reasons. First, the working practices they support are essential to
medium-scale software engineering. Second, the tools they are intended
to replace are generally recognized as being outdated or flawed. This
creates demand, and increases the odds that rational reimplementations
will be adopted. Third, enough people have enough experience with the
tools that are to be replaced to participate in the design competition
- Why isn't [tool] on this list?
There are several other tools that could have been on this list, and
will be added if the first round of work goes well. A cross-platform
version control system that corrects the many deficiencies in CVS, for
example, is an obvious candidate, but is probably too large to be
tackled initially, and any work done by Software Carpentry could well
be superseded by BitKeeper. Similarly, the world needs a good Open
Source project management tool with the functionality of Microsoft
Project, but probably needs the four tools listed above more urgently.
- What languages and tools will be used?
All development work will be done in Python.
- Why Python?
This is actually three questions:
- Why mandate a language?
Building everything in a single language will encourage
projects to share code, which will both keep the total volume
of code manageable and raise the quality of the
implementations (since the shared code will be exercised, and
tested, in many different ways). Using a single language will
also improve the comprehensibility, and hence the
maintainability and extensibility, of the tools. The varying
syntax of Make, Autoconf, and other tools is a large practical
barrier to their adoption by people who have better (or at
least more pressing) things to do than learn yet another
syntax. Microsoft's Visual Basic has shown how powerful it
is to use a single, flexible language everywhere.
- Why use a scripting language?
A lot of anecdotal evidence shows that "relaxed" high-level
languages (like Python, Perl, and Visual Basic) are more
productive vehicles for process management, text processing,
and similar tasks than their "strict" equivalents (like C++
- Why use Python?
The four candidates considered were Visual Basic, Perl, Tcl, and Python.
- Visual Basic
Visual Basic is proprietary, and there is no
indication that a credible Open Source implementation
will appear any time soon.
- Perl
Perl was a strong contender, primarily because of the
many libraries that have been developed for it, and
because of the number of books that document
it. However, our experience teaching at Los Alamos was
that Perl's syntax is hard to learn, its behavior
often arbitrary, and its size intimidating. While
full-time professional programmers with several other
languages under their belts might (and often do) say
that it all makes sense once you know it, we want to
make the learning curve as gentle as possible.
- Tcl
Tcl is easier to learn and read than Perl, but is not
as well documented, and doesn't come with as many
libraries. Had Python not existed, Tcl would probably
have been chosen for this project.
- Python
Python provides the same functionality as Perl or Tcl,
but has proved to be easier to learn, read, and
remember. (For example, words like "except" and
"unless" appear much less often in Python reference
material than they do in Perl reference material.)
Python is not yet as extensively documented as Perl,
but the number of books is growing, as is the number
of modules and libraries. Finally, the Python
community is still small enough for a project like
this one to attract the attention of a significant
proportion of it.
- How will development be organized and coordinated?
Everything the project produces---designs, critiques of those designs,
test suites, and examples, as well as actual source code---will be
available through the project's Web site at
software-carpentry.codesourcery.com. Each project will have a
coordinator, whose job it will be to moderate discussion, synchronize
releases, track work items, and report on progress. The coordinator
will also be responsible for collating and editing feedback from
judges during the design competition.
- Why a design competition?
Most Open Source packages have their roots in someone's pet hobby
project, which others have picked up, extended, and modified. This
kind of organic growth has a lot of good features, but a
well-documented design is not one of them. As a result, programmers
often have to rely on folklore and reverse engineering if they want to
add to, or fix, these tools. In addition, there is a dearth of
examples of good design for new programmers to learn from.
The Software Carpentry project hopes to address both problems by running a
two-stage design competition. The best entries in both rounds will be
published, along with commentary from the competition's
judges. This material will serve both to inform and guide further
development, and to show novices what experienced programmers think
about before they start coding.
- Who can enter?
Everyone: individuals and teams, students and professionals, from
anywhere in the world.
- What are the rules?
The full rules are available at:
Basically, initial submissions must be written in English, and can be
up to 10 pages long. Examples count against this limit, but diagrams
and a Unix-style man page do not. Any person or team may submit only
one entry in any given category, but can submit in as many of the four
categories as desired.
The best four entries in each category will be awarded US$2500, and
asked to submit full designs. Participants will be strongly encouraged
to pool their efforts for the second round. The best second-round
submission will be awarded an additional US$7500, while the others
will receive another US$2500 each. The real reward will be seeing the
design implemented, and being in a good position to bid on the
- What should first-round submissions contain?
An example of what a submission should contain, and how it should be
formatted, is available at:
First-round entries should focus primarily on what the tool will do,
and how it will be used: command-line options, input and output file
formats, sketches of Web and GUI interfaces (where appropriate), and
so on. Second-round submissions will then be expected to describe how
it's all going to be implemented.
- Who will the judges be?
The list of judges has not yet been finalized.
- When are the deadlines?
The deadline for first-round submissions is March 31, 2000. The five
best proposals in each category will be announced on April 30,
2000. Full submissions are due on June 1, 2000, and winners will be
announced on June 30, 2000.
- Won't prizes discourage co-operation?
We don't know. On the one hand, people might want to hoard their
best ideas; on the other hand, the best designs in both rounds are
going to be published, along with the judges' commentary, and we
will be encouraging participants to pool their efforts. Most of the
money that will be paid out will go to fund implementation, testing,
and documentation; we hope that people will collaborate in the early
stages, and treat the prizes as recognition for their effort, rather
than treating US$10,000 as their retirement fund.
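The US$10,000 figure is the most a single entry could win in one category under the prize schedule summarized in the rules. A minimal sketch of that arithmetic follows. It assumes four finalists per category (the "best four entries" in the rules summary) and that each finalist submits a separate second-round design; pooled entries would change the totals. The constant and function names are hypothetical.

```python
# Prize schedule as summarized in the rules (amounts in US dollars).
FIRST_ROUND_PRIZE = 2500     # paid to each first-round finalist
SECOND_ROUND_WINNER = 7500   # additional award for the best full design
SECOND_ROUND_OTHER = 2500    # additional award for each other finalist
FINALISTS_PER_CATEGORY = 4   # assumption: "the best four entries"
CATEGORIES = 4               # the four initial tools


def max_individual_winnings():
    """Most a single entry can win in one category across both rounds."""
    return FIRST_ROUND_PRIZE + SECOND_ROUND_WINNER


def total_per_category(finalists=FINALISTS_PER_CATEGORY):
    """Total prize money paid in one category if every finalist
    submits a separate second-round design (an assumption)."""
    first_round = finalists * FIRST_ROUND_PRIZE
    second_round = SECOND_ROUND_WINNER + (finalists - 1) * SECOND_ROUND_OTHER
    return first_round + second_round


print(max_individual_winnings())            # 10000
print(total_per_category())                 # 25000
print(CATEGORIES * total_per_category())    # 100000
```

Under these assumptions the prizes come to US$100,000 across the four categories, which is consistent with the statement that most of the money will go to implementation, testing, and documentation rather than to prizes.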
- What documentation will be produced?
The Software Carpentry project will produce several different kinds of
documentation:
- Design documentation.
As stated above, the best designs in each category will be
published, along with the judges' commentary. This material
ought to play the role that music criticism has played in the
development of music, by giving newcomers (and experienced
programmers) better insight into how good designers think.
- User guides.
The project will pay for the development of man pages, user
guides, online help, and all the other documentation needed to
turn a program into a product.
- Test suites.
The project will also pay for the development of
industrial-strength test suites for all four tools. These
suites will be published, both to serve as a starting point
for other projects and to demonstrate good practice.
- Case studies.
It is often easier to show someone how to do something than to
explain it to them. The Software Carpentry project will pay
for case studies that describe how these tools, and (more
importantly) the working practices they support, have been
deployed in practice. Checklists, templates for forms, and
other supporting material can also be submitted.
- What format(s) will be used?
The primary format for all documentation will be HTML. The project
will migrate to XML when and as feasible.
- What restrictions are there on using the documentation?
Only those that also apply to the software, under the terms of its
Open Source license. You can copy and distribute the documentation in
any form, but only if its author(s) and origin are clearly shown, and
if you include a description of how readers can access the
originals. In particular, the documentation can be reproduced in
books, but only if the authors, origin, and location of the originals
are printed clearly on each page.