Software Carpentry FAQ

General information

  1. What is the Software Carpentry project?
    The aim of the Software Carpentry project is to make it easier for programmers in general, and scientific programmers in particular, to adopt better software development practices. The project will achieve this by creating tools that are easier to learn and use, and by documenting those tools and the practices they embody.
  2. Where does the name come from?
    The name is a play on "software engineering", and is meant to indicate that this project is initially concerned with medium-sized teams (up to a dozen or two programmers) and medium-term timescales (a year or two).
  3. How did the project get started?
    The project has its origins in a series of articles that Greg Wilson organized for the Fall 1996 and Winter 1996 issues of IEEE Computational Science and Engineering. These articles outlined what their authors thought computer scientists should teach to physical scientists and engineers. Most authors recommended numerical methods or the standard Unix toolset, but Steve McConnell argued that better programming practices would have the greatest impact on productivity.
    As a result of that observation, Greg Wilson, Brent Gorda, and Steve McConnell put together a 3-day course on software engineering for scientists and engineers, which they taught several times at the Los Alamos National Laboratory. Feedback on the course was very positive, but many participants felt that the tools being taught---Perl, Make, CVS, and so on---were unnecessarily difficult to install, learn, and use. They were also frustrated by the scarcity of examples of design documents, testing plans, and all of the other things the course was trying to teach them.
  4. Why Open Source?
    There are three reasons why the Software Carpentry project is following the Open Source model:
    1. Leveraging existing knowledge.
      A closed project can only take advantage of a few minds. As Linux and other projects have shown, a well-run Open Source project can harness the experience and insight of thousands of people.
    2. Lowering barriers to adoption.
      Freely available tools are more likely to be picked up than their commercial equivalents. This is particularly true when the tool in question does something novel (at least from the point of view of the person adopting it), and in academia (where budgets are limited).
    3. Encouraging peer review.
      Dan Gezelter's talk at the first Open Source/Open Science conference discussed how the scientific tradition of peer review fits with the philosophy of the Open Source movement. By designing and building these tools in the open, the Software Carpentry project will both encourage peer review of the tools themselves, and demonstrate how this ought to be done for scientific and commercial software.
  5. Where does the funding come from?
    The funding comes from the U.S. Department of Energy, through the Advanced Computing Laboratory at Los Alamos National Laboratory. The project is being administered by Code Sourcery. US$480,000 has been provided for 2000, and US$380,000 for 2001.
  6. Why would the Department of Energy fund something like this?
    The funding has been provided partly because the DoE would like scientists and engineers to be more productive, and partly because it would like to find out whether the Open Source model and community can meet the special needs of high-performance computational science. The last few years have seen most manufacturers of special-purpose supercomputers disappear or be bought out, and the rise of clusters based on commercial off-the-shelf (COTS) hardware, Linux, MPI, the GNU compiler toolset, and so on. There is a growing feeling that these machines could bring scalable supercomputing into the mainstream, but this will only happen if good tools and practices are accessible enough.
  7. I'm not a scientist or engineer---what's in it for me?
    The things that make many existing Open Source software development tools difficult to learn and use---obscure syntax, arbitrary or hard-to-follow behavior, and poor documentation---affect professional programmers and computer science students just as much as they do computational scientists and engineers. If the Open Source movement can build tools that are simple enough to be learned by people who have problems of their own to solve, and yet powerful enough to support distributed development of hundreds of thousands of lines of complex numerical and visualization code, then those tools will probably also help people who want to build Internet chat rooms and order-tracking systems.
    This project should also be interesting to the general programming community because it is going to place more emphasis on design and early feedback than most Open Source projects have to date. Instead of growing someone's pet project, Software Carpentry is going to organize (and pay for) a design competition. If this works, it could be an interesting model for other Open Source projects to adopt.
  8. I think [tool] is good enough already---why are you re-inventing the wheel?
    The short answer to this is Alan Cooper's:
    The phrase "computer literate user" really means the person has been hurt so many times that the scar tissue is thick enough so he no longer feels the pain.
    -- Alan Cooper, The Inmates are Running the Asylum
    The longer answer is that the "accidental complexity" of the standard Unix command-line toolset is a major barrier to its adoption by people who are not full-time programmers, or for whom programming is just something that has to be done in order to do something else. Many professional programmers---particularly those who enjoy programming enough to be involved in the Open Source movement---have been using these tools for so long that they simply don't remember how hard it is to configure Gnats, or pass variable bindings between recursive calls to Make.
    And let's face it: if Make or Autoconf were built from scratch today, they would be written as extensible, embeddable modules in a high-level scripting language. This would not only make them easier to use, it would also make them easier to learn, since they would employ one syntax for all purposes. Microsoft Visual Basic has shown just how useful it can be to have a single general-purpose "glue" language capable of binding disparate tools together; the aim of the first half of this project is to bring those benefits to the Open Source community.
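    To make this concrete, here is a minimal sketch, in Python, of what the core of Make might look like as an ordinary library. The `Rule` class and `build` function are hypothetical names invented for this illustration; they are not part of any existing tool. A target is rebuilt when it is missing or older than one of its dependencies (the same timestamp rule Make uses), build actions are plain Python functions, and configuration is ordinary Python data, so there is no separate variable syntax to learn or to pass between recursive invocations.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One build rule: a target file, the files it depends on, and a Python action."""
    target: str
    deps: List[str]
    action: Callable[["Rule"], None]


def is_stale(rule: Rule) -> bool:
    """A target needs rebuilding if it is missing or older than any dependency."""
    if not os.path.exists(rule.target):
        return True
    target_time = os.path.getmtime(rule.target)
    return any(os.path.getmtime(dep) > target_time for dep in rule.deps)


def build(target: str, rules: Dict[str, Rule], built: List[str]) -> None:
    """Bring `target` up to date, building its dependencies first.

    Names of rebuilt targets are appended to `built`. This sketch does not
    detect dependency cycles.
    """
    rule = rules.get(target)
    if rule is None:
        # No rule for this name, so it must be an existing source file.
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule to make {target!r}")
        return
    for dep in rule.deps:
        build(dep, rules, built)
    if is_stale(rule):
        rule.action(rule)
        built.append(rule.target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "notes.txt")
        output = os.path.join(workdir, "notes_upper.txt")
        with open(source, "w") as f:
            f.write("hello\n")

        # Configuration is ordinary Python data; no special variable syntax.
        transform = str.upper

        def convert(rule: Rule) -> None:
            with open(rule.deps[0]) as src, open(rule.target, "w") as dst:
                dst.write(transform(src.read()))

        rules = {output: Rule(output, [source], convert)}
        built: List[str] = []
        build(output, rules, built)
        print("first run rebuilt:", [os.path.basename(t) for t in built])
        built.clear()
        build(output, rules, built)
        print("second run rebuilt:", [os.path.basename(t) for t in built])
```

    Because the build description is itself a Python program, loops over source files, conditional logic, and shared helper functions come from the host language rather than from a special-purpose syntax. The sketch deliberately omits cycle detection and parallel builds.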

Development

  1. What projects are currently under way?
    Software Carpentry will start by producing:
    1. a platform inspection tool similar to Autoconf;
    2. a build management tool similar to Make;
    3. an issue tracking system similar to Gnats or Bugzilla; and
    4. a unit and regression testing harness with the functionality of XUnit, Expect, and DejaGnu.
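    The testing harness in item 4 is meant to provide what the XUnit family (SUnit, JUnit, and their many ports) offers: test cases grouped into classes, one method per test, and a runner that reports which tests failed. Python's standard `unittest` module already follows this pattern, so the sketch below uses it to show the style. The `parse_version` function is a hypothetical helper of the kind a platform inspection tool might need, included only so that there is something to test.

```python
import unittest


def parse_version(text: str) -> tuple:
    """Hypothetical helper: turn a version string like '1.5.2' into (1, 5, 2)."""
    parts = text.strip().split(".")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(p) for p in parts)


class ParseVersionTest(unittest.TestCase):
    # Each test_* method is one independent test case (the XUnit convention).

    def test_simple(self):
        self.assertEqual(parse_version("1.5.2"), (1, 5, 2))

    def test_whitespace_is_ignored(self):
        self.assertEqual(parse_version(" 2.0\n"), (2, 0))

    def test_ordering(self):
        # Tuples compare element by element, so 1.10 sorts after 1.9.
        self.assertLess(parse_version("1.9"), parse_version("1.10"))

    def test_rejects_garbage(self):
        with self.assertRaises(ValueError):
            parse_version("1.x")


def run_tests() -> unittest.TestResult:
    """Collect every test_* method of ParseVersionTest and run them."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseVersionTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

    Calling `run_tests()` runs every `test_*` method and returns a result object whose `wasSuccessful()` method reports whether all of them passed; Expect-style interaction with running programs and DejaGnu-style multi-platform regression tracking are beyond what this sketch shows.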
  2. Why were those tools chosen?
    These four tools were chosen as initial targets for several reasons. First, the working practices they support are essential to medium-scale software engineering. Second, the tools they are intended to replace are generally recognized as being outdated or flawed. This creates demand, and increases the odds that rational reimplementations will be adopted. Third, enough people have enough experience with the tools that are to be replaced to participate in the design competition described later.
  3. Why isn't [tool] on this list?
    There are several other tools that could have been on this list, and will be added if the first round of work goes well. A cross-platform version control system that corrects the many deficiencies in CVS, for example, is an obvious candidate, but is probably too large to be tackled initially, and any work done by Software Carpentry could well be superseded by BitKeeper. Similarly, the world needs a good Open Source project management tool with the functionality of Microsoft Project, but probably needs the four tools listed above more urgently.
  4. What languages and tools will be used?
    All development work will be done in Python.
  5. Why Python?
    This is actually three questions:
    1. Why mandate a language?
      Building everything in a single language will encourage projects to share code, which will both keep the total volume of code manageable and raise the quality of the implementations (since the shared code will be exercised, and tested, in many different ways). Using a single language will also improve the comprehensibility, and hence the maintainability and extensibility, of the tools. The varying syntax of Make, Autoconf, and other tools is a large practical barrier to their adoption by people who have better (or at least more pressing) things to do than learn yet another syntax. Microsoft's Visual Basic has shown how powerful it is to use a single, flexible language everywhere.
    2. Why use a scripting language?
      There is a great deal of anecdotal evidence that "relaxed" high-level languages (like Python, Perl, and Visual Basic) are more productive vehicles for process management, text processing, and similar tasks than their "strict" equivalents (like C++ and Java).
    3. Why use Python?
      The four candidates considered were Visual Basic, Perl, Tcl, and Python.
      1. Visual Basic
        Visual Basic is proprietary, and there is no indication that a credible Open Source implementation will appear any time soon.
      2. Perl
        Perl was a strong contender, primarily because of the many libraries that have been developed for it, and because of the number of books that document it. However, our experience teaching at Los Alamos was that Perl's syntax is hard to learn, its behavior often arbitrary, and its size intimidating. While full-time professional programmers with several other languages under their belts might (and often do) say that it all makes sense once you know it, we want to make the learning curve as gentle as possible.
      3. Tcl
        Tcl is easier to learn and read than Perl, but is not as well documented, and doesn't come with as many libraries. Had Python not existed, Tcl would probably have been chosen for this project.
      4. Python
        Python provides the same functionality as Perl or Tcl, but has proved to be easier to learn, read, and remember. (For example, words like "except" and "unless" appear much less often in Python reference material than they do in Perl reference material.) Python is not yet as extensively documented as Perl, but the number of books is growing, as is the number of modules and libraries. Finally, the Python community is still small enough for a project like this one to attract the attention of a significant proportion of it.
  6. How will development be organized and coordinated?
    Everything the project produces (designs, critiques of those designs, test suites, and examples, as well as actual source code) will be available through the project's Web site at software-carpentry.codesourcery.com. Each project will have a coordinator, whose job will be to moderate discussion, synchronize releases, track work items, and report on progress. The coordinator will also be responsible for collating and editing feedback from judges during the design competition.

Design competition

  1. Why a design competition?
    Most Open Source packages have their roots in someone's pet project, which others have picked up, extended, and modified. This kind of organic growth has a lot of good features, but a well-documented design is not one of them. As a result, programmers often have to rely on folklore and reverse engineering if they want to add to, or fix, these tools. In addition, there is a dearth of examples of good design for new programmers to learn from.
    The Software Carpentry project hopes to address both problems by running a two-stage design competition. The best entries in both rounds will be published, along with commentary from the competition's judges. This material will serve both to inform and guide further development, and to show novices what experienced programmers think about before they start coding.
  2. Who can enter?
    Everyone: individuals and teams, students and professionals, from anywhere in the world.
  3. What are the rules?
    The full rules are available at:
    software-carpentry.codesourcery.com/design-competition/rules.html
    Basically, initial submissions must be written in English, and can be up to 10 pages long. Examples count against this limit, but diagrams and a Unix-style man page do not. Any person or team may submit only one entry in any given category, but can submit in as many of the four categories as desired.
    The best four entries in each category will be awarded US$2500, and asked to submit full designs. Participants will be strongly encouraged to pool their efforts for the second round. The best second-round submission will be awarded an additional US$7500, while the others will receive another US$2500 each. The real reward will be seeing the design implemented, and being in a good position to bid on the implementation work.
  4. What should first-round submissions contain?
    An example of what a submission should contain, and how it should be formatted is available at:
    software-carpentry.codesourcery.com/design-competition/example.html
    First-round entries should focus primarily on what the tool will do, and how it will be used: command-line options, input and output file formats, sketches of Web and GUI interfaces (where appropriate), and so on. Second-round submissions will then be expected to describe how it's all going to be implemented.
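    As an illustration of the "command-line options" part of a first-round entry, the sketch below writes down the interface of a hypothetical Make replacement called `cbuild` using Python's standard `argparse` module. The tool name, options, and defaults are all invented for this example and are not taken from the competition rules; the point is only that a first-round design can pin its interface down this precisely while leaving the implementation to the second round.

```python
import argparse
from typing import List, Optional


def make_parser() -> argparse.ArgumentParser:
    """Command-line interface for a hypothetical Make replacement called 'cbuild'."""
    parser = argparse.ArgumentParser(
        prog="cbuild",
        description="Bring the named targets up to date.",
    )
    # Positional arguments: zero or more target names.
    parser.add_argument("targets", nargs="*", default=["all"],
                        help="targets to build (default: all)")
    parser.add_argument("-f", "--file", default="build.py",
                        help="build description to read (default: build.py)")
    parser.add_argument("-n", "--dry-run", action="store_true",
                        help="print the actions that would run, but do not run them")
    parser.add_argument("-j", "--jobs", type=int, default=1,
                        help="number of actions to run in parallel (default: 1)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv` (or sys.argv[1:] when None) into an options namespace."""
    return make_parser().parse_args(argv)
```

    For example, `parse(["-n", "-j", "4", "docs"])` returns a namespace with `targets == ["docs"]`, `dry_run` set to true, `jobs == 4`, and `file == "build.py"`, and `make_parser().print_help()` prints a usage summary generated from the same declarations.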
  5. Who will the judges be?
    The list of judges has not yet been finalized.
  6. When are the deadlines?
    The deadline for first-round submissions is March 31, 2000. The five best proposals in each category will be announced on April 30, 2000. Full submissions are due on June 1, 2000, and winners will be announced on June 30, 2000.
  7. Won't prizes discourage co-operation?
    We don't know. On the one hand, people might want to hoard their best ideas; on the other hand, the best designs in both rounds are going to be published, along with the judges' commentary, and we will be encouraging participants to pool their efforts. Most of the money that will be paid out will go to fund implementation, testing, and documentation; we hope that people will collaborate in the early stages, and treat the prizes as recognition for their effort, rather than treating US$10,000 as their retirement fund.

Documentation

  1. What documentation will be produced?
    The Software Carpentry project will produce several different kinds of documentation:
    1. Design documentation.
      As stated above, the best designs in each category will be published, along with the judges' commentary. This material ought to play the role that music criticism has played in the development of music, by giving newcomers (and experienced programmers) better insight into how good designers think.
    2. User guides.
      The project will pay for the development of man pages, user guides, online help, and all the other documentation needed to turn a program into a product.
    3. Test suites.
      The project will also pay for the development of industrial-strength test suites for all four tools. These suites will be published, both to serve as a starting point for other projects and to demonstrate good practice.
    4. Case studies.
      It is often easier to show someone how to do something than to explain it to them. The Software Carpentry project will pay for case studies that describe how these tools, and (more importantly) the working practices they support, have been deployed in practice. Checklists, form templates, and other supporting material can also be submitted.
  2. What format(s) will be used?
    The primary format for all documentation will be HTML. The project will migrate to XML when it becomes feasible to do so.
  3. What restrictions are there on using the documentation?
    Only those that also apply to the software, under the terms of its Open Source license. You can copy and distribute the documentation in any form, but only if its author(s) and origin are clearly shown, and if you include a description of how readers can access the originals. In particular, the documentation can be reproduced in books, but only if the authors, origin, and location of the originals are printed clearly on each page.