[Python-Dev] Official citation for Python

Jeremy Hylton jeremy at alum.mit.edu
Mon Sep 17 00:05:30 EDT 2018

I wanted to start with an easy answer that is surely unsatisfying:

APA style is pretty popular, and it says that standard software doesn't
need to be specified. Standard software includes "Microsoft Word, Java, and
Adobe Photoshop." So I'd say Python fits well in that category, and doesn't
need to be cited.

I said you wouldn't be satisfied...

On Sat, Sep 15, 2018 at 11:02 AM Jacqueline Kazil <jackiekazil at gmail.com>
wrote:

> I just got caught up on the thread. This is a really great discussion.
> Thank you for all the contributions.
> Before we get into the details, let's go back to the main use case we are
> trying to solve.
> *As a user, I am writing an academic paper and I need to cite Python. *

The goal here is ambiguous. Python means many things--a language described
by the language specification, the source code of a particular
implementation of the language (Python often refers to CPython), a
particular binary release of the implementation of the language (Python
1.5.2 for Windows). Which one is relevant in the context of the paper? If
you're talking about a bug in timsort in a particular version of CPython,
then you probably want to cite that specific version of the implementation.
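
As a rough illustration (not something proposed in the thread), the
implementation and version that a paper's code actually ran on can be read
from the standard library's platform module. The function name
describe_interpreter is hypothetical; only the two platform calls are real.

```python
import platform


def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    # platform.python_implementation() names the implementation,
    # for example "CPython", "PyPy", or "Jython".
    implementation = platform.python_implementation()
    # platform.python_version() returns the version as a string
    # in "major.minor.patchlevel" form.
    version = platform.python_version()
    return f"{implementation} {version}"


if __name__ == "__main__":
    print(describe_interpreter())
```

The resulting string could be recorded alongside the results, so that a
version-specific citation can be written later if one turns out to be needed.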

I suspect the most common goal for a citation is just to describe the
language "in general" where 1.5.2 or 3.7.0 and Jython or CPython are all
details that don't matter. In that case I'd cite the language
specification. We're talking about putting a citation in a paper (a written
source) and the written language specification captures what we think of as
essential for the language. If you want to cite Turing's proof of the
undecidability of the halting problem, you'd cite the paper where he wrote
it down (in Proceedings of the London Mathematical Society). If you want to
cite a programming language in the abstract, cite the specification that
describes it.

I think style guides are relevant here. They give guidance on how to cite
an item based on its category. For example, the MLA style guide describes
how to cite a digital file, a physical object, and many other things. My
favorite example under "physical object" is "Physical objects found
online." (Think about it :-).

There's some discussion of how to cite source code here:
http://integrity.mit.edu/handbook/writing-code. Notably this is talking
about citing source code in the context of other source code, and it mostly
recommends using URLs. If you wanted to cite a particular piece of source
code in a written article, you'd probably follow one of the approaches for
citing online resources. Try to identify who / when / what / where. For
example, MLA style for a blog post would be: Editor, screen name, author,
or compiler name (if available). “Posting Title.” Name of Site, Version
number (if available), Name of institution/organization affiliated with the
site (sponsor or publisher), URL. Date of access. You could cite a
particular source file this way or a particular source release.

The date usually refers to the original publication date. For the language
reference, I think that was the 1.0 release, although I'm not sure. I'd
probably pick that date, but someone can correct me if there's an earlier
one. Using it would suggest that current Python and the original Python are
mostly the same thing, which is an idea I like.

van Rossum, Guido (1994). "The Python Language Reference". Python Software
Foundation, https://docs.python.org/reference/index.html. Retrieved 16
September 2018.
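
For anyone who keeps references in a script rather than by hand, here is a
minimal sketch that assembles the who / when / what / where fields into the
same shape as the reference above. The function format_citation and its
parameter names are assumptions made for illustration, not part of any style
guide or library.

```python
def format_citation(author, year, title, publisher, url, retrieved):
    """Assemble a reference string from who / when / what / where fields."""
    # The punctuation mirrors the reference given in this message;
    # other style guides order and punctuate these fields differently.
    return (
        f'{author} ({year}). "{title}". {publisher}, {url}. '
        f"Retrieved {retrieved}."
    )


citation = format_citation(
    author="van Rossum, Guido",
    year=1994,
    title="The Python Language Reference",
    publisher="Python Software Foundation",
    url="https://docs.python.org/reference/index.html",
    retrieved="16 September 2018",
)
print(citation)
```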

I'd say that's all settled. If anyone asks you, "How can you be sure that
settles it?" You can answer, "Some guy said it on a mailing list." And then
you can cite the message:

Jeremy Hylton. "[Python-Dev] Official citation for Python." Sep. 17, 2018.
python-dev, https://mail.python.org/mailman/listinfo/python-dev. Accessed
18 September 2018.


> Let's throw reproducibility out the window for now (<--- something I never
> thought I would say), because that should be captured in the code, not in
> the citations.
> So, if we don't need the specific version of Python, then maybe creating
> one citation is all we need.
> And that gives it some good Google juice as well.
> Thoughts?
> (Once we nail down one or many, I think we can then move into the details
> of the content of the citation.)
> -Jackie
> On Thu, Sep 13, 2018 at 12:47 AM Wes Turner <wes.turner at gmail.com> wrote:
>> There was a thread about adding __cite__ to things and a tool to collect
>> those citations awhile back.
>> "[Python-ideas] Add a __cite__ method for scientific packages"
>> http://markmail.org/thread/rekmbmh64qxwcind
>> Which CPython source file should contain this __cite__ value?
>> ... On a related note, you should ask the list admin to append a URL to
>> each mailing list message whenever this list is upgraded to mm3; so that
>> you can all be appropriately cited.
>> On Thursday, September 13, 2018, Wes Turner <wes.turner at gmail.com> wrote:
>>> Do you guys think we should all cite Grub and BusyBox and bash and libc
>>> and setuptools and pip and openssl and GNU/Linux and LXC and Docker; or
>>> else it's plagiarism for us all?
>>> #OpenAccess
>>> On Wednesday, September 12, 2018, Stephen J. Turnbull <
>>> turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
>>>> Chris Barker via Python-Dev writes:
>>>>  > But "I wrote some code in Python to produce these statistics" --
>>>>  > does that need a citation?
>>>> That depends on what you mean by "statistics" and whether (as one
>>>> should) one makes the code available.  If the code is published or
>>>> "available on request", definitely, Python should be cited.  If not,
>>>> and by "statistics" you mean the kind of things provided by Steven
>>>> d'Aprano's excellent statistics module (mean, median, standard
>>>> deviation, etc), maybe no citation is needed.  But anything more
>>>> esoteric than that (even linear regression), yeah, I would say you
>>>> should cite both Python and any reference you used to learn the
>>>> algorithm or formulas, in the context of mentioning that your
>>>> statistics are home-brew, not produced by one of the recognized
>>>> applications for doing so.
>>>>  > If so, maybe that would take a different form.
>>>> Yes, it would.  But not so different: eg, version is analogous to
>>>> edition when citing a book.
>>>>  > Anyway, hard to make this decision without some idea how the
>>>>  > citation is intended to be used.
>>>> Same as any other citation, (1) to give credit to those responsible
>>>> for providing a resource (this is why publishers and their metadata of
>>>> city are still conventionally included), and (2) to show where that
>>>> resource can be obtained.  AFAICS, both motivations are universally
>>>> applicable in polite society.  NB: Replication is an important reason
>>>> for wanting to acquire the resource, but it's not the only one.
>>>> I think underlying your comment is the question of *what* resource is
>>>> being cited.  I can think of three offhand that might be characterized
>>>> as "Python".  First, the PSF, as a provider of funding.  There is a
>>>> conventional form for this: a footnote on the title or author's name
>>>> saying "The author acknowledges [a] <purpose of grant such as travel>
>>>> grant [grant identifier if available] from the Python Software
>>>> Foundation."  I usually orally mention them in presentations, too.
>>>> That one's easy; *everybody* should *always* do that.
>>>> The rest of these are sort of an ideal to strive for.  If you keep a
>>>> bibliographic database, and there are now quite a few efforts to crowd
>>>> source them, it's easier to go the whole 9 yards than to skimp.  But
>>>> except in cases where we don't need to even mention the code, probably
>>>> we should be citing, for reasons of courtesy to readers as well as
>>>> authors, editors, and publishers (as disgusting as many publishers are
>>>> as members of society, they do play a role in providing many resources
>>>> ---we should find ways to compete them into good behavior, not
>>>> ostracize them).
>>>> The second is the Python *language and standard library*.  Then the
>>>> Language Reference and/or the Library Reference should be cited
>>>> briefly when Python is first mentioned, and in the text introducing a
>>>> program or program fragment, with a full citation in the bibliography.
>>>> I tentatively suggest that the metadata for the Language Reference
>>>> would be
>>>>     Author: principal author(s) (Guido?) et al. OR python.org OR
>>>>         Python Contributors
>>>>     Title: The Python Language Reference
>>>>     Version: to match Python version used (if relevant, different
>>>>         versions each get full citations), probably should not be
>>>>         "current"
>>>>     Publisher: Python Software Foundation
>>>>     Date: of the relevant version
>>>>     Location: City of legal address of PSF
>>>>     URL: to version used (probably should not be the default)
>>>>     Date accessed: if "current" was used
>>>> The Library reference would be the same except for Title.
>>>> The third is a *particular implementation*.  In that case the metadata
>>>> would be
>>>>     Author: principal author(s) (Guido) et al. OR python.org OR
>>>>         Python Contributors
>>>>     Title: The CPython Python distribution
>>>>     Python Version: as appropriate (if relevant, different versions each
>>>>         get full citations), never "current"
>>>>     Distributor Version: if different from Python version (eg,
>>>>         additional Debian cruft)
>>>>     Publisher: Distributor (eg, PSF, Debian Project, Anaconda Inc.)
>>>>     Date: of the relevant version
>>>>     Location: City of legal address of distributor
>>>> If downloaded:
>>>>     URL: to version used (including git commit SHA1 if available)
>>>>     Date accessed: download from distributor, not installation date
>>>> If received on physical medium: use the "usual" form of citation for a
>>>> collection of individual works (even if Python was the only thing on
>>>> it).  Probably the only additional information needed would be the
>>>> distributor as editor of the collection and the name of the
>>>> collection.
>>>> In most cases I can think of, if the implementation is cited, the
>>>> Language and Library References should be cited, too.
>>>> Finally, if Python or components were modified for the project, the
>>>> modified version should be preserved in a repository and a VCS
>>>> identifier provided.  This does not imply the repository need be
>>>> publicly accessible, of course, although it might be for other reasons
>>>> (eg, in a GSoC project, or if hosted for free on GitHub).
>>>> I doubt that "URNs" like DOI and ISBN are applicable, but if available
>>>> they should be included in all cases as well.
>>>> Steve
>>>> _______________________________________________
>>>> Python-Dev mailing list
>>>> Python-Dev at python.org
>>>> https://mail.python.org/mailman/listinfo/python-dev
>>>> Unsubscribe:
>>>> https://mail.python.org/mailman/options/python-dev/wes.turner%40gmail.com
> --
> Jacqueline Kazil | @jackiekazil