[Python-Dev] Official citation for Python
paul at ganssle.io
Sun Sep 16 19:35:12 EDT 2018
I think the "why" in this case should be a bit deeper than that, because
until recently, it's been somewhat unusual to cite the /tools you use/
to create a paper.
I see three major reasons why people cite software packages, and the
form of the citation would have different requirements for each one:
1. *Academic credit / Academic use metrics*
Because of the weird way academia has evolved, academics are largely judged by
their publications and how influential those publications are. A lot of
the people who work on statistical and scientific python libraries are
doing excellent and incredibly influential work, but that's largely
invisible to the metrics used by funding and tenure committees, so
there's been an effort to do things like getting DOIs for libraries or
publishing articles in journals like the Journal of Open Source Software.
Then you cite the libraries if you use them, and the people who
contribute to the work can say, "Look I'm a regular contributor to this
core library that is cited in 90% of papers". This seems less important
to CPython, where the majority of core contributors (as far as I can
tell) are not academics and have little use for high h-index papers.
That said, even if no one involved cares about the academic credit, if
every paper that used Python cited the language, it probably /would/
provide useful metrics to the PSF and others interested in this.
If all you want is a formal way to say "I used Python for this" as a
citation so that it can be tracked, then a single DOI for the entire
language should be sufficient.
2. *As a primary source or example for some claims*
If you are writing an article about language design and you are
referencing how Python handles async or scoping or Unicode or something,
you want to make it easy for your readers to see the context of your
statement, to verify that it's true and to get more details than you
might want to include as part of what may be a tangential mention in
your paper. I have a sense that this is closer to the original reason
people cited things in papers and books before citations became a metric
for measuring influence - and subsequently a way to give credit for the
source of ideas.
If this is why you are citing Python, you should probably be citing a
specific sub-section of the language reference and/or documentation, and
that citation should probably be versioned, since new features are added
in every minor version, and the way some of these things are handled may
change over time. In this case, a separate DOI for each minor version
that points to the documentation as built by a specific commit or git
tag or whatever would probably be ideal.
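Short of a per-version DOI, a rough stand-in for a versioned reference is a URL pinned to one minor version of the docs. The sketch below assumes the docs.python.org/<major>.<minor>/ layout the documentation currently uses; `versioned_doc_reference` is a hypothetical helper, not part of any official citation scheme, and the page and anchor names are only examples:

```python
import sys

# Assumption: the published docs live at docs.python.org/<major>.<minor>/<page>.
# This is an illustration, not an official citation format.
DOCS_ROOT = "https://docs.python.org"


def versioned_doc_reference(page, anchor=None, version=None):
    """Return a docs URL pinned to a specific minor version of Python.

    `version` is a (major, minor) tuple; it defaults to the running
    interpreter's version. This helper is hypothetical.
    """
    if version is None:
        version = sys.version_info[:2]
    major, minor = version
    url = f"{DOCS_ROOT}/{major}.{minor}/{page}"
    if anchor:
        url += f"#{anchor}"
    return url


# Example: point at the coroutine-definition section as documented for 3.6.
print(versioned_doc_reference("reference/compound_stmts.html", "async-def", (3, 6)))
```

Pinning the minor version is the point: the unversioned /3/ path tracks the latest release, so the text it points to can change after the paper is published.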
3. *To aid reproducibility*
It won't go all the way towards reproducing your research, but given
that Python is a living language that is always changing - both in
implementation and the spec itself - to the extent that you have a
"methods" section, it should probably include things like operating
system version, CPython version and the versions of all libraries you
used so that if someone is failing to replicate your results, they know
how to build an environment where it /should work/.
If you want to include this information in the form of a citation, then
I would think that you would want to be both more granular - citing
the specific interpreter you used (CPython, Jython, PyPy), the full
version (3.6.6 rather than 3.6) and possibly even other factors like
operating system, etc. - and /less/ granular in that you don't need to
cite a specific subset of the interpreter (e.g. async), but just the
interpreter as a whole.
My thought is that the CPython core dev team probably cares a lot less
about #1 than, say, the R dev team, which is
one reason why there's no clear way to cite "CPython" as a whole.
I think that #3 is a very laudable goal, but probably should be in some
sort of "methods" section of the document being prepared rather than
overloading citations for it, though having a standardized way to
describe your Python setup (similar to, say, the pandas debugging
feature `pandas.show_versions()`) that is optimized for publication
would probably be super helpful.
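As a rough illustration of what a publication-oriented counterpart to `pandas.show_versions()` might collect (interpreter implementation, full version, operating system and package versions), here is a minimal sketch. `environment_report` is a hypothetical name rather than an existing API, and it relies on `importlib.metadata`, which is only available in Python 3.8 and later:

```python
import platform
from importlib import metadata  # requires Python 3.8+


def environment_report(packages=()):
    """Collect interpreter, OS and package versions for a "methods" section.

    Hypothetical helper, loosely modeled on the idea behind
    pandas.show_versions(); not an existing API.
    """
    lines = [
        # e.g. "CPython", "PyPy", "Jython" plus the full version string
        f"Interpreter: {platform.python_implementation()} {platform.python_version()}",
        f"Operating system: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


print(environment_report(["numpy", "pandas"]))
```

Packages that aren't installed are reported as such rather than raising, so the same call can be pasted into a notebook or script without knowing the environment in advance.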
While #2 is probably only a small fraction of all the times where people
would want to "cite CPython", I think it's probably the most important
one, since it's performing a very specific function useful to the reader
of the paper. It also seems not terribly difficult to come up with some
guidance for unambiguously referencing sections of the documentation
and/or language reference, and to make "get a DOI for the documentation"
part of the release cycle.
P.S. I will also be at the NumFOCUS summit. It's been some time since
I've been an academic, but hopefully there will be an interesting
discussion about this there!
On 9/16/18 6:22 PM, Jacqueline Kazil wrote:
> RE: Why cite Python….
> I would say that in this paper —
> where we introduced a new library, we should have cited Python,
> because the library was based in Python. We were riding on the
> coattails of Python and if Python did not exist, then this library
> would not exist.
> (taking this a level higher)
> Just as someone doing research (a specific application) should cite
> the Mesa library. Without the good and bad that is Mesa, their
> research would have taken a different form.
> Since my Ph.D is on Mesa, I will be citing Python there.
> I think for more insight we can look at who has cited some of Guido’s
> For example:
> Does that help?