Add citation() to site.py

The standard site.py module adds pseudo-builtins: copyright, credits, exit, help, license and quit. I suggest one more: citation().

Python is being used heavily in scientific and academic fields, where it is often the convention to provide citations and references to the software used. The question of how to cite Python comes up from time to time, e.g.:

https://mail.python.org/pipermail/tutor/2016-March/108460.html
http://academia.stackexchange.com/questions/5482/how-do-i-reference-the-pyth...
http://www.gossamer-threads.com/lists/python/python/105846
http://grokbase.com/t/python/python-list/04684fggwd/citing-python

I think that having a standard answer available for this question is good practice, and will help strengthen Python's position as a good scientific language.

SciPy, SymPy and IPython all document how they prefer to be cited:

https://scipy.org/citing.html
https://github.com/sympy/sympy#citation
http://ipython.org/citing.html

As do other languages such as Mathematica:

http://support.wolfram.com/kb/472

But I think it would be a good idea to emulate the R language, which provides a function that gives the preferred citation. At the R prompt:
    > citation()

    To cite R in publications use:

      R Core Team (2014). R: A language and environment for
      statistical computing. R Foundation for Statistical Computing,
      Vienna, Austria. URL http://www.R-project.org/.

    A BibTeX entry for LaTeX users is

      @Manual{,
        title = {R: A Language and Environment for Statistical Computing},
        author = {{R Core Team}},
        organization = {R Foundation for Statistical Computing},
        address = {Vienna, Austria},
        year = {2014},
        url = {http://www.R-project.org/},
      }

    We have invested a lot of time and effort in creating R, please
    cite it when using it for data analysis. See also
    ‘citation("pkgname")’ for citing R packages.

* * *

If you provide a package name such as "splines", R's output is the same except that the first line starts with: "The ‘splines’ package is part of R." (I don't have any third-party packages installed to test, but presumably they will offer customised output.)

My suggestion is that we follow R's lead and add a citation() function to site.py which gives a suggested, standard citation string that people can copy. Obviously we cannot expect to match all standard formats, but we can provide one, and folks can convert to the format their university or journal requires. (Converting between academic reference styles is out of scope of this proposal.)

The initial implementation might look something like this:

    import platform

    # platform.python_build() returns (buildno, builddate); the third
    # whitespace-separated field of builddate is the year.
    year = platform.python_build()[1].split()[2]
    vers = platform.python_version()

    _CITE = """\
    To cite Python in publications, please use:

        Python Core Team ({}). Python {}: A dynamic, open source
        programming language. Python Software Foundation.
        URL https://www.python.org/.
    """.format(year, vers)

    def citation():
        print(_CITE)

Or perhaps it should use the same implementation as copyright, credits and license.

Thoughts?

-- Steve
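R's output also includes a BibTeX entry, and the same two values the proposal reads from `platform` are enough to build one for Python. The following is a minimal sketch under that assumption: `python_bibtex` is a hypothetical helper, and the entry key `python` and every field value are illustrative, not an agreed or PSF-endorsed citation.

```python
import platform


def python_bibtex(year=None, version=None):
    """Return a BibTeX @Manual entry for Python, modelled on R's citation().

    Hypothetical helper: the entry key and field values are illustrative only.
    """
    if year is None:
        # python_build() returns (buildno, builddate); builddate normally looks
        # like 'Mar 17 2016 12:32:00', so the third field is the year.
        parts = platform.python_build()[1].split()
        year = parts[2] if len(parts) >= 3 else "n.d."
    if version is None:
        version = platform.python_version()
    fields = [
        ("title", "Python {}: A dynamic, open source programming language".format(version)),
        ("author", "{Python Core Team}"),  # inner braces stop BibTeX splitting the name
        ("organization", "Python Software Foundation"),
        ("year", year),
        ("url", "https://www.python.org/"),
    ]
    body = ",\n".join("  {} = {{{}}}".format(key, value) for key, value in fields)
    return "@Manual{python,\n" + body + ",\n}"


print(python_bibtex())
```

Passing `year` and `version` explicitly makes the output deterministic; leaving them out reads them from the running interpreter, falling back to "n.d." if the build date does not have the expected shape.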

On Mar 18 2016, Steven D'Aprano wrote:
I think the core question is: are the (core?) Python contributors or the PSF interested in being referenced? If so, adding this would definitely help (and would certainly generate some citations from me). If not, there seems little point. Best, -Nikolaus

On 3/17/2016 12:32 PM, Nikolaus Rath wrote:
Excellent idea, though I suggest 'citation' instead. The current splash line is

    Type "copyright", "credits" or "license()" for more information.

The difference between no () and () required is a matter of length, and a 3 or 4 line citation fits the no () case.

    >>> credits
        Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of
        thousands for supporting Python development. See www.python.org for more information.
    >>> copyright
    Copyright (c) 2000 BeOpen.com.
    All Rights Reserved.

    Copyright (c) 1995-2001 Corporation for National Research Initiatives.
    All Rights Reserved.

    Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
    All Rights Reserved.
    >>> license
    Type license() to see the full license text
The full license text is several pages, run through a pager. (quit and exit also have custom representations.) In addition, the front page of the docs should have a new item Citation under 'Meta Information'. The linked page could have multiple formats.
> I think the core question is: are the (core?) Python contributors or the PSF interested in being referenced?
This one says YES! Citing software used in a project is standard, but often a nuisance to get right.
> If so, adding this would definitely help (and would certainly generate some citations from me).
-- Terry Jan Reedy

I'd be most interested in Guido's opinion, but personally I think Python is already well known enough to be a "household name" in any publication that might include a citation. Nobody cites C++ or Java, to my knowledge, and I'd rather be in that category than the other.

On Mar 17, 2016 1:21 PM, "Steve Dower" <steve.dower@python.org> wrote:
> I'd be most interested in Guido's opinion, but personally I think Python is already well known enough to be a "household name" in any publication that might include a citation. Nobody cites C++ or Java, to my knowledge, and I'd rather be in that category than the other.

Norms are somewhat in flux here -- there's a lot of discussion going on in academic circles right now about how to appropriately credit software. Traditionally the answer has been that software mostly doesn't get credited at all, which has predictably pernicious effects... One way to think about it is that providing a preferred citation is useful for folks who want to cite, and folks who don't want to cite will ignore it. -n

On Fri, Mar 18, 2016 at 7:35 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
> I would also think that citing the software increases the ease of duplicating the results.
If that's the intention, citation() should include enough details to ensure this. There's been talk of changing the default PRNG, so that kind of thing would have to be identifiable (e.g. if the citation says "CPython 3.6.1" and it's known that the PRNG changed in 3.6, people will be able to understand why the results differ on 3.5). Probably sys.version will have enough info for that. ChrisA
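The details Chris mentions are already exposed by the standard library. A minimal sketch, assuming the goal is a short label plus the full build string; `runtime_description` is a hypothetical helper name, and the PRNG change itself was only under discussion at the time.

```python
import platform
import sys


def runtime_description():
    """Return a short label such as 'CPython 3.6.1' plus the full sys.version."""
    # python_implementation() distinguishes CPython, PyPy, Jython, IronPython.
    short = "{} {}".format(platform.python_implementation(), platform.python_version())
    return short, sys.version


short, full = runtime_description()
print(short)  # implementation name and X.Y.Z version
print(full)   # version plus build number, build date and compiler
```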

On 03/17/2016 01:41 PM, Chris Angelico wrote:
> On Fri, Mar 18, 2016 at 7:35 AM, Ethan Furman wrote:
If the citation doesn't have enough information to enable somebody to go and get the same software package, it seems rather pointless. As one example, the other goodies bundled with a distribution will vary betwixt them. -- ~Ethan~

On Mar 17 2016, Ethan Furman wrote:
Really? What kind of citation do you have in mind that would help more than "...we used a Python program to do [bla]..." somewhere in the paper? Best, -Nikolaus

On 17Mar2016 1335, Ethan Furman wrote:
Afraid not. Posting a requirements.txt or a conda spec file would be far more valuable for this, or these days sharing a Jupyter Notebook with the code and dependency-spec embedded. It's also arguable as to how valuable reproducing the results in an identical environment actually is. Yes, your code runs on a different machine, but if your "research" is code then you are a developer, not a scientist. Someone actually needs to implement the idea in a different environment (and ideally they know just how it differs) to demonstrate that it is an actual result and not just a fluke. Cheers, Steve
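Steve's alternative, publishing the environment rather than a citation, can be approximated from the standard library alone. A rough sketch, assuming Python 3.8+ for `importlib.metadata`; `environment_snapshot` is a hypothetical name, and because it lists everything installed rather than what the analysis actually imported, the result is a superset of a hand-written requirements.txt.

```python
import importlib.metadata  # standard library since Python 3.8
import platform


def environment_snapshot():
    """Return requirements.txt-style lines headed by an interpreter comment."""
    # pip treats lines starting with '#' as comments, so the header is harmless.
    header = "# {} {}".format(platform.python_implementation(), platform.python_version())
    pins = set()  # a set drops duplicates when a distribution is found on two paths
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip installs whose metadata is missing or broken
        pins.add("{}=={}".format(name, dist.version))
    return [header] + sorted(pins, key=str.lower)


print("\n".join(environment_snapshot()))
```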

I replied from my cell phone but it bounced (again :-( ). I wrote: """ I think this should be in the docs, not in the code. (The reason to have copyright() in the code is purely legalistic.) """ But I would be much more interested in hearing what the citation text should be before discussing how to add it to the code. (I still think adding it to the code isn't very helpful -- we can't add this to the older versions of code that people are likely to be using.) On Thu, Mar 17, 2016 at 2:38 PM, Steve Dower <steve.dower@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

On Thu, Mar 17, 2016 at 02:51:37PM -0700, Guido van Rossum wrote:
If this is in the code, a possible future enhancement would be for it to take an optional module or module name and query that module for its preferred citation. That's what R does.
True, but some day -- hopefully not that far away -- 3.6 will be the "older version" that people are likely to be using.

My suggested citation format is based heavily on that used by R. It could be implemented something like this:

    import platform

    year = platform.python_build()[1].split()[2]
    vers = platform.python_version()

    _CITE = """\
    To cite Python in publications, please use:

        Python Core Team ({}). Python {}: A dynamic, open source
        programming language. Python Software Foundation.
        URL https://www.python.org/.
    """.format(year, vers)

which would end up looking like this:

    To cite Python in publications, please use:

        Python Core Team (2015). Python 3.6.0a0: A dynamic, open source
        programming language. Python Software Foundation.
        URL https://www.python.org/.

I'm not married to any particular wording or preferred format. -- Steve
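The module-aware variant Steven mentions could look roughly like this. It is a sketch under two assumptions: that a module advertises its preferred citation through a `__citation__` attribute (a hypothetical convention, not something the standard library or PyPI defines), and that returning the text is more useful than printing it.

```python
import importlib
import platform

_PYTHON_CITE = (
    "To cite Python in publications, please use:\n\n"
    "    Python Core Team ({}). Python {}: A dynamic, open source\n"
    "    programming language. Python Software Foundation.\n"
    "    URL https://www.python.org/.\n"
)


def citation(module=None):
    """Return a citation for Python itself, or for *module* if it offers one.

    `__citation__` is a hypothetical attribute used for illustration only.
    """
    if module is None:
        # builddate normally looks like 'Mar 17 2016 12:32:00'.
        parts = platform.python_build()[1].split()
        year = parts[2] if len(parts) >= 3 else "n.d."
        return _PYTHON_CITE.format(year, platform.python_version())
    if isinstance(module, str):
        module = importlib.import_module(module)  # accept a module name too
    text = getattr(module, "__citation__", None)
    if text is None:
        return "No citation information available for module {!r}.".format(module.__name__)
    return text
```

Returning a string keeps the text easy to paste into a file or check in a test; at the prompt, `print(citation())` shows it formatted, and `print(citation("math"))` reports that no citation is available.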

Well, I entirely fail to understand why that belongs in the code rather than in the docs. "Because R does it" seems a pretty poor reason. But then again I've never cared about citations. On Thu, Mar 17, 2016 at 4:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
-- --Guido van Rossum (python.org/~guido)

On 3/17/2016 7:45 PM, Guido van Rossum wrote:
> But then again I've never cared about citations.
Citations are not just in bibliographies. PEPs have them in footnotes, which was a traditional place for citations, though in the modern form of hyperlinks. The tracker is full of in-line citations, also in the form of hyperlinks. The formalized citation proposed by Steven would include the URL. -- Terry Jan Reedy

On Fri, Mar 18, 2016 at 08:09:11AM -0600, Gyro Funch wrote:
> Could 'citation' be an optional entry in setup.py instead?
That matches my proposal. I'm not suggesting it should be a built-in built-in, it can be added by site.py like copyright, license, help, quit etc. If Guido has really strong objections to this, I'd be satisfied with it being a FAQ entry. But I think an advantage of code over a static FAQ is that the code can automatically fill in the version and year. -- Steve

On 19 March 2016 at 00:09, Gyro Funch <gyromagnetic@gmail.com> wrote:
I'd actually suggest that programmatic generation of citations for packages distributed through PyPI would be better handled as a third party project. Software developers that aren't themselves also research scientists aren't going to care to provide their own citation details, so it's best to establish a convention that doesn't rely on software publishers doing anything they aren't already doing. In the specific case of CPython, rather than asking Guido and the rest of the core development team "How do you want to be cited?", it probably makes more sense to ask "We're proposing to standardise on citing CPython this way, are you OK with that?" (e.g. by proposing a patch for a new "Citation" link in the meta-information on the docs home page at https://docs.python.org/3/ ). The reason I suggest that approach is that most (all?) of us aren't research scientists, so we have no idea what typical conventions are for citations, nor how those conventions are changing. However, the docs are much easier to amend than the standard library, while still being versioned along with the rest of the software, so adding citation information there is a low risk way of answering the question if folks do want to cite us. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
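Nick's suggestion, deriving citations from what publishers already provide rather than asking them for anything new, can be sketched with `importlib.metadata` (Python 3.8+). `format_software_citation` and `citation_for_distribution` are hypothetical names, and the reference layout only approximates common software-citation styles; it is not any style guide's exact rule.

```python
import importlib.metadata  # standard library since Python 3.8


def format_software_citation(meta):
    """Build a reference string from core-metadata fields a package already publishes.

    Layout: "Author (year). Name (Version X) [Computer software]. URL ...".
    Core metadata carries no release date, so the year is written as "n.d.".
    """
    author = meta.get("Author") or meta.get("Author-email") or "Unknown author"
    ref = "{} (n.d.). {} (Version {}) [Computer software].".format(
        author, meta.get("Name") or "unknown", meta.get("Version") or "unknown"
    )
    url = meta.get("Home-page")
    if url:
        ref += " URL " + url
    return ref


def citation_for_distribution(dist_name):
    """Look up an installed distribution by name and format its reference."""
    return format_software_citation(importlib.metadata.metadata(dist_name))
```

The result is only as good as the metadata the package authors filled in, which is the trade-off Nick describes: no new work for publishers, but no guarantee of a complete reference either.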

On Mar 19 2016, Nick Coghlan wrote:
Which I believe makes it completely pointless to cite Python at all. As far as I can see, nowadays citations are given for two reasons:

1. To give the reader a starting point to get more information on a topic.
2. To formally acknowledge the work done by someone else (who ends up with an increased number of citations for the cited publication, which is unfortunately a crucial metric in most academic hiring and evaluation processes).

In the case of Python, an explicit citation thus adds nothing. Relevant information is easily found by any search engine, and if Nick is right then no one stands to gain anything from the citations either. Best, -Nikolaus

On Sun, Mar 20, 2016 at 01:26:03PM -0700, Nikolaus Rath wrote:
I'm afraid I don't understand your reasoning here. Both of your two reasons apply: a citation to Python gives the reader a starting point to get more information, and it formally acknowledges the work done by others. So a citation adds exactly the two things that you say citations are used for. You might feel that everybody knows how to use google, and that a formal acknowledgement is pointless because nobody cares, but that's a value judgement about the usefulness of what the citation adds, not whether it adds them or not. Useful or not, scientific papers do cite the software they use. This proposal is about making it easier for people to do so, not about changing their behaviour. For the record, I think citations of software are useful, but other arguments have convinced me that for the time being at least, this is best handled as documentation rather than code. More on that shortly. -- Steve

On Mar 21 2016, Steven D'Aprano wrote:
As I said, I don't think a formal reference to something like

    [1] Python Core Team (2015). Python 3.6.0a0: A dynamic, open source programming language. Python Software Foundation. URL https://www.python.org/.

gives the reader a better starting point than just writing "...a Python script was used to...".
> and it formally acknowledges the work done by others.
Yeah, but the "others" don't benefit from it and don't care about it.
Indeed. I judge the value of the extra information to be less than the value of the space consumed by it, so I consider it pointless. Best, -Nikolaus

On Wed, Mar 23, 2016 at 12:31 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
What's the basis for this assertion? Even if you do a survey of all present and past contributors and they all agree with you, it does not mean that increased visibility in academic publications will not attract future contributors who would value proper citations. Personally, I am +0 on the idea.

On Mar 23 2016, Alexander Belopolsky wrote:
The responses on this list so far.
> Even if you do a survey of all present and past contributors [...]
I think it's best to work with the data that's available. Best, -Nikolaus

On Mar 23 2016, Alexander Belopolsky wrote:
I understood them as "I'd like to have citation information as a builtin", not "I would benefit from Python being cited". Best, -Nikolaus

On Thu, Mar 24, 2016 at 09:49:50AM -0700, Nikolaus Rath wrote:
This argument is a waste of everybody's time. Regardless of whether you, or anyone else, thinks that Python needs to be cited, or whether we get any benefit from that, *people will cite it no matter what we think*. If you read my original post in this thread, I link to examples of people asking how to cite Python, because they intend to cite it. When a scientist who has written a paper gets told by the journal editors to cite the software used, then regardless of what you or I think about the matter, the author will cite. If somebody doing their thesis is told by their supervisor to cite, they will cite. What you and I think about this is irrelevant. All we can do is help them, or not, according to whether we feel like being helpful or not. My vote is to be helpful, but I am satisfied that a "Citation" page in the documentation is enough for now. -- Steve

Nikolaus Rath writes:
While Steven gave a good answer, I'd like to provide a social science researcher's slant. Python gets a mention. As the politicians say, "Use all the epithets you like, but spell my name right!" Believe it or not, Python is *not* a "household word" in academic business and economics fields. A lot of us are teaching Python by preference, but there's still a large overhang of (mostly, but not invariably, older) researchers who only know Java, C/C++, or FORTRAN[sic] as "scientific" programming languages. To researchers, a citation is a pointer to an authoritative source, and specifically authoritative in specifying the *cited reference*, not current versions or random "hits". To those who don't know anything about Python, the fact that there is an authoritative citation gives them some confidence that Python itself is an ongoing entity. OTOH, web searches are not going to give you authoritative sources. The reasons for changing citation practice are not changes in these social relationships, but rather (1) changing publication channels requires changes of "pointers", and (in the case of many web pages which are generated dynamically) a more precise date of publication (ie, the date viewed) (2) some pointers are more efficient and accurate than others, and they are being introduced as alternatives (often preferred) to the traditional ones.

On Thu, Mar 17, 2016 at 4:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I think it would be premature for the stdlib to try to standardize machine readable citation metadata for third-party packages, or even a general API for accessing them. There are a lot of complex issues in this space that are still being explored by third-party packages like duecredit: https://github.com/duecredit/duecredit (Notice that the citation() function in R actually does some rather complicated things and returns a rather complicated object: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/citation.html https://stat.ethz.ch/R-manual/R-devel/library/utils/html/bibentry.html ) -n -- Nathaniel J. Smith -- https://vorpus.org

On Thu, Mar 17, 2016 at 04:54:41PM -0700, Nathaniel Smith wrote:
Thanks for the link, although that is far and beyond anything that I'm suggesting.
Given this, and comments from others (esp. Guido and Nick) I think it makes sense to start with documentation. One advantage of changing the docs is that it can apply to all versions, not just 3.6. I've raised a tracker item here: http://bugs.python.org/issue26597 Do we have consensus that this should be a separate page in the documentation, under "Meta Information"? The only other relevant place I think would be a FAQ, except I'm not sure that it is *quite* frequent enough :-) https://docs.python.org/3/index.html (scroll right to the bottom) Thanks for the feedback from everyone! -- Steve

On 3/17/2016 5:51 PM, Guido van Rossum wrote:
I think something on or accessible from the front doc page would be enough in most cases. The current copyright page is just 7 lines. A citation format could be added below and the link changed to Copyright and Citation Format. The people helped would include students who are supposed to add 'professional' citations in the bibliography of papers they write. This change could, of course, be backported. -- Terry Jan Reedy

On Thu, Mar 17, 2016 at 2:38 PM, Steve Dower <steve.dower@python.org> wrote:
Docker containers are also getting intense interest for this use case. It is true that citations can provide some help with reproduction (e.g. the cite might give you a hint whether the code attached to the paper was tested on py2 or py3), but yeah, cites are more about distributing credit and gathering metrics ("how many papers published last year used python?") than about reproducibility per se.
The complication here is that it turns out that if you follow this definition, then there are fewer and fewer scientists every year. Soon there will be none left :-). These questions around reproducibility/replicability are also extremely hot topics in science right now... not clear how it will all play out, but I think it's safe to say that there are a lot of scientists who think recording and communicating exact environments is very valuable. -n -- Nathaniel J. Smith -- https://vorpus.org

On Thu, Mar 17, 2016 at 7:08 PM, Nathaniel Smith <njs@pobox.com> wrote:
"In the good old days physicists repeated each other's experiments, just to be sure. Today they stick to FORTRAN, so that they can share each other's programs, bugs included." -- Edsger W. Dijkstra, 18 June 1975

On Thu, Mar 17, 2016 at 07:31:40PM -0400, Alexander Belopolsky wrote:
I'm reminded of this quote from William H. Press, et al, "Numerical Recipes in Pascal": "If all scientific papers whose results are in doubt because of bad Randoms were to disappear from library shelves, there would be a gap on each shelf about as big as your fist." -- Steve

On 17Mar2016 1608, Nathaniel Smith wrote:
Nah, someone can just write a seminal paper that redefines science in terms of Python and we'll be good for another few decades :)
I certainly agree with this, but the point of communicating the exact environment is not always so that others can reproduce the environment (which includes the problem, parameterisation, benchmarks, preprocessing and more as well as the software itself), but so they can compare environments. In some cases it could be used so the environment can be identical and a change to the work be compared instead, but I'd consider that "advancement" rather than "reproduction". But this is a tangent I'm not desperately keen to follow through on, so I'll leave it here. --- On the actual text of the citation, have the big "citation formats" (IEEE, APA, etc.) defined formats/metadata for versioned software products? I don't think they had last time I used them, but that was four years ago now. Cheers, Steve

On Mar 17, 2016, at 16:08, Nathaniel Smith <njs@pobox.com> wrote:
It's going to be so much fun trying to look back at early 21st century science from the future. "Up to about 2015, they were still publishing 'papers' written in the relatively simple 'postscript' language that we easily reverse engineered an interpreter for. Over the next few decades, they started publishing in more complicated formats, like entire virtual machines that were run by some software we no longer have access to that's only portable to operating systems that we have the source code for but don't know how to build. Successfully extracting some memory proteins from a not-entirely-decomposed body in what used to be Australia caused a bit of excitement last megasecond, but so far all we've been able to learn is how to construct a hat out of red felt. So for this segment of the class, we'll be looking almost entirely at whatever pop-science articles were recoverable from a backup of the Internet archive."

Nathaniel Smith writes:
Steve Dower wrote:
Actually, there is an ever-increasing number of wannabes who just go through the motions. At least among business academics, they are the ones most likely to cite (rather than publish in an "appendix available from the author") software (including programming languages like Python and application environments like R, as well as pure applications like SPSS). I suppose you might consider that an argument against citation(). ;-) Which is a partial answer to Guido: it turns out that in academic business statistics, there is substantial disagreement about appropriate models (especially in what is called "factor analysis" and "structural equation modeling"), which lead to differing calculations. As it happens, SPSS by default provides a particular factor analysis function that uses a now-deprecated model. You wouldn't know that without reading the SPSS manual. Other examples relevant to citing Python (with version info) include PRNGs (already mentioned), and validation depending on order of iteration of dicts, which are implementation dependent. This doesn't necessarily motivate for implementation in the interpreter, though.
I'm sure Dr. Obokata wishes she had been able to do so!
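The PRNG example raised in this thread is the case where a recorded version actually helps a reader. A minimal sketch, assuming only the standard library: `simulate` is a hypothetical helper that returns the seed and interpreter details alongside the numbers they produced, so a later reader can tell which generator and version were involved.

```python
import platform
import random


def simulate(seed, n=5):
    """Draw n pseudo-random numbers and record the provenance needed to explain them."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    draws = [rng.random() for _ in range(n)]
    provenance = {
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "seed": seed,
    }
    return draws, provenance
```

Replaying the same seed on the same interpreter reproduces the draws; if a later version changes the generator, the recorded version string is what lets someone explain the difference.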

On Mar 18 2016, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA@public.gmane.org> wrote:
I think the core question is: are the (core?) Python contributors or the PSF interested in being referenced? If so, adding this would definitely help (and would certainly generate some citations from me). If not, there seems little point. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On 3/17/2016 12:32 PM, Nikolaus Rath wrote:
Excellent idea, though I suggest 'citation' instead. The current splash line is Type "copyright", "credits" or "license()" for more information. The difference between no () and () required is a matter of length, and a 3 or 4 line citation fits the no () case.
credits Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of
Copyright (c) 2000 BeOpen.com. All Rights Reserved. Copyright (c) 1995-2001 Corporation for National Research Initiatives. All Rights Reserved. Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam. All Rights Reserved. thousands for supporting Python development. See www.python.org for more information.
license Type license() to see the full license text
The full license text is several pages, run through a pager. (quit and exit also have custom representations.) In addition, the front page of the docs should have a new item Citation under 'Meta Information'. The linked page could have multiple formats.
I think the core question is: are the (core?) Python contributors or the PSF interested in being referenced?
This one says YES!. Citing software used in a project is standard, but often a nuisance to get right.
If so, adding this would definitely help (and would certainly generate some citations from me).
-- Terry Jan Reedy

I'd be most interested in Guido's opinion, but personally I think Python is already well known enough to be a "household name" in any publication that might include a citation. Nobody cites C++ or Java, to my knowledge, and I'd rather be in that category than the other. Top-posted from my Windows Phone -----Original Message----- From: "Nikolaus Rath" <Nikolaus@rath.org> Sent: 3/17/2016 9:34 To: "python-ideas@python.org" <python-ideas@python.org> Subject: Re: [Python-ideas] Add citation() to site.py On Mar 18 2016, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA@public.gmane.org> wrote:
I think the core question is: are the (core?) Python contributors or the PSF interested in being referenced? If so, adding this would definitely help (and would certainly generate some citations from me). If not, there seems little point. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.« _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

On Mar 17, 2016 1:21 PM, "Steve Dower" <steve.dower@python.org> wrote:
I'd be most interested in Guido's opinion, but personally I think Python
is already well known enough to be a "household name" in any publication that might include a citation. Nobody cites C++ or Java, to my knowledge, and I'd rather be in that category than the other. Norms are somewhat in flux here -- there's a lot of discussion going on in academic circles right now about how to appropriately credit software. Traditionally the answer has been that software mostly doesn't get credited at all, which has predictably pernicious effects... One way to think about it is that providing a preferred citation is useful for folks who want to cite, and folks who don't want to cite will ignore it. -n

On Fri, Mar 18, 2016 at 7:35 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
I would also think that citing the software increases the ease of duplicating the results.
If that's the intention, citation() should include enough details to ensure this. There's been talk of changing the default PRNG, so that kind of thing would have to be identifiable (eg if the citation says "CPython 3.6.1" and it's known that the PRNG changed in 3.6, people will be able to understand why the results differ on 3.5). Probably sys.version will have enough info for that. ChrisA

On 03/17/2016 01:41 PM, Chris Angelico wrote:
On Fri, Mar 18, 2016 at 7:35 AM, Ethan Furman wrote:
If the citation doesn't have enough information to enable somebody to go and get the same software package it seems rather pointless. As one example other goodies bundled with a distribution will vary betwixt them. -- ~Ethan~

On Mar 17 2016, Ethan Furman <ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
Really? What kind of citation that you have in mind that would help more than "...we used a Python program to do [bla]..." somewhere in the paper? Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On 17Mar2016 1335, Ethan Furman wrote:
Afraid not. Posting a requirements.txt or a conda spec file would be far more valuable for this, or these days sharing a Jupyter Notebook with the code and dependency-spec embedded. It's also arguable as to how valuable reproducing the results in an identical environment actually is. Yes, your code runs on a different machine, but if your "research" is code then you are a developer, not a scientist. Someone actually needs to implement the idea in a different environment (and ideally they know just how it differs) to demonstrate that it is an actual result and not just a fluke. Cheers, Steve

I replied from my cell phone but it bounced (again :-( ). I wrote: """ I think this should be in the docs, not in the code. (The reason to have copyright() in the code is purely legalistic.) """ But I would be much more interested in hearing what the citation text should be before discussing how to add it to the code. (I still think adding it to the code isn't very helpful -- we can't add this to the older versions of code that people are likely to be using.) On Thu, Mar 17, 2016 at 2:38 PM, Steve Dower <steve.dower@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

On Thu, Mar 17, 2016 at 02:51:37PM -0700, Guido van Rossum wrote:
If this is in the code, a possible future enhancement would be for it to take an optional module or module name and query that module for its preferred citation. That's what R does.
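A rough sketch of the R-style enhancement described above, assuming a hypothetical convention in which a module exposes its preferred citation as a `__citation__` string. No such attribute convention exists in Python; the default text reuses Steve's suggested wording without the year, and the fallback message imitates R's "is part of R" line.

```python
import importlib
import platform

# Hypothetical convention: a module may define a string named __citation__.
_DEFAULT = (
    "Python Core Team. Python {}: A dynamic, open source programming "
    "language. Python Software Foundation. URL https://www.python.org/."
)


def citation(module=None):
    """Return the preferred citation for Python, or for a given module.

    `module` may be a module object or an importable module name.
    """
    python_cite = _DEFAULT.format(platform.python_version())
    if module is None:
        return python_cite
    if isinstance(module, str):
        module = importlib.import_module(module)
    preferred = getattr(module, "__citation__", None)
    if preferred is None:
        # Fall back to citing Python itself, much as R does for base packages.
        return f"The {module.__name__!r} module is part of Python.\n{python_cite}"
    return preferred
```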
True, but some day -- hopefully not that far away -- 3.6 will be the "older version" that people are likely to be using.

My suggested citation format is based heavily on that used by R. It could be implemented something like this:

    import platform

    year = platform.python_build()[1].split()[2]
    vers = platform.python_version()
    _CITE = """\
    To cite Python in publications, please use:
    Python Core Team ({}). Python {}: A dynamic, open source programming language. Python Software Foundation. URL https://www.python.org/.
    """.format(year, vers)

which would end up looking like this:

    To cite Python in publications, please use:
    Python Core Team (2015). Python 3.6.0a0: A dynamic, open source programming language. Python Software Foundation. URL https://www.python.org/.

I'm not married to any particular wording or preferred format. -- Steve
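A slightly more defensive variant of Steve's snippet, assuming the year can be taken as the first four-digit run in the build date reported by `platform.python_build()` (falling back to "n.d." if none is found). It also separates building the text from printing it, so the string can be reused; the name `citation_text` is hypothetical.

```python
import platform
import re


def citation_text():
    """Return (rather than print) the suggested citation string."""
    build_date = platform.python_build()[1]  # e.g. "Mar 17 2016 12:34:56"
    match = re.search(r"\b(\d{4})\b", build_date)
    year = match.group(1) if match else "n.d."
    return (
        "To cite Python in publications, please use:\n\n"
        f"Python Core Team ({year}). Python {platform.python_version()}: "
        "A dynamic, open source programming language. "
        "Python Software Foundation. URL https://www.python.org/."
    )


def citation():
    print(citation_text())
```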

Well, I entirely fail to understand why that belongs in the code rather than in the docs. "Because R does it" seems a pretty poor reason. But then again I've never cared about citations. On Thu, Mar 17, 2016 at 4:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
-- --Guido van Rossum (python.org/~guido)

On 3/17/2016 7:45 PM, Guido van Rossum wrote:
But then again I've never cared about citations.
Citations are not just in bibliographies. PEPs have them in footnotes, which was a traditional place for citations, though in the modern form of hyperlinks. The tracker is full of in-line citations, also in the form of hyperlinks. The formalized citation proposed by Steven would include the URL. -- Terry Jan Reedy

On Fri, Mar 18, 2016 at 08:09:11AM -0600, Gyro Funch wrote:
Could 'citation' be an optional entry in setup.py instead?
That matches my proposal. I'm not suggesting it should be a built-in built-in, it can be added by site.py like copyright, license, help, quit etc. If Guido has really strong objections to this, I'd be satisfied with it being a FAQ entry. But I think an advantage of code over a static FAQ is that the code can automatically fill in the version and year. -- Steve
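A minimal sketch of the site.py-style approach Steve describes, modelled loosely on how `copyright` and `credits` behave at the interactive prompt: typing the bare name shows the text, and calling it prints the text. The `_Citation` class is a hypothetical stand-in, not site.py's own helper, and only the version is filled in automatically here.

```python
import platform


class _Citation:
    """Show the citation via repr(); print it when called."""

    def __init__(self, text):
        self._text = text

    def __repr__(self):
        return self._text

    def __call__(self):
        print(self._text)


citation = _Citation(
    f"Python Core Team. Python {platform.python_version()}: A dynamic, open "
    "source programming language. Python Software Foundation. "
    "URL https://www.python.org/."
)

# site.py installs its helpers into builtins; a sketch would do the same with:
# import builtins; builtins.citation = citation
```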

On 19 March 2016 at 00:09, Gyro Funch <gyromagnetic@gmail.com> wrote:
I'd actually suggest that programmatic generation of citations for packages distributed through PyPI would be better handled as a third party project. Software developers that aren't themselves also research scientists aren't going to care to provide their own citation details, so it's best to establish a convention that doesn't rely on software publishers doing anything they aren't already doing. In the specific case of CPython, rather than asking Guido and the rest of the core development team "How do you want to be cited?", it probably makes more sense to ask "We're proposing to standardise on citing CPython this way, are you OK with that?" (e.g. by proposing a patch for a new "Citation" link in the meta-information on the docs home page at https://docs.python.org/3/ ). The reason I suggest that approach is that most (all?) of us aren't research scientists, so we have no idea what typical conventions are for citations, nor how those conventions are changing. However, the docs are much easier to amend than the standard library, while still being versioned along with the rest of the software, so adding citation information there is a low risk way of answering the question if folks do want to cite us. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
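One way a third-party tool could follow Nick's advice not to rely on publishers doing anything new is to build citations from the core metadata that installed distributions already carry. The sketch below reads it with `importlib.metadata` (Python 3.8+); the citation layout and function names are hypothetical, and many packages now list their URL under `Project-URL` rather than `Home-page`, so the URL may simply be missing.

```python
from importlib.metadata import PackageNotFoundError, metadata


def format_package_citation(meta):
    """Build a plain citation string from core metadata fields.

    `meta` is any mapping with "Name" and "Version", and optionally
    "Author" and "Home-page" (core package metadata field names).
    """
    author = meta.get("Author") or f"{meta['Name']} developers"
    text = f"{author}. {meta['Name']} {meta['Version']}."
    if meta.get("Home-page"):
        text += f" URL {meta['Home-page']}."
    return text


def package_citation(dist_name):
    """Cite an installed distribution using only metadata it already ships."""
    try:
        md = metadata(dist_name)
    except PackageNotFoundError:
        return None
    wanted = ("Name", "Version", "Author", "Home-page")
    fields = {key: md[key] for key in wanted if key in md}
    return format_package_citation(fields)
```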

On Mar 19 2016, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Which I believe makes it completely pointless to cite Python at all. As far as I can see, nowadays citations are given for two reasons: 1. To give the reader a starting point to get more information on a topic. 2. To formally acknowledge the work done by someone else (who ends up with an increased number of citations for the cited publication, which is unfortunately a crucial metric in most academic hiring and evaluation processes). In the case of Python, an explicit citation thus adds nothing. Relevant information is easily found by any search engine, and if Nick is right then no one stands to gain anything from the citations either. Best, -Nikolaus

On Sun, Mar 20, 2016 at 01:26:03PM -0700, Nikolaus Rath wrote:
I'm afraid I don't understand your reasoning here. Both of your two reasons apply: a citation to Python gives the reader a starting point to get more information, and it formally acknowledges the work done by others. So a citation adds exactly the two things that you say citations are used for. You might feel that everybody knows how to use google, and that a formal acknowledgement is pointless because nobody cares, but that's a value judgement about the usefulness of what the citation adds, not whether it adds them or not. Useful or not, scientific papers do cite the software they use. This proposal is about making it easier for people to do so, not about changing their behaviour. For the record, I think citations of software are useful, but other arguments have convinced me that for the time being at least, this is best handled as documentation rather than code. More on that shortly. -- Steve

On Mar 21 2016, Steven D'Aprano <steve-iDnA/YwAAsAk+I/owrrOrA@public.gmane.org> wrote:
As I said, I don't think a formal reference to something like [1] Python Core Team (2015). Python 3.6.0a0: A dynamic, open source programming language. Python Software Foundation. URL https://www.python.org/. gives the reader a better starting point than just writing "...a Python script was used to...".
and it formally acknowledges the work done by others.
Yeah, but the "others" don't benefit from it and don't care about it.
Indeed. I judge the value of the extra information to be less than the value of the space consumed by it, so I consider it pointless. Best, -Nikolaus

On Wed, Mar 23, 2016 at 12:31 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
What's the basis for this assertion? Even if you do a survey of all present and past contributors and they all agree with you, it does not mean that increased visibility in academic publications will not attract future contributors who would value proper citations. Personally, I am +0 on the idea.

On Mar 23 2016, Alexander Belopolsky <alexander.belopolsky-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
The responses on this list so far.
Even if you do a survey of all present and past contributors [...]
I think it's best to work with the data that's available. Best, -Nikolaus

On Mar 23 2016, Alexander Belopolsky <alexander.belopolsky-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
I understood them as "I'd like to have citation information as a builtin", not "I would benefit from Python being cited". Best, -Nikolaus

On Thu, Mar 24, 2016 at 09:49:50AM -0700, Nikolaus Rath wrote:
This argument is a waste of everybody's time. Regardless of whether you, or anyone else, thinks that Python needs to be cited, or whether we get any benefit from that, *people will cite it no matter what we think*. If you read my original post in this thread, I link to examples of people asking how to cite Python, because they intend to cite it. When a scientist who has written a paper gets told by the journal editors to cite the software used, then regardless of what you or I think about the matter, the author will cite. If somebody doing their thesis is told by their supervisor to cite, they will cite. What you and I think about this is irrelevant. All we can do is help them, or not, according to whether we feel like being helpful or not. My vote is to be helpful, but I am satisfied that a "Citation" page in the documentation is enough for now. -- Steve

Nikolaus Rath writes:
While Steven gave a good answer, I'd like to provide a social science researcher's slant. Python gets a mention. As the politicians say, "Use all the epithets you like, but spell my name right!" Believe it or not, Python is *not* a "household word" in academic business and economics fields. A lot of us are teaching Python by preference, but there's still a large overhang of (mostly, but not invariably, older) researchers who only know Java, C/C++, or FORTRAN[sic] as "scientific" programming languages. To researchers, a citation is a pointer to an authoritative source, and specifically authoritative in specifying the *cited reference*, not current versions or random "hits". To those who don't know anything about Python, the fact that there is an authoritative citation gives them some confidence that Python itself is an ongoing entity. OTOH, web searches are not going to give you authoritative sources. The reasons for changing citation practice are not changes in these social relationships, but rather (1) changing publication channels requires changes of "pointers", and (in the case of many web pages which are generated dynamically) a more precise date of publication (i.e., the date viewed), and (2) some pointers are more efficient and accurate than others, and they are being introduced as alternatives (often preferred) to the traditional ones.

On Thu, Mar 17, 2016 at 4:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I think it would be premature for the stdlib to try to standardize machine readable citation metadata for third-party packages, or even a general API for accessing them. There are a lot of complex issues in this space that are still being explored by third-party packages like duecredit: https://github.com/duecredit/duecredit (Notice that the citation() function in R actually does some rather complicated things and returns a rather complicated object: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/citation.html https://stat.ethz.ch/R-manual/R-devel/library/utils/html/bibentry.html ) -n -- Nathaniel J. Smith -- https://vorpus.org
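For comparison with the R BibTeX output quoted at the start of the thread, here is a minimal sketch of a similar `@Manual` entry for Python itself. In line with Nathaniel's caution it covers only Python, not third-party packages; the entry key, the title wording, and passing the year explicitly are assumptions.

```python
import platform


def bibtex_entry(year, key="python"):
    """Return a BibTeX @Manual entry for the running Python version."""
    fields = {
        "title": f"Python {platform.python_version()}: "
                 "A dynamic, open source programming language",
        # Extra braces stop BibTeX from parsing this as a personal name.
        "author": "{Python Core Team}",
        "organization": "Python Software Foundation",
        "year": str(year),
        "url": "https://www.python.org/",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@Manual{{{key},\n{body},\n}}"


print(bibtex_entry(2016))
```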

On Thu, Mar 17, 2016 at 04:54:41PM -0700, Nathaniel Smith wrote:
Thanks for the link, although that is far and beyond anything that I'm suggesting.
Given this, and comments from others (esp. Guido and Nick) I think it makes sense to start with documentation. One advantage of changing the docs is that it can apply to all versions, not just 3.6. I've raised a tracker item here: http://bugs.python.org/issue26597 Do we have consensus that this should be a separate page in the documentation, under "Meta Information"? The only other relevant place I think would be a FAQ, except I'm not sure that it is *quite* frequent enough :-) https://docs.python.org/3/index.html (scroll right to the bottom) Thanks for the feedback from everyone! -- Steve

On 3/17/2016 5:51 PM, Guido van Rossum wrote:
I think something on or accessible from the front doc page would be enough in most cases. The current copyright page is just 7 lines. A citation format could be added below and the link changed to Copyright and Citation Format. The people helped would include students who are supposed to add 'professional' citations in the bibliography of papers they write. This change could, of course, be backported. -- Terry Jan Reedy

On Thu, Mar 17, 2016 at 2:38 PM, Steve Dower <steve.dower@python.org> wrote:
Docker containers are also getting intense interest for this use case. It is true that citations can provide some help with reproduction (e.g. the cite might give you a hint whether the code attached to the paper was tested on py2 or py3), but yeah, cites are more about distributing credit and gathering metrics ("how many papers published last year used python?") than about reproducibility per se.
The complication here is that it turns out that if you follow this definition, then there are fewer and fewer scientists every year. Soon there will be none left :-). These questions around reproducibility/replicability are also extremely hot topics in science right now... not clear how it will all play out, but I think it's safe to say that there are a lot of scientists who think recording and communicating exact environments is very valuable. -n -- Nathaniel J. Smith -- https://vorpus.org

On Thu, Mar 17, 2016 at 7:08 PM, Nathaniel Smith <njs@pobox.com> wrote:
"In the good old days physicists repeated each other's experiments, just to be sure. Today they stick to FORTRAN, so that they can share each other's programs, bugs included." -- Edsger W. Dijkstra, 18 June 1975

On Thu, Mar 17, 2016 at 07:31:40PM -0400, Alexander Belopolsky wrote:
I'm reminded of this quote from William H. Press, et al, "Numerical Recipes in Pascal": "If all scientific papers whose results are in doubt because of bad Randoms were to disappear from library shelves, there would be a gap on each shelf about as big as your fist." -- Steve

On 17Mar2016 1608, Nathaniel Smith wrote:
Nah, someone can just write a seminal paper that redefines science in terms of Python and we'll be good for another few decades :)
I certainly agree with this, but the point of communicating the exact environment is not always so that others can reproduce the environment (which includes the problem, parameterisation, benchmarks, preprocessing and more as well as the software itself), but so they can compare environments. In some cases it could be used so the environment can be identical and a change to the work be compared instead, but I'd consider that "advancement" rather than "reproduction". But this is a tangent I'm not desperately keen to follow through on, so I'll leave it here. --- On the actual text of the citation, have the big "citation formats" (IEEE, APA, etc.) defined formats/metadata for versioned software products? I don't think they had last time I used them, but that was four years ago now. Cheers, Steve

On Mar 17, 2016, at 16:08, Nathaniel Smith <njs@pobox.com> wrote:
It's going to be so much fun trying to look back at early 21st century science from the future. "Up to about 2015, they were still publishing 'papers' written in the relatively simple 'postscript' language that we easily reverse engineered an interpreter for. Over the next few decades, they started publishing in more complicated formats, like entire virtual machines that were run by some software we no longer have access to that's only portable to operating systems that we have the source code for but don't know how to build. Successfully extracting some memory proteins from a not-entirely-decomposed body in what used to be Australia caused a bit of excitement last megasecond, but so far all we've been able to learn is how to construct a hat out of red felt. So for this segment of the class, we'll be looking almost entirely at whatever pop-science articles were recoverable from a backup of the Internet archive."

Nathaniel Smith writes:
Steve Dower wrote:
Actually, there is an ever-increasing number of wannabes who just go through the motions. At least among business academics, they are the ones most likely to cite (rather than publish in an "appendix available from the author") software (including programming languages like Python and application environments like R, as well as pure applications like SPSS). I suppose you might consider that an argument against citation(). ;-) Which is a partial answer to Guido: it turns out that in academic business statistics, there is substantial disagreement about appropriate models (especially in what is called "factor analysis" and "structural equation modeling"), which leads to differing calculations. As it happens, SPSS by default provides a particular factor analysis function that uses a now-deprecated model. You wouldn't know that without reading the SPSS manual. Other examples relevant to citing Python (with version info) include PRNGs (already mentioned), and validation depending on the iteration order of dicts, which is implementation dependent. This doesn't necessarily motivate implementation in the interpreter, though.
I'm sure Dr. Obokata wishes she had been able to do so!
participants (16)
- Alexander Belopolsky
- Alexander Walters
- Andrew Barnert
- Chris Angelico
- David Mertz
- Ethan Furman
- Guido van Rossum
- Gyro Funch
- Nathaniel Smith
- Nick Coghlan
- Nikolaus Rath
- Stephen J. Turnbull
- Steve Dower
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy