[Python-ideas] Add a __cite__ method for scientific packages

Nathaniel Smith njs at pobox.com
Wed Jun 27 20:19:35 EDT 2018


On Wed, Jun 27, 2018 at 2:20 PM, Andrei Kucharavy
<andrei.kucharavy at gmail.com> wrote:
> To remediate to that situation, I suggest a __citation__ method associated
> to each package installation and import. Called from the __main__,
> __citation__() would scan __citation__ of all imported packages and return
> the list of all relevant top-level citations associated to the packages.
>
> As a scientific package developer working in academia, the problem is quite
> serious, and the solution seems relatively straightforward.
>
> What does Python core team think about addition and long-term maintenance of
> such a feature to the import and setup mechanisms? What do other users and
> scientific package developers think of such a mechanism for citations
> retrieval?

This is indeed a serious problem. I suspect python-ideas isn't the
best venue for addressing it though – there's nothing here that needs
changes to the Python interpreter itself (I think), and the people who
understand this problem best and who are most affected by it mostly
aren't here.
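
To illustrate why no interpreter change seems to be needed, here is a
minimal sketch of the proposed scan done in ordinary library code. It
assumes packages expose a module-level __citation__ attribute (the name
comes from the proposal quoted above, not from any existing standard),
and collect_citations is a hypothetical helper name.

```python
import sys
import types


def collect_citations(modules=None):
    """Map top-level package names to their __citation__ value.

    `modules` defaults to sys.modules; it is a parameter so the
    function can be tried out on a hand-built mapping.
    """
    if modules is None:
        modules = sys.modules
    found = {}
    for name, module in list(modules.items()):
        # Skip submodules ("pkg.sub") and placeholder None entries.
        if "." in name or module is None:
            continue
        citation = getattr(module, "__citation__", None)
        if citation is not None:
            found[name] = citation
    return found


# Demo with made-up modules; the citation text is a placeholder.
pkg = types.ModuleType("fakepkg")
pkg.__citation__ = "Doe, J. (2018). fakepkg: an example package."
sub = types.ModuleType("fakepkg.sub")
sub.__citation__ = "ignored, because this is not a top-level package"
plain = types.ModuleType("nocite")

demo = {"fakepkg": pkg, "fakepkg.sub": sub, "nocite": plain}
print(collect_citations(demo))
```

Calling collect_citations() with no argument would scan whatever has
been imported so far in the running process.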

You'll want to check out the duecredit project:
https://github.com/duecredit/duecredit
One of the things they've thought about is the ability to track
citation information in a more fine-grained way than per-package – for
example, there might be a paper that should be cited by anyone who
calls a particular method (or even passes a specific argument to some
specific method, when that turns on some fancy algorithm).

The R world also has some prior art -- in particular I know they have
citations as part of the standard metadata in every package.

I'd actually like to see a more general solution that isn't restricted
to any one language, because multi-language analysis pipelines are
very common. For example, we could standardize a convention where if a
certain environment variable is set, then the software writes out
citation information to a certain location, and then implement
libraries that do this in multiple languages. Of course, that's a
"dynamic" solution that requires running the software -- which is
probably necessary if you want to do fine-grained citations, but it
might be useful to also have static metadata, e.g. as part of the
package metadata that goes into sdists, wheels, and on PyPI. That
would be a discussion for the distutils-sig mailing list, which
manages that metadata.
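
As a sketch of the environment-variable convention described above:
the variable name CITATION_OUTPUT_DIR, the one-file-per-process layout,
and the JSON-lines record format are all assumptions made up for this
example, not an existing convention.

```python
import json
import os
import tempfile

ENV_VAR = "CITATION_OUTPUT_DIR"  # hypothetical variable name


def report_citation(tool, citation, environ=None):
    """Append a citation record if the environment variable is set.

    Returns the path written to, or None when reporting is disabled.
    """
    environ = os.environ if environ is None else environ
    out_dir = environ.get(ENV_VAR)
    if not out_dir:
        return None  # convention not enabled: do nothing
    os.makedirs(out_dir, exist_ok=True)
    # One file per tool and process avoids concurrent writers
    # interleaving their output in a multi-process pipeline.
    path = os.path.join(out_dir, f"{tool}-{os.getpid()}.jsonl")
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"tool": tool, "citation": citation}) + "\n")
    return path


# Demo using a temporary directory and an explicit environment mapping.
with tempfile.TemporaryDirectory() as tmp:
    written = report_citation("fakepkg", "Doe (2018)", {ENV_VAR: tmp})
    with open(written, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
print(records)
```

A library in any other language would only need to honour the same
variable and write the same record format for the outputs to be merged
after a pipeline run.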

One challenge in standardizing this kind of thing is choosing a
standard way to represent citation information. Maybe CSL-JSON?
There's a lot of complexity as you dig into this, though of course one
shouldn't let the perfect be the enemy of the good...
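
For a sense of what a CSL-JSON record looks like, here is a small
sketch that builds one item and serializes it. The field names (id,
type, title, author, issued, DOI) are standard CSL-JSON; the values
are made-up placeholders, not a real publication.

```python
import json

# One CSL-JSON item; a CSL-JSON document is a list of such items.
item = {
    "id": "doe2018fakepkg",            # placeholder identifier
    "type": "article-journal",
    "title": "fakepkg: an example package",
    "author": [{"family": "Doe", "given": "Jane"}],
    "issued": {"date-parts": [[2018]]},
    "DOI": "10.0000/placeholder",      # not a real DOI
}

csl_json = json.dumps([item], indent=2)
print(csl_json)
```

Even this small record hints at the complexity: names are structured
rather than plain strings, and dates are nested lists of parts.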

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

