[Python-ideas] Add a __cite__ method for scientific packages

Matt Arcidy marcidy at gmail.com
Fri Jun 29 21:58:20 EDT 2018


On Fri, Jun 29, 2018, 17:14 Andrei Kucharavy <andrei.kucharavy at gmail.com>
wrote:

>> One more thing. There's precedent for this: when you start an interactive
>> Python interpreter it tells you how to get help, but also how to get
>> copyright, credits and license information:
>>
>> $ python3
>> Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54)
>> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> credits
>>     Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
>>     for supporting Python development.  See www.python.org for more information.
>> >>>
>>
>
This is thin justification to add something to core.

It seems like the very small percentage of academic users whose careers
depend on this cannot resolve the political issue of forming a standards
body.

I don't see how externalizing the standard development will help.  Kudos
for shortcutting the process in a practical way to just get it done,  but
this just puts core devs in the middle of silly academic spats.  A
language-endorsed citation method isn't a 'correct' method, and without the broad
consensus that currently doesn't exist, this becomes _your_ method, a
picked winner but ultimately a lightning rod for bored tenured professors
with personal axes to grind.  If this were about implementing an existing
correct method I'm sure a grad student would be tasked with it for an
afternoon.

This is insanely easy to implement in docstrings, or a standard import, or
mandatory include, or decorator, or anywhere else, it's just a parsing
protocol.  I believe 3.7 now exposes docstrings in the AST, meaning a
simple static analyzer can handle all of PyPI, giving you crazy granularity
if citations existed.  Don't you want to cite the exact algorithm used in
an imported method, not just lump them all into one call?  Heck, I bet you
could use type annotations.
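A minimal sketch of the docstring route, assuming a hypothetical convention where any docstring line starting with "Cite:" holds a citation (the marker, the function citations_from_source and the sample module are illustrative, not an existing standard). It only parses source with the standard ast module, so nothing is imported or executed, and it keeps citations per function rather than lumping them per package:

```python
import ast

# Hypothetical convention: a docstring line starting with this marker is a citation.
CITE_MARKER = "Cite:"

# Node types that can carry a docstring (ast.get_docstring rejects anything else).
_DOC_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def citations_from_source(source):
    """Map each module/class/function name to the citation lines in its docstring.

    The source is only parsed, never imported or executed.
    """
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, _DOC_NODES):
            continue
        doc = ast.get_docstring(node) or ""
        cites = [line.strip()[len(CITE_MARKER):].strip()
                 for line in doc.splitlines()
                 if line.strip().startswith(CITE_MARKER)]
        if cites:
            # Modules have no .name attribute, so label them "<module>".
            found.setdefault(getattr(node, "name", "<module>"), []).extend(cites)
    return found


SAMPLE = '''
"""Toy analysis module.

Cite: Placeholder reference for the module as a whole.
"""

def smooth(x):
    """Smooth a signal.

    Cite: Placeholder reference for the smoothing algorithm.
    """
    return x
'''

if __name__ == "__main__":
    print(citations_from_source(SAMPLE))
```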

This really feels like you've got an amazing multi-tool but you want to
turn the world, not the screw. This isn't a tool the majority of people
will use, even if the citations exist.

Don't get me wrong, I love designing standards and protocols, but this is
pretty niche.

I assume it won't be mandatory so I'm tilting at windmills,  but then if
it's not mandatory, what's the point of putting it in core?  Just create a
JSTOR-style git server where obeying the citation protocol is mandatory.
Of course, enforcing a missing citation is impossible, but it does mean
citations can be generated by parsing imports.  This is how it will evolve
over time, by employing core devs on that server framework.
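Generating citations by parsing imports could look roughly like this. The REGISTRY dict stands in for whatever the proposed server would hold; its entries and the helper names are hypothetical placeholders:

```python
import ast

# Hypothetical registry: in the proposal above this mapping would live on the
# server that enforces the citation protocol. Entries are placeholders.
REGISTRY = {
    "numpy": "<citation for numpy>",
    "scipy": "<citation for scipy>",
}


def imported_packages(source):
    """Return the sorted top-level package names that a script imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 means a relative import of the script's own package.
            names.add(node.module.split(".")[0])
    return sorted(names)


def citations_for(source, registry=REGISTRY):
    """Split a script's imports into (citations found, packages with no entry)."""
    cited, missing = {}, []
    for name in imported_packages(source):
        if name in registry:
            cited[name] = registry[name]
        else:
            missing.append(name)
    return cited, missing


SCRIPT = """
import numpy as np
from scipy import stats
import os.path
from . import helpers
"""

if __name__ == "__main__":
    print(citations_for(SCRIPT))
```

Here os lands in the missing list: the tool can report an absent citation, but, as noted above, it cannot enforce one.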




>> It makes total sense to add citations/references to this list (and those
>> should probably print a reference for Python followed by instructions on
>> how to get references for other packages and how to properly add a
>> reference to your own code).
>>
>>
> If that's possible, that would be great!
>
>
>> I think that an approach similar to help/quit/exit is warranted. The
>> cite()/citation() function need not be *literally* built into the
>> language, it could be an external function written in Python and added
>> to builtins by the site.py module.
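A sketch of that suggestion, assuming a hypothetical cite object loosely modelled on the help/quit/exit helpers that site.py puts into builtins. The class name, the placeholder Python citation and the install() helper are illustrative only; a third-party package could call install() from a sitecustomize module without any change to core:

```python
import builtins
import types

# Placeholder: the actual citation for Python would be settled elsewhere.
PYTHON_CITATION = "<citation for Python itself>"


class _Citer:
    """Loosely modelled on the help/quit/exit objects that site.py adds to
    builtins: typing `cite` at the prompt shows usage, calling it returns text."""

    def __repr__(self):
        return "Type cite() to cite Python, or cite(package) to cite a package."

    def __call__(self, obj=None):
        if obj is None:
            return PYTHON_CITATION
        citation = getattr(obj, "__citation__", None)
        if citation is None:
            name = getattr(obj, "__name__", repr(obj))
            return f"{name} does not define __citation__."
        return citation


def install():
    """Add `cite` to builtins (hypothetical; e.g. called from sitecustomize)."""
    builtins.cite = _Citer()


if __name__ == "__main__":
    install()
    demo = types.ModuleType("demo_pkg")
    demo.__citation__ = "<citation for demo_pkg>"
    print(repr(builtins.cite))
    print(builtins.cite(demo))
```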
>
>
> I was not aware this was a possibility - it does seem like a good option!
>
>> If I were you, I'd try organizing a birds-of-a-feather at the next
>> SciPy conference, or start getting in touch with others working on
>> this (duecredit devs, the folks listed on that citationPEP thing,
>> etc.), and go from there. (Feel free to CC me if you do start up some
>> effort like this.)
>
>
> Not all packages are within the numpy/scipy universe - Pandas and Seaborn
> are notable examples.
>
> I brought this thread to the attention of some major scientific package
> maintainers as well as the main citationPEP author. I am not entirely sure
> where this conversation could be moved outside python-ideas, given we are
> talking about something universal across packages, but would gladly take
> any suggestions.
>
>> There isn't actually any formal method for registering special names
>> like __version__, and they aren't treated specially by the language.
>> They're just variables that happen to have a funny name. You shouldn't
>> start using them willy-nilly, but you don't actually have to ask
>> permission or anything. And it's not very likely that someone else
>> will come along and propose using the name __citation__ for something
>> that *isn't* a citation :-).
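Since __citation__ would be an ordinary attribute, reading it needs no language support. A sketch assuming a hypothetical lookup rule (an object's own __citation__ first, then that of its defining module), in line with Adrian's point that the name could also sit on classes and functions; find_citation and demo_pkg are made-up names:

```python
import sys
import types


def find_citation(obj):
    """Return obj.__citation__, else the __citation__ of the module that
    defines obj (looked up through obj.__module__), else None."""
    citation = getattr(obj, "__citation__", None)
    if citation is not None:
        return citation
    module = sys.modules.get(getattr(obj, "__module__", None) or "")
    return getattr(module, "__citation__", None)


# Stand-in for a package whose __init__.py would contain, next to __version__:
#     __version__ = "1.0"
#     __citation__ = "<citation for demo_pkg>"
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__version__ = "1.0"
demo_pkg.__citation__ = "<citation for demo_pkg>"
sys.modules["demo_pkg"] = demo_pkg


def fast_method(x):
    return x


fast_method.__module__ = "demo_pkg"  # pretend it was defined in demo_pkg
fast_method.__citation__ = "<citation for the specific algorithm>"


def helper(x):
    return x


helper.__module__ = "demo_pkg"  # no citation of its own; falls back to the package
```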
>
>
> Thanks for the explanation. Python development and maintenance seem to
> be a complex process from the outside, and these kinds of subtleties are not
> always easy to spot :).
>
>> The way to do this is to first get your solution implemented as a
>> third-party library and adopted by the scientific packages, and then
>> start thinking about whether it would make sense to move the library
>> into the standard library. It's relatively easy to move things into
>> the standard library. The hard part is making sure that you
>> implemented the right thing in the first place, and that's MUCH more
>> likely if you start out as a third-party package.
>
>
> Got it.
>
>> I think you misunderstand how these lists work :-). (Which is fine --
>> it's actually pretty opaque and confusing if you don't already know!)
>> Generally, distutils-sig operates totally independently from
>> python-{ideas,dev} -- if you have a packaging proposal, it goes there
>> and not here; if you have a language proposal, it goes here and not
>> there. *If* what you want to do is add some static metadata to python
>> packages through setup.py, then python-ideas is irrelevant and
>> distutils-sig is who you'll have to convince. (But they'll also want
>> to see that your proposal has buy-in from established packages,
>> because they don't understand the intricacies of software citation and
>> will want people they trust to tell them whether the proposal makes
>> sense.)
>>
>
> Got it as well. That does indeed seem a reasonable way of doing things,
> although I believe there have been precedents where GvR implemented a
> feature from scratch after studying existing libraries (I am thinking
> notably about asyncio, which is orders of magnitude more complex and
> involved than anything we are talking about here).
>
>> And often the custom scripts are a mix of R and Python, and maybe some
>> Fortran, ... Plus, if it works for multiple languages, it means you
>> get to share part of the work with other ecosystems, instead of
>> everyone reinventing the wheel.
>>
>> Also, if you want to go down the dynamic route (which is the only way
>> to get accurate fine-grained citations), then it's just as easy to
>> solve the problem in a language independent way.
>>
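For the dynamic route, one option (in the spirit of duecredit, mentioned above, though this is not its API) is a decorator that records a citation only when the decorated code actually runs. The names cites and used_citations, and the placeholder citations, are hypothetical:

```python
import functools

# citation text -> names of the decorated functions that were actually called
_USED = {}


def cites(citation):
    """Hypothetical decorator: record `citation` each time the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            _USED.setdefault(citation, set()).add(func.__qualname__)
            return func(*args, **kwargs)
        wrapper.__citation__ = citation
        return wrapper
    return decorator


def used_citations():
    """Return only the citations for code paths that executed."""
    return {text: sorted(names) for text, names in _USED.items()}


@cites("<placeholder citation for method A>")
def method_a(x):
    return x * 2


@cites("<placeholder citation for method B>")
def method_b(x):
    return x + 1


if __name__ == "__main__":
    method_a(3)
    method_a(4)
    print(used_citations())  # method_b never ran, so it is not cited
```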
>
> In my experience, people tend to go with either one or the other, or use
> Julia. I am not very familiar with the Fortran ecosystem; as far as I've
> seen, those are extremely efficient libraries that get wrapped and used in
> most modern scientific computing languages, but very rarely directly.
>
> In addition to that, while I see how granular citations could be
> implemented in Python, I have a bit more trouble understanding how calls to
> R, Python, Perl, C, C++ or Fortran from command line scripts can be
> analyzed on the fly to get metadata about citations. I have even more
> trouble imagining how it would be possible to bring developers across all
> the separate language communities to agree on a single standard.
>
>> > CSL-JSON represented as a dict to be supplied to the setup file is
>> > definitely one way of doing it. I was, however, thinking more about the
>> > BibTeX format, given that CSL-JSON is more closely affiliated with
>> > Mendeley
>>
>> Huh, is it? I only know it from Zotero.
>>
>
> Hm, I was not aware Zotero uses it as well. That's definitely a good sign,
> and I will have to look into CSL-JSON more in depth.
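CSL-JSON items are plain dicts, so a BibTeX rendering can be produced on demand, which suggests the storage format need not rule out either one. A rough sketch covering only a few common CSL-JSON fields (id, type, title, author, issued, container-title, DOI); the function names and the example item are made up:

```python
def _bib_name(person):
    """CSL-JSON names use family/given, or literal for institutions."""
    if "family" in person:
        if "given" in person:
            return f"{person['family']}, {person['given']}"
        return person["family"]
    return person.get("literal", "")


def csl_to_bibtex(item):
    """Convert one CSL-JSON item (a dict) to a BibTeX entry string.

    Only a handful of common fields are handled; everything else is dropped.
    """
    entry_type = "article" if item.get("type") == "article-journal" else "misc"
    fields = {}
    if "title" in item:
        fields["title"] = item["title"]
    if item.get("author"):
        fields["author"] = " and ".join(_bib_name(p) for p in item["author"])
    date_parts = item.get("issued", {}).get("date-parts")
    if date_parts and date_parts[0]:
        fields["year"] = str(date_parts[0][0])
    if entry_type == "article" and "container-title" in item:
        fields["journal"] = item["container-title"]
    if "DOI" in item:
        fields["doi"] = item["DOI"]
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@{entry_type}{{{item['id']},\n{body}\n}}"


# Made-up example item, for illustration only.
EXAMPLE = {
    "id": "doe2018",
    "type": "article-journal",
    "title": "A toy method",
    "author": [{"family": "Doe", "given": "Jane"},
               {"family": "Roe", "given": "Richard"}],
    "issued": {"date-parts": [[2018, 6]]},
    "container-title": "Journal of Examples",
}

if __name__ == "__main__":
    print(csl_to_bibtex(EXAMPLE))
```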
>
>> Why not scipy.cite() or scipy.citation()?  I don't see any reason for these
>> functions to ship with standard python at all.
>
>
> There are packages that do not depend on scipy, and even for those that do,
> most users writing analysis pipelines with scientific packages are unaware
> that they are using scipy/numpy underneath the packages that do what they
> want at the highest level.
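Because packages imported indirectly still end up in sys.modules, a tool could collect citations for dependencies a user never knowingly asked for. A sketch, assuming packages expose a __citation__ attribute as proposed; loaded_citations and the stand-in modules are hypothetical:

```python
import sys
import types


def loaded_citations():
    """Return {module name: __citation__} for every loaded module that defines
    one, including modules pulled in indirectly by other packages."""
    found = {}
    # Copy the items: attribute lookups on lazy modules could trigger imports.
    for name, module in list(sys.modules.items()):
        citation = getattr(module, "__citation__", None)
        if citation is not None:
            found[name] = citation
    return dict(sorted(found.items()))


def register_demo_modules():
    """Stand-ins: the user imports only fake_highlevel, which itself pulled
    in fake_numeric behind the scenes."""
    for name in ("fake_numeric", "fake_highlevel"):
        mod = types.ModuleType(name)
        mod.__citation__ = f"<citation for {name}>"
        sys.modules[name] = mod


if __name__ == "__main__":
    register_demo_modules()
    print(loaded_citations())
```

This reports what was loaded, not necessarily what was used, and is only as complete as the packages' own __citation__ attributes.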
>
>> I don't think that this is a very useful idea, because most people
>> I've encountered who don't cite software don't do so because they think
>> it's not important, not because they don't know what the right citation is.
>> The problem is social and not technological. I don't want to spend time
>> on a technical solution to it.
>>
>
> Thanks for your opinion, Gael. As a maintainer of scikit-learn you have
> more experience with this issue than most of us.
>
> In my field (computational biology in molecular biology labs) the
> situation is somewhat different: most of the custom scripts are
> written by people who often learned Python, or programming at all,
> only in the last couple of years. Most of the time they get asked by the
> corresponding author to provide 1-5 citations for their analytical pipeline
> and to describe what they did in the supplementary material, and I have had
> several junior developers in my labs come forward to ask me what they
> were supposed to cite and where to find the citations.
>
> We aren't likely to convince everyone to cite code overnight, but making
> citing as easy as possible does seem like a step in the right direction to
> me.
>
>> I still think it would be very nice to have an official standard for
>> citation information in Python packages as codified in a PEP. That would
>> reduce ambiguity and make it much easier for tool-writers who want to parse
>> citation information.
>>
>
> That's my opinion as well.
>
> To summarize the conversation so far, a __citation__ data field and a
> cite() function seem to be the preferred option. If the proposal
> gets traction and is accepted, the citation for Python as well as the
> instructions to get citation for a package can be added as a top-level
> command, similar to credits, copyright or license.
>
> As of now, it seems like the next steps would be to:
>
> - draft a PEP (or complete the existing one) and implement the cite()
> script as well as a show-case package using __citation__
> - talk to major package maintainers to see if they have any objections to
> the method or suggestions regarding the PEP/implementation
> - talk to the distutils-sig list to see if we could add the __citation__
> metadata to setup.py
> - submit a proper PEP (Would a pull request to
> https://github.com/python/peps be an acceptable way of doing it?)
>
> Is there something I might be missing so far?
>
> Best,
>
> *Andrei Kucharavy*
>
> Post-Doc @ *Joel S. Bader** Lab*
>
> Johns Hopkins University, Baltimore, USA.
>
>
> On Fri, Jun 29, 2018 at 10:51 AM Nathan Goldbaum <nathan12343 at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Jun 28, 2018 at 11:26 PM, Alex Walters <tritium-list at sdamon.com>
>> wrote:
>>
>>> But don't all the users who care about citing modules already use the
>>> scientific python packages, with scipy itself at its center?  Wouldn't
>>> those engaging in science or in academia be better stewards of this than
>>> systems programmers?  Since you're not asking for anything that can't be
>>> done in a third party module, and there is a third party module that most
>>> of the target audience of this standard would already have, there is zero
>>> reason to take up four names in the python runtime to serve those users.
>>>
>>
>>
>> Not all scientific software in Python depends on scipy or even numpy.
>> However, it does all depend on Python.
>>
>> Although perhaps that argues for a cross-language solution :)
>>
>> I still think it would be very nice to have an official standard for
>> citation information in Python packages as codified in a PEP. That would
>> reduce ambiguity and make it much easier for tool-writers who want to parse
>> citation information.
>>
>>> > -----Original Message-----
>>> > From: Adrian Price-Whelan <adrianmpw at gmail.com>
>>> > Sent: Friday, June 29, 2018 12:16 AM
>>> > To: Alex Walters <tritium-list at sdamon.com>
>>> > Cc: Steven D'Aprano <steve at pearwood.info>; python-ideas at python.org
>>> > Subject: Re: [Python-ideas] Add a __cite__ method for scientific packages
>>> >
>>> > For me, it's about setting a standard that is endorsed by the
>>> > language, and setting expectations for users. There currently is no
>>> > standard, which is why packages use __citation__, __cite__,
>>> > __bibtex__, etc., and as a user I don't immediately know where to look
>>> > for citation information (without going to the source). My feeling is
>>> > that adopting __citation__ or some dunder name could be implemented on
>>> > classes, functions, etc. with less of a chance of naming conflicts,
>>> > but am open to discussion.
>>> >
>>> > I have some notes here about various ideas for more advanced
>>> > functionality that would support automatically keeping track of
>>> > citation information for imported packages, classes, functions:
>>> > https://github.com/adrn/CitationPEP/blob/master/NOTES.md
>>> >
>>> > On Thu, Jun 28, 2018 at 10:57 PM, Alex Walters <tritium-list at sdamon.com>
>>> > wrote:
>>> > > Why not scipy.cite() or scipy.citation()?  I don't see any reason for these
>>> > > functions to ship with standard python at all.
>>> > >
>>> > >> -----Original Message-----
>>> > >> From: Python-ideas <python-ideas-bounces+tritium-list=sdamon.com at python.org> On Behalf Of Steven D'Aprano
>>> > >> Sent: Thursday, June 28, 2018 8:17 PM
>>> > >> To: python-ideas at python.org
>>> > >> Subject: Re: [Python-ideas] Add a __cite__ method for scientific packages
>>> > >>
>>> > >> On Thu, Jun 28, 2018 at 05:25:00PM -0400, Andrei Kucharavy wrote:
>>> > >>
>>> > >> > As for the list, reserving a __citation__/__cite__ for packages at the
>>> > >> > same level as __version__ is now reserved and adding a citation()/cite()
>>> > >> > function to the standard library seemed large enough modifications to
>>> > >> > warrant seeking buy-in from the maintainers and the community at large.
>>> > >>
>>> > >> I think that an approach similar to help/quit/exit is warranted. The
>>> > >> cite()/citation() function need not be *literally* built into the
>>> > >> language, it could be an external function written in Python and added
>>> > >> to builtins by the site.py module.
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> --
>>> > >> Steve
>>> > >> _______________________________________________
>>> > >> Python-ideas mailing list
>>> > >> Python-ideas at python.org
>>> > >> https://mail.python.org/mailman/listinfo/python-ideas
>>> > >> Code of Conduct: http://python.org/psf/codeofconduct/
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Adrian M. Price-Whelan
>>> > Lyman Spitzer, Jr. Postdoctoral Fellow
>>> > Princeton University
>>> > http://adrn.github.io
>>>
>>>
>>
>>
>

