Open questions regarding docstrings
Hi there,
There are a few things regarding the docstrings that are still open to discussion, in many cases because the numpy convention (or the numpy doc examples) differs from the unwritten convention used in most pandas docstrings.
You have probably seen the discussion on GitHub, but I list them here with the proposed decision (mainly, keep the pandas way). If anyone disagrees on any point, please let us know, so we can change the documentation for the sprint and do it in the desired way.
1) Starting the docstring just after the opening triple quotes, or on the next line. In pandas it's more common to start on the next line, so we'll keep it this way.
2) For parameters, showing the default value after the type, or after the description. Numpy does not consider it necessary to specify defaults, and says the recommended place is after the description. The proposal (mainly by Joris) is to always include them, and to put them after the type, as they are easier to see there.
3) For parameters expecting a string, the numpy convention examples use `str`; the proposal is to use `string` instead.
4) For complex types like dicts, I think there is some consensus that the types are easier to understand with brackets (e.g. "dict of {str: int}" over "dict of str: int"). The same goes for tuples (e.g. "tuple of (int, str, int)" over "tuple of int, str, int"). For lists and sets, the type is simpler (e.g. "list of int" or "set of str"). I propose to use the brackets for dict and tuple, and not for list and set, and to use `str` over `string` when it is part of a complex type.
5) For cases where a parameter is optional, that is, it has a default of None meaning the value is not required (as I understand it, in a case like `fillna(value=None)`, `value` wouldn't be optional, as it is the value used to replace `NaN`). In this case, the proposal is to use as the type something like "int or float, optional" over "int, float or None (default None)".
6) When the parameter expects something in the form of a Python list, a numpy array, a pandas Series... document it as "array-like" over other options like "iterable" or "numpy.array, Series or list".
Thanks!
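As an illustration only, a docstring written according to the six proposals above might look like the sketch below. The function `scale_values` and all of its parameters are hypothetical and are not part of pandas; the point is the spelling of the types, not the behaviour. It uses `string` for a standalone string parameter as proposed in 3), which is the item most open to change.

```python
def scale_values(values, factor=1.0, bounds=None, prefix="v"):
    """
    Multiply each value by a constant factor.

    The summary starts on the line after the opening quotes (item 1).

    Parameters
    ----------
    values : array-like
        Values to scale, e.g. a list, a numpy array or a Series (item 6).
    factor : float, default 1.0
        Multiplier; the default is shown right after the type (item 2).
    bounds : tuple of (float, float), optional
        Lower and upper limits applied after scaling. Defaults to None,
        meaning no clipping, so it is spelled "optional" (items 4 and 5).
    prefix : string, default 'v'
        Text placed before the position in each result key (item 3).

    Returns
    -------
    dict of {str: float}
        Scaled values keyed by prefix plus position (item 4).
    """
    # Hypothetical body: only here so the example is runnable.
    scaled = [value * factor for value in values]
    if bounds is not None:
        lower, upper = bounds
        scaled = [min(max(value, lower), upper) for value in scaled]
    return {f"{prefix}{i}": value for i, value in enumerate(scaled)}
```

With these definitions, `scale_values([1, 2, 3], factor=2.0, bounds=(0.0, 5.0))` returns `{'v0': 2.0, 'v1': 4.0, 'v2': 5.0}`.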
Hi Marc,
Thanks for pulling out this list. The only one of these that seems potentially objectionable to me is #3. It does seem like we're pretty inconsistent on this currently, but in my opinion it'd be better to side with `str`, matching the actual Python type, numpy, mypy annotations, etc.
(In reply to Marc Garcia <garcia.marc@gmail.com>, Mon, Mar 5, 2018 at 2:55 PM)
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
Agreed with Chris about 3.
In the same vein, about 4 and 6, I could see more precision in the docstrings as an aid to adopting function annotations and mypy in the future. Is List[int] too ugly / unusual for readers? Case in point, one of your examples from 6, a Python list, isn't array-like (in the sense that is_array_like(List) is False). Documenting exactly what we mean by array-like is probably not something we're ready for, but I'd like to hear what others think about adopting mypy's spelling of types where it's not too burdensome.
Tom
(In reply to Chris Bartak <cbartak@gmail.com>, Mon, Mar 5, 2018 at 2:07 PM)
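For comparison, the docstring spellings discussed in this thread could map onto `typing` annotations roughly as sketched below. The `SPELLINGS` table and the function `top_labels` are made up for illustration and are not pandas code; treating "int, optional" as `Optional[int]` assumes the parameter defaults to None, as in item 5.

```python
from typing import Dict, List, Optional, Set, Tuple

# Docstring spelling from the thread -> a possible typing/mypy spelling.
SPELLINGS = {
    "list of int": List[int],
    "set of str": Set[str],
    "dict of {str: int}": Dict[str, int],
    "tuple of (int, str, int)": Tuple[int, str, int],
    "int, optional": Optional[int],
}


def top_labels(counts: Dict[str, int], limit: Optional[int] = None) -> List[str]:
    """
    Return labels ordered by decreasing count.

    Parameters
    ----------
    counts : dict of {str: int}
        Number of occurrences for each label.
    limit : int, optional
        Maximum number of labels to return. All are returned if not given.

    Returns
    -------
    list of str
    """
    ordered = sorted(counts, key=lambda label: counts[label], reverse=True)
    return ordered if limit is None else ordered[:limit]
```

Here the annotations and the docstring carry the same information in two spellings, which is the trade-off being asked about: the bracketed `typing` form is checkable by mypy, while the prose form may be easier to read in rendered documentation.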
Yes, thanks for the overview.
Regarding the type descriptions: as a reference, an overview of all currently used type descriptions can be seen here: https://github.com/pandas-dev/pandas/pull/19704#issuecomment-369405611
From that you can see that many things are currently rather inconsistent (str vs string, optional vs default None, ...; in most cases both forms are used about equally). So we should make choices! :)
For str vs string: I *think* "string" can be more readable and understandable for newcomers (not sure how well known the str type is to this user group). But of course, if we take "string" rather than "str", we should maybe also look at "int" vs "integer", "bool" vs "boolean", etc. I can live with either decision.
Joris
(In reply to Chris Bartak <cbartak@gmail.com>, 2018-03-05 23:07 GMT+01:00)
participants (4)
- Chris Bartak
- Joris Van den Bossche
- Marc Garcia
- Tom Augspurger