For historical reasons we've built up an EA namespace without much internal logic in terms of what is public/private. While this isn't _that_ big of a deal, it'd be nice to make this more coherent. I see two useful options:

1) Use the traditional "an underscore means this should only be called from within self". Very few underscore-prefixed methods on the base class satisfy that criterion; the constructor _from_sequence, for example, does not. One benefit of moving to this is that it would make it "official" that we shouldn't be using _values_for_foo from outside EA methods.

2) Use underscores to signal to 3rd-party authors whether or not there exists a working (not necessarily performant) implementation on the base class. In this scenario authors would _have_ to implement private methods, while implementing public methods would be optional.

Thoughts?
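To make option 2 concrete, here is a toy sketch in plain Python rather than pandas. `SketchArrayBase`, `_getitem`, `_length` and `ListBackedArray` are hypothetical names chosen for illustration, not pandas API, and a real ExtensionArray has many more members. Under this convention, the underscore-prefixed members have no default and must be overridden, while the public members already work (slowly) on top of them.

```python
class SketchArrayBase:
    """Hypothetical base class following option 2: a leading underscore marks
    members subclasses *must* implement; public members have a working
    (not necessarily fast) default built on top of them."""

    # Required: no usable default, so every subclass has to provide these.
    def _getitem(self, i):
        raise NotImplementedError(f"{type(self).__name__} must implement _getitem")

    def _length(self):
        raise NotImplementedError(f"{type(self).__name__} must implement _length")

    # Optional: correct but naive defaults; subclasses may override for speed.
    def to_list(self):
        return [self._getitem(i) for i in range(self._length())]

    def unique(self):
        # Order-preserving de-duplication via a linear scan (O(n^2)).
        seen = []
        for value in self.to_list():
            if value not in seen:
                seen.append(value)
        return seen


class ListBackedArray(SketchArrayBase):
    """Hypothetical subclass that implements only the required members."""

    def __init__(self, values):
        # Storage owned by this subclass; not part of the base-class contract.
        self._items = list(values)

    def _getitem(self, i):
        return self._items[i]

    def _length(self):
        return len(self._items)


arr = ListBackedArray([3, 1, 3, 2])
print(arr.to_list())  # [3, 1, 3, 2]
print(arr.unique())   # [3, 1, 2]
```

The appeal of this scheme is that an author can read the required surface straight off the names; the cost, as the reply below points out, is that names would then encode the implementor contract rather than what end users are meant to call.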
On Thu, 26 Jan 2023 at 23:30, Brock Mendel <jbrockmendel@gmail.com> wrote:
> For historical reasons we've built up an EA namespace without much internal logic in terms of what is public/private. While this isn't _that_ big of a deal, it'd be nice to make this more coherent. I see two useful options:

In my opinion (and recollection), when ExtensionArrays were first introduced the rule was quite clear: *everything* on the base class is considered public for developers (EA implementors can, or need to, override it), and whether the actual name is public or private (i.e. with or without a leading underscore) depends on whether it should be public for end users (not implementors). We use documentation and comments to tell developers (EA implementors) which parts are required and which are optional to implement.

> 1) Use the traditional "an underscore means this should only be called from within self". Very few underscore-prefixed methods on the base class satisfy that criterion; the constructor _from_sequence, for example, does not. One benefit of moving to this is that it would make it "official" that we shouldn't be using _values_for_foo from outside EA methods.

We don't want to make all those "private" methods that EAs have to implement public to end users, so I don't think this is an option. Also, there *are* some valid cases for calling the _values_for_.. methods outside of other EA methods, so this is not a general rule.

> 2) Use underscores to signal to 3rd-party authors whether or not there exists a working (not necessarily performant) implementation on the base class. In this scenario authors would _have_ to implement private methods, while implementing public methods would be optional.

That would make some of the currently private (and not useful for end users) methods public, and some public methods private (if we do that for existing methods, and not only as a rule for new methods).

> Thoughts?

But what is the main goal you want to achieve here? That it is clearer for EA implementors what they need to implement? (Currently we use AbstractMethodError for that, which already seems clear to me, and we have base tests that you can inherit that should cover the basic things you need to implement.)

Joris
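Joris's point that the underscore tracks end-user visibility rather than what implementors must provide can be checked directly. A minimal sketch, assuming a recent pandas; the `BareArray` subclass and the `required_members` helper are hypothetical names for illustration. It probes a few base-class members on a subclass that overrides nothing and records which ones raise `AbstractMethodError`. Both underscored members (`_from_sequence`, `_concat_same_type`) and public ones (`__len__`, `dtype`, `isna`, `copy`) are expected to show up, which is exactly the mixture the thread is discussing.

```python
from pandas.api.extensions import ExtensionArray
from pandas.errors import AbstractMethodError


class BareArray(ExtensionArray):
    """Hypothetical ExtensionArray subclass that overrides nothing."""


def required_members():
    """Return the probed members whose base-class version raises AbstractMethodError."""
    arr = BareArray()  # instantiated without arguments; only the probes below touch it
    probes = {
        # Underscore-prefixed (hidden from end users), yet implementors must provide them.
        "_from_sequence": lambda: BareArray._from_sequence([1, 2]),
        "_concat_same_type": lambda: BareArray._concat_same_type([arr, arr]),
        # Public names that are equally required.
        "__len__": lambda: len(arr),
        "dtype": lambda: arr.dtype,
        "isna": lambda: arr.isna(),
        "copy": lambda: arr.copy(),
    }
    required = []
    for name, call in probes.items():
        try:
            call()
        except AbstractMethodError:
            required.append(name)
    return required


print(required_members())
```

In practice, implementors would lean on the base test classes in `pandas.tests.extension.base`, which exercise these members far more thoroughly than a probe like this.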
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
participants (2)
- Brock Mendel
- Joris Van den Bossche