Scikit-learn has limited support for passing information pertaining to each sample (henceforth “sample properties”) through an estimation pipeline. The user can, for instance, pass fit parameters to all members of a FeatureUnion, or to a specified member of a Pipeline using dunder (__) prefixing:
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> pipe = Pipeline([('clf', LogisticRegression())])
>>> pipe.fit([[1, 2], [3, 4]], [5, 6],
... clf__sample_weight=[.5, .7])
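The dunder convention works by splitting each keyword on its first "__" into a step name and a parameter name. The following is a minimal sketch of that grouping. `split_step_params` is a hypothetical helper written for illustration, not scikit-learn's internal code.

```python
def split_step_params(step_names, **fit_params):
    """Group '<step>__<param>' keyword arguments by step name (hypothetical helper)."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        # Split on the first "__" only, so parameter names may contain "__" themselves.
        step, sep, param = key.partition("__")
        if not sep or step not in routed:
            raise ValueError(f"cannot route {key!r}: expected '<step>__<param>'")
        routed[step][param] = value
    return routed


routed = split_step_params(["scale", "clf"], clf__sample_weight=[0.5, 0.7])
print(routed)  # {'scale': {}, 'clf': {'sample_weight': [0.5, 0.7]}}
```

Steps that receive no prefixed keyword simply get an empty parameter dict.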
Several other meta-estimators, such as GridSearchCV, support forwarding these fit parameters to their base estimator when fitting. Yet a number of important use cases are currently not supported.
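To make the gap concrete, here is a toy sketch of the forwarding pattern described above. `WeightedMean`, `neg_abs_error` and `ToySearch` are hypothetical names and are not GridSearchCV's implementation. In the sketch, fit parameters reach the base estimator's `fit`, but the scorer has no channel for receiving them, so the evaluation silently ignores the weights.

```python
class WeightedMean:
    """Toy estimator: predicts the (optionally weighted) mean of y."""

    def fit(self, X, y, sample_weight=None):
        w = sample_weight if sample_weight is not None else [1.0] * len(y)
        self.mean_ = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def neg_abs_error(estimator, X, y):
    # Unweighted score: nothing passes sample_weight to this function.
    preds = estimator.predict(X)
    return -sum(abs(p - t) for p, t in zip(preds, y)) / len(y)


class ToySearch:
    """Hypothetical stand-in for a meta-estimator that forwards fit parameters."""

    def __init__(self, estimator, scorer):
        self.estimator = estimator
        self.scorer = scorer

    def fit(self, X, y, **fit_params):
        # Every fit parameter is forwarded to the base estimator's fit ...
        self.estimator.fit(X, y, **fit_params)
        # ... but the scorer is called without them.
        self.score_ = self.scorer(self.estimator, X, y)
        return self


search = ToySearch(WeightedMean(), neg_abs_error)
search.fit([[0], [1], [2]], [0.0, 0.0, 3.0], sample_weight=[1.0, 1.0, 0.0])
print(search.estimator.mean_)  # 0.0 (the weights were used for fitting)
print(search.score_)           # -1.0 (the zero-weighted sample still counts)
```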
Features we currently do not support and wish to include:
- passing sample properties (e.g. sample_weight) to a scorer used in cross-validation
- passing sample properties (e.g. groups) to a CV splitter in nested cross-validation
- passing sample properties (e.g. sample_weight) to some scorers and not others in a multi-metric cross-validation setup

A meta-estimator passes along to its children only the metadata they request. A meta-estimator therefore needs to request, on behalf of its children, any metadata that its descendant consumers request.
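The routing rule above can be sketched in plain Python. `Consumer` and `Router` are hypothetical classes, and `get_requested` is a simplified, hypothetical stand-in for the richer per-method request described below. For brevity, every child exposes `fit`, including the one standing in for a CV splitter. Rejecting metadata that no descendant requested is a design choice of this sketch only.

```python
class Consumer:
    """Hypothetical leaf object that consumes a fixed set of metadata."""

    def __init__(self, requests):
        self.requests = set(requests)
        self.received = None  # metadata seen in the last fit call

    def get_requested(self):
        return set(self.requests)

    def fit(self, X, y, **metadata):
        self.received = dict(metadata)
        return self


class Router:
    """Hypothetical meta-estimator that routes metadata by request."""

    def __init__(self, children):
        self.children = list(children)

    def get_requested(self):
        # Request, on behalf of the children, whatever any descendant requests.
        requested = set()
        for child in self.children:
            requested |= child.get_requested()
        return requested

    def fit(self, X, y, **metadata):
        unexpected = set(metadata) - self.get_requested()
        if unexpected:
            raise TypeError(f"metadata not requested by any child: {sorted(unexpected)}")
        for child in self.children:
            wanted = child.get_requested()
            # Pass along only what this particular child asked for.
            child.fit(X, y, **{k: v for k, v in metadata.items() if k in wanted})
        return self


clf = Consumer(["sample_weight"])
splitter = Consumer(["groups"])
outer = Router([clf, Router([splitter])])  # a router nested inside a router
outer.fit([[0], [1]], [0, 1], sample_weight=[0.5, 0.7], groups=[0, 1])
print(sorted(outer.get_requested()))  # ['groups', 'sample_weight']
print(clf.received)                   # {'sample_weight': [0.5, 0.7]}
print(splitter.received)              # {'groups': [0, 1]}
```

Because the inner `Router` reports its descendants' requests, the outer one forwards `groups` to it without knowing about the splitter directly.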
Each object that could receive metadata should have a method called get_metadata_request(), which returns a dict specifying which metadata each of its methods consumes (the keys of this dictionary are therefore method names, e.g. fit, transform, etc.). Estimators supporting weighted fitting may return {} by default, but should have a method called request_sample_weight, which allows the user to specify, for each of its methods, whether sample_weight is requested. make_scorer accepts request_metadata as a keyword parameter, through which the user can specify what metadata the scorer requests.
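A minimal sketch of this protocol follows, assuming each value in the request dict is a set of metadata names. `WeightedEstimator`, `weighted_accuracy`, `_ScorerSketch` and `make_scorer_sketch` are hypothetical. The `fit`/`score` keywords of `request_sample_weight` are assumptions, and the scorer is simplified to take `(y_true, y_pred)` rather than an estimator. The usage shows the multi-metric case: one scorer requests `sample_weight` and the other does not.

```python
class WeightedEstimator:
    """Hypothetical estimator following the proposed request protocol."""

    def __init__(self):
        self._metadata_request = {}  # nothing requested by default

    def request_sample_weight(self, *, fit=False, score=False):
        # Record, per method, whether sample_weight is requested.
        for method, wanted in (("fit", fit), ("score", score)):
            params = self._metadata_request.setdefault(method, set())
            if wanted:
                params.add("sample_weight")
            else:
                params.discard("sample_weight")
        return self

    def get_metadata_request(self):
        # Keys are method names; methods that request nothing are omitted.
        return {m: set(p) for m, p in self._metadata_request.items() if p}


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    w = sample_weight if sample_weight is not None else [1.0] * len(y_true)
    hits = sum(wi for wi, t, p in zip(w, y_true, y_pred) if t == p)
    return hits / sum(w)


class _ScorerSketch:
    def __init__(self, score_func, request_metadata):
        self.score_func = score_func
        self.request_metadata = set(request_metadata)

    def get_metadata_request(self):
        return {"score": set(self.request_metadata)} if self.request_metadata else {}

    def __call__(self, y_true, y_pred, **metadata):
        # Forward only the metadata this scorer requested.
        kwargs = {k: v for k, v in metadata.items() if k in self.request_metadata}
        return self.score_func(y_true, y_pred, **kwargs)


def make_scorer_sketch(score_func, *, request_metadata=()):
    return _ScorerSketch(score_func, request_metadata)


est = WeightedEstimator()
print(est.get_metadata_request())  # {}
est.request_sample_weight(fit=True)
print(est.get_metadata_request())  # {'fit': {'sample_weight'}}

weighted = make_scorer_sketch(weighted_accuracy, request_metadata=["sample_weight"])
plain = make_scorer_sketch(weighted_accuracy)
y_true, y_pred, w = [1, 0, 1], [1, 1, 1], [1.0, 0.0, 1.0]
print(weighted(y_true, y_pred, sample_weight=w))  # 1.0
print(plain(y_true, y_pred, sample_weight=w))     # unweighted: 2 of 3 correct
```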