<div dir="ltr">With thanks to Alex, Adrin and Christian, we have a proposal to implement what we used to call "sample props". It should be expressive enough to resolve tens of issues and PRs, while remaining largely unobtrusive for most current users.<div><br></div><div>Core developers, please cast your vote in <a href="https://github.com/scikit-learn/enhancement_proposals/pull/52">this PR</a> after considering the proposal <a href="https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep006/proposal.html">here</a>, which has a partial implementation in <a href="https://github.com/scikit-learn/scikit-learn/pull/16079">#16079</a>.</div><div><br></div><div><br></div><div>In brief, the problem we are trying to solve:</div><div><br></div><div><p style="box-sizing:border-box;line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;background-color:rgb(252,252,252)">Scikit-learn has limited support for passing information pertaining to each sample (henceforth “sample properties”) through an estimation pipeline. 
The user can, for instance, pass fit parameters to all members of a FeatureUnion, or to a specified member of a Pipeline using dunder (<code class="gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(231,76,60);overflow-x:auto"><span class="gmail-pre" style="box-sizing:border-box">__</span></code>) prefixing:</p><div class="gmail-highlight-default gmail-notranslate" style="box-sizing:border-box;border:1px solid rgb(225,228,229);overflow-x:auto;margin:1px 0px 24px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;font-size:16px;background-color:rgb(252,252,252)"><div class="gmail-highlight" style="box-sizing:border-box;background:rgb(238,255,204);border:none;overflow-x:auto;margin:0px;padding:0px"><pre style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;line-height:1.4;margin-top:0px;margin-bottom:0px;padding:12px;overflow:auto"><span style="box-sizing:border-box"></span><span class="gmail-gp" style="box-sizing:border-box;color:rgb(198,93,9);font-weight:bold">>>> </span><span class="gmail-kn" style="box-sizing:border-box;color:rgb(0,112,32);font-weight:bold">from</span> <span class="gmail-nn" style="box-sizing:border-box;color:rgb(14,132,181);font-weight:bold">sklearn.pipeline</span> <span class="gmail-kn" style="box-sizing:border-box;color:rgb(0,112,32);font-weight:bold">import</span> <span class="gmail-n" style="box-sizing:border-box">Pipeline</span>
<span class="gmail-gp" style="box-sizing:border-box;color:rgb(198,93,9);font-weight:bold">>>> </span><span class="gmail-kn" style="box-sizing:border-box;color:rgb(0,112,32);font-weight:bold">from</span> <span class="gmail-nn" style="box-sizing:border-box;color:rgb(14,132,181);font-weight:bold">sklearn.linear_model</span> <span class="gmail-kn" style="box-sizing:border-box;color:rgb(0,112,32);font-weight:bold">import</span> <span class="gmail-n" style="box-sizing:border-box">LogisticRegression</span>
<span class="gmail-gp" style="box-sizing:border-box;color:rgb(198,93,9);font-weight:bold">>>> </span><span class="gmail-n" style="box-sizing:border-box">pipe</span> <span class="gmail-o" style="box-sizing:border-box;color:rgb(102,102,102)">=</span> <span class="gmail-n" style="box-sizing:border-box">Pipeline</span><span class="gmail-p" style="box-sizing:border-box">([(</span><span class="gmail-s1" style="box-sizing:border-box;color:rgb(64,112,160)">'clf'</span><span class="gmail-p" style="box-sizing:border-box">,</span> <span class="gmail-n" style="box-sizing:border-box">LogisticRegression</span><span class="gmail-p" style="box-sizing:border-box">())])</span>
<span class="gmail-gp" style="box-sizing:border-box;color:rgb(198,93,9);font-weight:bold">>>> </span><span class="gmail-n" style="box-sizing:border-box">pipe</span><span class="gmail-o" style="box-sizing:border-box;color:rgb(102,102,102)">.</span><span class="gmail-n" style="box-sizing:border-box">fit</span><span class="gmail-p" style="box-sizing:border-box">([[</span><span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">1</span><span class="gmail-p" style="box-sizing:border-box">,</span> <span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">2</span><span class="gmail-p" style="box-sizing:border-box">],</span> <span class="gmail-p" style="box-sizing:border-box">[</span><span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">3</span><span class="gmail-p" style="box-sizing:border-box">,</span> <span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">4</span><span class="gmail-p" style="box-sizing:border-box">]],</span> <span class="gmail-p" style="box-sizing:border-box">[</span><span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">5</span><span class="gmail-p" style="box-sizing:border-box">,</span> <span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">6</span><span class="gmail-p" style="box-sizing:border-box">],</span>
<span class="gmail-gp" style="box-sizing:border-box;color:rgb(198,93,9);font-weight:bold">... </span>         <span class="gmail-n" style="box-sizing:border-box">clf__sample_weight</span><span class="gmail-o" style="box-sizing:border-box;color:rgb(102,102,102)">=</span><span class="gmail-p" style="box-sizing:border-box">[</span><span class="gmail-o" style="box-sizing:border-box;color:rgb(102,102,102)">.</span><span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">5</span><span class="gmail-p" style="box-sizing:border-box">,</span> <span class="gmail-o" style="box-sizing:border-box;color:rgb(102,102,102)">.</span><span class="gmail-mi" style="box-sizing:border-box;color:rgb(32,128,80)">7</span><span class="gmail-p" style="box-sizing:border-box">])</span>  
</pre></div></div><p style="box-sizing:border-box;line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;background-color:rgb(252,252,252)">Several other meta-estimators, such as GridSearchCV, support forwarding these fit parameters to their base estimator when fitting. Yet a number of important use cases are currently not supported.</p><p style="box-sizing:border-box;line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;background-color:rgb(252,252,252)">Features we currently do not support and wish to include:</p><ul class="gmail-simple" style="box-sizing:border-box;margin:0px 0px 24px;padding:0px;list-style-position:initial;line-height:24px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;font-size:16px;background-color:rgb(252,252,252)"><li style="box-sizing:border-box;list-style:disc;margin-left:24px">passing sample properties (e.g. <a class="gmail-reference external" href="https://scikit-learn.org/stable/glossary.html#term-sample_weight" title="(in scikit-learn v0.24)" style="box-sizing:border-box;color:rgb(41,128,185);text-decoration-line:none"><code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(64,64,64);overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">sample_weight</span></code></a>) to a scorer used in cross-validation</li><li style="box-sizing:border-box;list-style:disc;margin-left:24px">passing sample properties (e.g. 
<a class="gmail-reference external" href="https://scikit-learn.org/stable/glossary.html#term-groups" title="(in scikit-learn v0.24)" style="box-sizing:border-box;color:rgb(41,128,185);text-decoration-line:none"><code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(64,64,64);overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">groups</span></code></a>) to a CV splitter in nested cross-validation</li><li style="box-sizing:border-box;list-style:disc;margin-left:24px">passing sample properties (e.g. <a class="gmail-reference external" href="https://scikit-learn.org/stable/glossary.html#term-sample_weight" title="(in scikit-learn v0.24)" style="box-sizing:border-box;color:rgb(41,128,185);text-decoration-line:none"><code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(64,64,64);overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">sample_weight</span></code></a>) to some scorers and not others in a multi-metric cross-validation setup</li></ul></div><div><h2 style="box-sizing:border-box;margin-top:0px;font-family:"Roboto Slab",ff-tisa-web-pro,Georgia,Arial,sans-serif;font-size:24px;color:rgb(64,64,64);background-color:rgb(252,252,252)">Solution: Each consumer requests</h2><p style="box-sizing:border-box;line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;background-color:rgb(252,252,252)">A meta-estimator passes along to its children only what they request. A meta-estimator needs to request, on behalf of its children, any metadata that its descendant consumers request.</p><p style="box-sizing:border-box;line-height:24px;margin:0px 0px 24px;font-size:16px;color:rgb(64,64,64);font-family:Lato,proxima-nova,"Helvetica Neue",Arial,sans-serif;background-color:rgb(252,252,252)">Each object that could receive metadata should have a method called <code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">get_metadata_request()</span></code>, which returns a dict specifying which metadata each of its methods consumes (the keys of this dictionary are therefore method names, e.g. 
<a class="gmail-reference external" href="https://scikit-learn.org/stable/glossary.html#term-fit" title="(in scikit-learn v0.24)" style="box-sizing:border-box;color:rgb(41,128,185);text-decoration-line:none"><code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(64,64,64);overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">fit</span></code></a>, <a class="gmail-reference external" href="https://scikit-learn.org/stable/glossary.html#term-transform" title="(in scikit-learn v0.24)" style="box-sizing:border-box;color:rgb(41,128,185);text-decoration-line:none"><code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(64,64,64);overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">transform</span></code></a> etc.). 
Estimators supporting weighted fitting may return <code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">{}</span></code> by default (i.e. request nothing), but have a method called <code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">request_sample_weight</span></code>, which allows the user to request <a class="gmail-reference external" href="https://scikit-learn.org/stable/glossary.html#term-sample_weight" title="(in scikit-learn v0.24)" style="box-sizing:border-box;color:rgb(41,128,185);text-decoration-line:none"><code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;background:rgb(255,255,255);border:1px solid rgb(225,228,229);padding:2px 5px;color:rgb(64,64,64);overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">sample_weight</span></code></a> separately for each of its methods. 
<code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="background:rgb(255,255,255);box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;border:1px solid rgb(225,228,229);padding:2px 5px;overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">make_scorer</span></code> accepts <code class="gmail-xref gmail-any gmail-docutils gmail-literal gmail-notranslate" style="background:rgb(255,255,255);box-sizing:border-box;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",Courier,monospace;font-size:12px;white-space:nowrap;max-width:100%;border:1px solid rgb(225,228,229);padding:2px 5px;overflow-x:auto;font-weight:bold"><span class="gmail-pre" style="box-sizing:border-box">request_metadata</span></code> as a keyword parameter, through which the user can specify what metadata is requested.</p></div><div><br></div><div>Regards,</div><div><br></div><div>Joel</div></div>
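<div><p>A minimal plain-Python sketch of the "each consumer requests" routing described above may help. Each consumer reports what it requests per method, a meta-estimator requests the union on behalf of its children and passes each child only what that child asked for, and scorers opt in through a request_metadata-style argument. All names here (WeightedConsumer, ToyMetaEstimator, ToyScorer, accuracy) are hypothetical. The set-valued request dict and the "score" method key are assumptions. This is not the implementation in the proposal or in #16079.</p><pre>
```python
# Hypothetical sketch of "each consumer requests" metadata routing; none of
# these names are scikit-learn API, they only mirror the proposal's wording.

class WeightedConsumer:
    """Toy estimator that can consume sample_weight in fit."""

    def __init__(self):
        # Nothing is requested by default, so get_metadata_request() == {}.
        self._request = {}

    def request_sample_weight(self, *, fit=False, score=False):
        """Record in which methods the user wants sample_weight passed."""
        for method, wanted in (("fit", fit), ("score", score)):
            keys = self._request.setdefault(method, set())
            if wanted:
                keys.add("sample_weight")
            else:
                keys.discard("sample_weight")
        # Keep only methods that still request something.
        self._request = {m: k for m, k in self._request.items() if k}
        return self

    def get_metadata_request(self):
        # Keys are method names; values (an assumed shape) are metadata names.
        return {m: set(k) for m, k in self._request.items()}

    def fit(self, X, y, sample_weight=None):
        self.fit_received_ = sample_weight  # record what was routed here
        return self


class ToyMetaEstimator:
    """Toy meta-estimator that requests on behalf of its children and
    passes each child only the metadata that child requested."""

    def __init__(self, children):
        self.children = children

    def get_metadata_request(self):
        # Union of the children's requests, method by method.
        merged = {}
        for child in self.children:
            for method, keys in child.get_metadata_request().items():
                merged.setdefault(method, set()).update(keys)
        return merged

    def fit(self, X, y, **metadata):
        # Design choice of this sketch: reject metadata that no child requested.
        unused = set(metadata) - self.get_metadata_request().get("fit", set())
        if unused:
            raise TypeError(f"metadata not requested by any child: {sorted(unused)}")
        for child in self.children:
            wanted = child.get_metadata_request().get("fit", set())
            child.fit(X, y, **{k: v for k, v in metadata.items() if k in wanted})
        return self


class ToyScorer:
    """Toy stand-in for a scorer built with a request_metadata argument."""

    def __init__(self, score_func, request_metadata=()):
        self.score_func = score_func
        self.request_metadata = set(request_metadata)

    def get_metadata_request(self):
        # "score" as the method key is an assumption of this sketch.
        return {"score": set(self.request_metadata)} if self.request_metadata else {}

    def __call__(self, y_true, y_pred, **metadata):
        kwargs = {k: v for k, v in metadata.items() if k in self.request_metadata}
        return self.score_func(y_true, y_pred, **kwargs)


def accuracy(y_true, y_pred, sample_weight=None):
    """Plain or weighted accuracy."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


# Only the first child asks for sample_weight in fit.
first = WeightedConsumer().request_sample_weight(fit=True)
second = WeightedConsumer()
meta = ToyMetaEstimator([first, second])
print(meta.get_metadata_request())  # {'fit': {'sample_weight'}}
meta.fit([[1, 2], [3, 4]], [5, 6], sample_weight=[.5, .7])
print(first.fit_received_, second.fit_received_)  # [0.5, 0.7] None

# Multi-metric scoring: only one scorer opts in to sample_weight.
weighted_acc = ToyScorer(accuracy, request_metadata=["sample_weight"])
plain_acc = ToyScorer(accuracy)
y_true, y_pred, w = [0, 1, 1], [0, 1, 0], [1.0, 1.0, 2.0]
print(weighted_acc(y_true, y_pred, sample_weight=w))  # 0.5
print(plain_acc(y_true, y_pred, sample_weight=w))
```
</pre><p>Here only the first child receives sample_weight, and in the multi-metric case only the scorer that requested it computes a weighted score. Raising an error for metadata that no child requested is a choice made for this sketch, not something stated above.</p></div>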