<div dir="ltr"><div>SMOTENC internally one-hot encodes the categorical features, generates new samples, and finally decodes them.</div><div>So you need to do something like this:</div><div><br></div><br><pre id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4204" style="color:rgb(0,0,0)">import numpy as np<br>from imblearn.over_sampling import SMOTENC<br>from imblearn.pipeline import make_pipeline, Pipeline<br>from sklearn.pipeline import FeatureUnion<br>from sklearn.preprocessing import StandardScaler</pre><div><pre id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4204" style="color:rgb(0,0,0)">num_indices1 = <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4205" style="color:rgb(0,0,128)">list</span>(X.iloc[:,np.r_[<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4206" style="color:rgb(0,0,255)">0</span>:<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4207" style="color:rgb(0,0,255)">94</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4208" style="color:rgb(0,0,255)">95</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4209" style="color:rgb(0,0,255)">97</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4210" style="color:rgb(0,0,255)">100</span>:<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4211" style="color:rgb(0,0,255)">123</span>]].columns.values)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4212" clear="none">cat_indices1 = <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4213" style="color:rgb(0,0,128)">list</span>(X.iloc[:,np.r_[<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4214" style="color:rgb(0,0,255)">94</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4215" style="color:rgb(0,0,255)">96</span>,<span 
id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4216" style="color:rgb(0,0,255)">98</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4217" style="color:rgb(0,0,255)">99</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4218" style="color:rgb(0,0,255)">123</span>:<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4219" style="color:rgb(0,0,255)">160</span>]].columns.values)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4220" clear="none"><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4221" style="color:rgb(0,0,128)">print</span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4222" style="color:rgb(0,0,128)">len</span>(num_indices1))<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4223" clear="none"><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4224" style="color:rgb(0,0,128)">print</span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4225" style="color:rgb(0,0,128)">len</span>(cat_indices1))<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4226" clear="none"><br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4227" clear="none">pipeline=Pipeline(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4228" style="color:rgb(102,0,153)">steps</span>= [<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4229" clear="none"> <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4230" style="color:rgb(128,128,128);font-style:italic"># Categorical features<br 
id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4231" clear="none"></span><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4232" style="color:rgb(128,128,128);font-style:italic"> </span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4233" style="color:rgb(0,128,128);font-weight:bold">'feature_processing'</span>, FeatureUnion(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4234" style="color:rgb(102,0,153)">transformer_list </span>= [<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4235" clear="none"> (<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4236" style="color:rgb(0,128,128);font-weight:bold">'categorical'</span>, MultiColumn(cat_indices1)),<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4237" clear="none"><br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4238" clear="none"> <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4239" style="color:rgb(128,128,128);font-style:italic">#numeric<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4240" clear="none"></span><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4241" style="color:rgb(128,128,128);font-style:italic"> </span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4242" style="color:rgb(0,128,128);font-weight:bold">'numeric'</span>, Pipeline(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4243" style="color:rgb(102,0,153)">steps </span>= [<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4244" clear="none"> (<span 
id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4245" style="color:rgb(0,128,128);font-weight:bold">'select'</span>, MultiColumn(num_indices1)),<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4246" clear="none"> (<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4247" style="color:rgb(0,128,128);font-weight:bold">'scale'</span>, StandardScaler())<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4248" clear="none"> ]))<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4249" clear="none"> ])),<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4250" clear="none"> (<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4251" style="color:rgb(0,128,128);font-weight:bold">'clf'</span>, rg)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4252" clear="none"> ]<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4253" clear="none">)<br></pre><pre id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4204" style="color:rgb(0,0,0)">pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)<br></pre> </div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 20 Jan 2019 at 18:05, S Hamidizade &lt;<a href="mailto:hamidizade.s@gmail.com">hamidizade.s@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Dear Scikit-learners</div><div>Hi.</div><div dir="ltr"><br></div><div dir="ltr">I would greatly appreciate it if you could let me know how to use SMOTENC. 
I wrote: <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4255"></span></div><div dir="ltr"><pre id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4204" style="color:rgb(0,0,0)">num_indices1 = <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4205" style="color:rgb(0,0,128)">list</span>(X.iloc[:,np.r_[<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4206" style="color:rgb(0,0,255)">0</span>:<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4207" style="color:rgb(0,0,255)">94</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4208" style="color:rgb(0,0,255)">95</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4209" style="color:rgb(0,0,255)">97</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4210" style="color:rgb(0,0,255)">100</span>:<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4211" style="color:rgb(0,0,255)">123</span>]].columns.values)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4212" clear="none">cat_indices1 = <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4213" style="color:rgb(0,0,128)">list</span>(X.iloc[:,np.r_[<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4214" style="color:rgb(0,0,255)">94</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4215" style="color:rgb(0,0,255)">96</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4216" style="color:rgb(0,0,255)">98</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4217" 
style="color:rgb(0,0,255)">99</span>,<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4218" style="color:rgb(0,0,255)">123</span>:<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4219" style="color:rgb(0,0,255)">160</span>]].columns.values)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4220" clear="none"><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4221" style="color:rgb(0,0,128)">print</span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4222" style="color:rgb(0,0,128)">len</span>(num_indices1))<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4223" clear="none"><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4224" style="color:rgb(0,0,128)">print</span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4225" style="color:rgb(0,0,128)">len</span>(cat_indices1))<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4226" clear="none"><br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4227" clear="none">pipeline=Pipeline(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4228" style="color:rgb(102,0,153)">steps</span>= [<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4229" clear="none"> <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4230" style="color:rgb(128,128,128);font-style:italic"># Categorical features<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4231" clear="none"></span><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4232" style="color:rgb(128,128,128);font-style:italic"> 
</span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4233" style="color:rgb(0,128,128);font-weight:bold">'feature_processing'</span>, FeatureUnion(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4234" style="color:rgb(102,0,153)">transformer_list </span>= [<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4235" clear="none"> (<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4236" style="color:rgb(0,128,128);font-weight:bold">'categorical'</span>, MultiColumn(cat_indices1)),<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4237" clear="none"><br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4238" clear="none"> <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4239" style="color:rgb(128,128,128);font-style:italic">#numeric<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4240" clear="none"></span><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4241" style="color:rgb(128,128,128);font-style:italic"> </span>(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4242" style="color:rgb(0,128,128);font-weight:bold">'numeric'</span>, Pipeline(<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4243" style="color:rgb(102,0,153)">steps </span>= [<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4244" clear="none"> (<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4245" style="color:rgb(0,128,128);font-weight:bold">'select'</span>, MultiColumn(num_indices1)),<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4246" clear="none"> (<span 
id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4247" style="color:rgb(0,128,128);font-weight:bold">'scale'</span>, StandardScaler())<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4248" clear="none"> ]))<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4249" clear="none"> ])),<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4250" clear="none"> (<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4251" style="color:rgb(0,128,128);font-weight:bold">'clf'</span>, rg)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4252" clear="none"> ]<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4253" clear="none">)<br id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4254" clear="none"></pre></div><div class="gmail-m_-2727352775481686478gmail-yiv9624972395qtdSeparateBR" id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4083" dir="ltr">As indicated, I have 5 categorical features. In fact, indices 123 to 160 correspond to a single categorical feature with 37 possible values, which was converted into 37 columns using <span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4364" style="background-color:rgb(228,228,255)">get_dummies.</span></div><div id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4593" dir="ltr"> I think SMOTENC should be inserted before the classifier ('clf', rg), but I don't know how to define "<span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4482">categorical_features" in SMOTENC. Also, could you please let me know where to use imblearn.pipeline? 
</span></div><div id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4604" dir="ltr"><br clear="none"></div><div id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4605" dir="ltr"><span id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4628">Thanks in advance.</span></div><div id="gmail-m_-2727352775481686478gmail-yiv9624972395yui_3_16_0_ym19_1_1547750570226_4606" dir="ltr">Best regards,<br></div></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Guillaume Lemaitre<br>INRIA Saclay - Parietal team<br>Center for Data Science Paris-Saclay<br><a href="https://glemaitre.github.io/" target="_blank">https://glemaitre.github.io/</a></div></div></div></div></div></div></div>
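<div>A minimal sketch of one way to build the <code>categorical_features</code> argument asked about in the thread, assuming SMOTENC accepts either integer column positions or a boolean mask over the columns (check the documentation of your imbalanced-learn version; the column names stored in cat_indices1 may not be accepted). The helper names and the total of 160 columns are assumptions inferred from the slices in the thread, not part of the original code.</div>
<pre>
```python
# Hypothetical helpers; the positions mirror the slices used in the thread.
def categorical_positions():
    """Return the integer positions of the categorical columns."""
    # Same selection as np.r_[94, 96, 98, 99, 123:160] in the original code.
    return [94, 96, 98, 99] + list(range(123, 160))


def categorical_mask(n_features, positions):
    """Return a boolean mask that is True at each categorical position."""
    chosen = set(positions)
    return [i in chosen for i in range(n_features)]


# Assumption: X has 160 columns, numbered 0 to 159.
cat_positions = categorical_positions()
cat_mask = categorical_mask(160, cat_positions)

print(len(cat_positions))  # number of columns flagged as categorical
print(sum(cat_mask))       # the mask flags the same number of columns
```
</pre>
<div>Under that assumption, either <code>cat_positions</code> or <code>cat_mask</code> could be passed as <code>SMOTENC(categorical_features=...)</code>.</div>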