Imblearn: SMOTENC
Dear Scikit-learners,

I would greatly appreciate it if you could let me know how to use SMOTENC. I wrote:

    num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
    cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
    print(len(num_indices1))
    print(len(cat_indices1))

    pipeline = Pipeline(steps=[
        ('feature_processing', FeatureUnion(transformer_list=[
            # Categorical features
            ('categorical', MultiColumn(cat_indices1)),
            # Numeric features
            ('numeric', Pipeline(steps=[
                ('select', MultiColumn(num_indices1)),
                ('scale', StandardScaler())
            ]))
        ])),
        ('clf', rg)
    ])

As indicated, I have 5 categorical features. In fact, indices 123 to 160 correspond to a single categorical feature with 37 possible values, which was converted into 37 columns using get_dummies. I think SMOTENC should be inserted before the classifier ('clf', rg), but I don't know how to define "categorical_features" in SMOTENC. Also, could you please let me know where to use imblearn.pipeline?

Thanks in advance.
Best regards,
SMOTENC will internally one-hot encode the categorical features, generate the new samples, and finally decode them. So you need to do something like:

    from imblearn.pipeline import make_pipeline, Pipeline

    num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
    cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
    print(len(num_indices1))
    print(len(cat_indices1))

    pipeline = Pipeline(steps=[
        ('feature_processing', FeatureUnion(transformer_list=[
            # Categorical features
            ('categorical', MultiColumn(cat_indices1)),
            # Numeric features
            ('numeric', Pipeline(steps=[
                ('select', MultiColumn(num_indices1)),
                ('scale', StandardScaler())
            ]))
        ])),
        ('clf', rg)
    ])

    pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)

On Sun, 20 Jan 2019 at 18:05, S Hamidizade <hamidizade.s@gmail.com> wrote:
> Dear Scikit-learners Hi.
> I would greatly appreciate if you could let me know how to use SMOTENC. I wrote:
> [...]

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Dear Mr. Lemaitre,

Thanks a lot for sharing your time and knowledge. Unfortunately, it throws the following error:

    Traceback (most recent call last):
      File "D:/mifs-master_2/MU/learning-from-imbalanced-classes-master/learning-from-imbalanced-classes-master/continuous/Final Logit/SMOTENC/logit-final - Copy.py", line 419, in <module>
        pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)
      File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 594, in make_pipeline
        return Pipeline(_name_estimators(steps), memory=memory)
      File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 119, in __init__
        self._validate_steps()
      File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 167, in _validate_steps
        " '%s' (type %s) doesn't" % (t, type(t)))
    TypeError: All intermediate steps should be transformers and implement fit and transform. 'SMOTENC(categorical_features=['x95', 'x97', 'x99', 'x100',
        'x121_1', 'x121_2', 'x121_3', 'x121_4', 'x121_5', 'x121_6', 'x121_7', 'x121_8', 'x121_9', 'x121_10',
        'x121_11', 'x121_12', 'x121_13', 'x121_14', 'x121_15', 'x121_16', 'x121_17', 'x121_18', 'x121_19', 'x121_20',
        'x121_21', 'x121_22', 'x121_23', 'x121_24', 'x121_25', 'x121_26', 'x121_27', 'x121_28', 'x121_29', 'x121_30',
        'x121_31', 'x121_32', 'x121_33', 'x121_34', 'x121_35', 'x121_36', 'x121_37'],
        k_neighbors=5, n_jobs=1, random_state=None, sampling_strategy='auto')' (type <class 'imblearn.over_sampling._smote.SMOTENC'>) doesn't

Thanks in advance.
Best regards,

On Mon, Jan 21, 2019 at 2:26 PM Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
> SMOTENC will internally one hot encode the features, generate new features, and finally decode. So you need to do something like:
> [...]
As stated in the documentation, categorical_features expects the indices of the categorical columns, not the names of the columns. This is similar to the one-hot encoder API.
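One way to go from column names to positional indices is pandas' Index.get_indexer. This is a small sketch under the assumption that X is a DataFrame. The frame below is a hypothetical stand-in that only borrows two column names from the traceback.

```python
import pandas as pd

# Hypothetical frame: "x95" and "x97" play the role of categorical columns.
X = pd.DataFrame({
    "x1": [0.1, 0.2],
    "x95": [1, 0],
    "x2": [3.0, 4.0],
    "x97": [2, 2],
})
cat_names = ["x95", "x97"]

# get_indexer returns the position of each name, or -1 if a name is missing.
cat_indices = X.columns.get_indexer(cat_names).tolist()

missing = [name for name, idx in zip(cat_names, cat_indices) if idx == -1]
```

The resulting list of integers is the form that categorical_features is described as expecting in the reply above.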
Thanks. Unfortunately, now the error is:

    ValueError: Some of the categorical indices are out of range. Indices should be between 0 and 160.

Best regards,
You should open an issue on the imbalanced-learn GitHub issue tracker. It is easier to post a reproducible example there, and easier for us to test it.

From the error message, I understand that you have 161 features and are requesting a feature with an index above 160.

On Thu, 24 Jan 2019 at 16:19, S Hamidizade <hamidizade.s@gmail.com> wrote:
> Thanks. Unfortunately, now the error is: ValueError: Some of the categorical indices are out of range. Indices should be between 0 and 160.
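A quick way to see which indices would trigger an out-of-range error is to compare them against the number of columns before building the sampler. This is only a sketch: the helper name out_of_range and the feature count of 160 are assumptions for illustration, since the actual shape of the poster's data is not stated in the thread.

```python
import numpy as np


def out_of_range(indices, n_features):
    """Return the indices that fall outside the valid range 0..n_features-1."""
    idx = np.asarray(indices)
    return idx[(idx < 0) | (idx >= n_features)].tolist()


n_features = 160  # hypothetical; in practice use X.shape[1]

# Positions written as in the original post: the slice 123:160 stops at 159.
cat_indices = np.r_[94, 96, 98, 99, 123:160].tolist()
bad = out_of_range(cat_indices, n_features)

# Extending the slice by one column goes past the last valid position.
bad_shifted = out_of_range(np.r_[123:161], n_features)
```

Running a check like this on the real X.shape[1] would show whether the positional indices themselves are the problem or whether something other than positions is being passed.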
Thanks. The code is provided here:
https://github.com/scikit-learn-contrib/imbalanced-learn/issues/537

Best regards,

On Thu, Jan 24, 2019 at 7:15 PM Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
> You should open a ticket on imbalanced-learn GitHub issue. This is easier to post a reproducible example and for us to test it. From the error message, I can understand that you have 161 features and require a feature above the index 160.
Participants (2): Guillaume Lemaître, S Hamidizade