[scikit-learn] sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable

Fri May 29 10:02:26 EDT 2020

VotingClassifier also needs names:

ens = VotingClassifier(estimators=[('pipe1', pipe_phy), ('pipe2', pipe_fa)])
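For reference, a fuller sketch of the same idea, end to end. The toy data, the make_pipe helper and the use of SVC() are my own placeholders (the original post leaves SVM undefined), so treat this as an illustration rather than the poster's actual setup:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data (hypothetical): six numeric columns and a binary target.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.normal(size=(40, 6)), columns=list("ABCDEF"))
y = (X["A"] + X["D"] > 0).astype(int).to_numpy()


def make_pipe(name, columns):
    """Impute and scale the given columns, then fit an SVC on them."""
    numeric = Pipeline(steps=[("imputer", SimpleImputer(strategy="median")),
                              ("scaler", StandardScaler())])
    pre = ColumnTransformer(transformers=[(name, numeric, columns)])
    return Pipeline(steps=[("preprocessor", pre), ("classifier", SVC())])


pipe_phy = make_pipe("phy", ["A", "B", "C"])
pipe_fa = make_pipe("fa", ["D", "E", "F"])

# Each entry is a (name, estimator) tuple, not a bare estimator.
ens = VotingClassifier(estimators=[("pipe1", pipe_phy), ("pipe2", pipe_fa)])

# The names are what make nested parameters addressable with "__".
ens.set_params(pipe1__classifier__C=10.0)

ens.fit(X, y)
print(ens.score(X, y))
```

Two things to note. The names are also the reason '__' is reserved: it is the separator in parameter paths such as pipe1__classifier__C, so there should be no need to edit the installed library. And because the ColumnTransformers select columns by name, X has to stay a pandas DataFrame; the output of preprocessing.scale(df1) is a plain NumPy array, which has no column names to select by.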

Thomas

> On Friday, May 29, 2020 at 2:33 AM, Chamila Wijayarathna <cdwijayarathna at gmail.com> wrote:
> Hi all,
>
> I did manage to get the code to run using a workaround, which is a bit ugly.
>
> Following is the complete stacktrace of the error I was receiving.
>
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
>     pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
>   File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py", line 112, in <module>
>     ens.fit(x_train,y_train)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 265, in fit
>     return super().fit(X, transformed_y, sample_weight)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 65, in fit
>     names, clfs = self._validate_estimators()
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py", line 228, in _validate_estimators
>     self._validate_names(names)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in _validate_names
>     invalid_names = [name for name in names if '__' in name]
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in <listcomp>
>     invalid_names = [name for name in names if '__' in name]
> TypeError: argument of type 'ColumnTransformer' is not iterable
>
> Following are the inputs in 'names' list at the time of the error.
>
> 1- ColumnTransformer(transformers=[('phy', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['HR', 'RMSSD', 'SCL'])])
> 2- ColumnTransformer(transformers=[('fa',Pipeline(steps=[('imputer',SimpleImputer(strategy='median')),('scaler', StandardScaler())]),['Squality', 'Sneutral', 'Shappy'])])
>
> It seems that the library is checking whether the substring '__' occurs in each ColumnTransformer object, which it cannot do.
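
A rough pure-Python sketch of how ColumnTransformer objects can end up in 'names'. It assumes, from my reading of the traceback, that the ensemble splits its estimators list with zip(*estimators), and that a Pipeline can be indexed by step number. FakePipeline, FakeTransformer and validate_names are hypothetical stand-ins, not scikit-learn code:

```python
class FakePipeline:
    """Hypothetical stand-in: indexable by step number, like a Pipeline."""

    def __init__(self, steps):
        self.steps = steps

    def __len__(self):
        return len(self.steps)

    def __getitem__(self, index):
        # Return the estimator of the step, dropping its name.
        return self.steps[index][1]


class FakeTransformer:
    """Hypothetical stand-in for a ColumnTransformer (not a string)."""


def validate_names(estimators):
    """Mimic the name check: split into names and estimators, scan for '__'."""
    names, _ = zip(*estimators)
    return [name for name in names if "__" in name]


pipe_a = FakePipeline([("preprocessor", FakeTransformer()), ("classifier", "svm_a")])
pipe_b = FakePipeline([("preprocessor", FakeTransformer()), ("classifier", "svm_b")])

# Named tuples: the first element of each tuple is a string, so the check passes.
print(validate_names([("pipe1", pipe_a), ("pipe2", pipe_b)]))

# Bare pipelines: zip(*...) iterates over each pipeline's steps instead, so the
# "names" are the first steps (the transformers) and '__' in name fails.
error_message = None
try:
    validate_names([pipe_a, pipe_b])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the check itself is fine; it is just being handed the first step of each pipeline where it expects a string name.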
>
> Since this name check doesn't have a significant effect on my functionality, I commented out the following snippet in sklearn\utils\metaestimators.py.
>
> invalid_names = [name for name in names if '__' in name]
> if invalid_names:
>     raise ValueError('Estimator names must not contain __: got '
>                      '{0!r}'.format(invalid_names))
>
> Please let me know if there is a better workaround, or if there are any issues with commenting out this code.
>
> Thanks
> On Fri, May 29, 2020 at 10:33 AM Chamila Wijayarathna <cdwijayarathna at gmail.com> wrote:
> > Hello all,
> >
> > I hope I am writing to the correct mailing list about this issue. My apologies if I am not.
> >
> > I am attempting to use a pipeline to feed an ensemble voting classifier as I want the ensemble learner to use models that train on different feature sets. For this purpose, I followed the tutorial available at [1].
> >
> > Following is the code that I could develop so far.
> >
> > y = df1.index
> > x = preprocessing.scale(df1)
> >
> > phy_features = ['A', 'B', 'C']
> > phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
> > phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])
> >
> > fa_features = ['D', 'E', 'F']
> > fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
> > fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])
> >
> >
> > pipe_phy = Pipeline(steps=[('preprocessor', phy_processer ),('classifier', SVM)])
> > pipe_fa = Pipeline(steps=[('preprocessor', fa_processer ),('classifier', SVM)])
> >
> > ens = VotingClassifier(estimators=[pipe_phy, pipe_fa])
> >
> > cv = KFold(n_splits=10, random_state=None, shuffle=True)
> > for train_index, test_index in cv.split(x):
> >     x_train, x_test = x[train_index], x[test_index]
> >     y_train, y_test = y[train_index], y[test_index]
> >     ens.fit(x_train,y_train)
> >     print(ens.score(x_test, y_test))
> >
> > However, when running the code, I am getting an error saying TypeError: argument of type 'ColumnTransformer' is not iterable, at the line ens.fit(x_train,y_train).
> >
> > What is the reason for this and how can I fix it?
> >
> > Thank you,
> > Chamila
> >
>
> --
> Chamila Dilshan Wijayarathna,
> PhD Research Student
> The University of New South Wales (UNSW Canberra)
> Australian Centre for Cyber Security
> Australian Defence Force Academy
> PO Box 7916, Canberra BA ACT 2610
> Australia
> Mobile:(+61)416895795
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn