[scikit-learn] sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable

Sat May 30 01:20:31 EDT 2020

Thank you both for your input.

On Fri, May 29, 2020 at 9:57 PM Nicolas Hug <niourf at gmail.com> wrote:

> Also, you should not scale your input before computing cross-validation
> scores. By doing that you are biasing your results, because the scaling
> statistics are computed on all of the data, so each test set leaks
> information into the data the model is trained on (even if it's not
> target data).
>
> The scaler should be fit on the training part of each (train / test) pair
> and then applied to the test part.
>
> This can be done through a pipeline:
> https://scikit-learn.org/stable/modules/compose.html
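
A minimal sketch of this, for the archive: putting the scaler inside a Pipeline lets cross_val_score refit it on the training part of every split. The synthetic data from make_classification and the choice of SVC are assumptions made only for illustration; they are not taken from my script.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption; not the data from this thread).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# The scaler is a pipeline step, so it is fit only on the training
# part of each split and then applied to the matching test part.
model = make_pipeline(StandardScaler(), SVC())

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean())
```

The same pattern should carry over to any estimator, since cross_val_score clones and refits the whole pipeline for each split.
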
>
>
> On 5/29/20 11:52 AM, Thomas J Fan wrote:
>
> Once
>
> *x = preprocessing.scale(df1)*
>
> is called, the input to your estimator is a NumPy array rather than a
> DataFrame, so the column transformer cannot use strings to select columns.
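
This can be reproduced in a few lines. The sketch below uses a made-up DataFrame (the column names 'A', 'B', 'C' follow my original post; the values are invented) and shows that preprocessing.scale returns a NumPy array, on which selecting columns by string fails, while the DataFrame itself works.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Invented values; only the column names mirror the original post.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [0.5, 0.1, 0.9, 0.3],
                   "C": [10.0, 20.0, 30.0, 40.0]})

scaled = preprocessing.scale(df)
print(type(scaled))  # the column labels are gone at this point

ct = ColumnTransformer(transformers=[("ab", StandardScaler(), ["A", "B"])])

# String column names work while the input is still a DataFrame.
out = ct.fit_transform(df)

# On the scaled array the same transformer cannot resolve "A" and "B".
error = None
try:
    ct.fit_transform(scaled)
except ValueError as exc:
    error = exc
print(error)
```
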
>
> Thomas
>
> On Friday, May 29, 2020 at 11:46 AM, Chamila Wijayarathna <
> cdwijayarathna at gmail.com> wrote:
> Hi,
>
> Thanks, this solution fixed the issue. However, it introduces a new error,
> which was not there before.
>
> Traceback (most recent call last):
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\__init__.py", line 425, in _get_column_indices
>     all_columns = X.columns
> AttributeError: 'numpy.ndarray' object has no attribute 'columns'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
>     pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
>   File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py", line 127, in <module>
>     ens.fit(x_train,y_train)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 265, in fit
>     return super().fit(X, transformed_y, sample_weight)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 81, in fit
>     for idx, clf in enumerate(clfs) if clf not in (None, 'drop')
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py", line 1029, in __call__
>     if self.dispatch_one_batch(iterator):
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py", line 847, in dispatch_one_batch
>     self._dispatch(tasks)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py", line 765, in _dispatch
>     job = self._backend.apply_async(batch, callback=cb)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\_parallel_backends.py", line 206, in apply_async
>     result = ImmediateResult(func)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\_parallel_backends.py", line 570, in __init__
>     self.results = batch()
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py", line 253, in __call__
>     for func, args, kwargs in self.items]
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py", line 253, in <listcomp>
>     for func, args, kwargs in self.items]
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py", line 40, in _fit_single_estimator
>     estimator.fit(X, y)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py", line 330, in fit
>     Xt = self._fit(X, y, **fit_params_steps)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py", line 296, in _fit
>     **fit_params_steps[name])
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\memory.py", line 352, in __call__
>     return self.func(*args, **kwargs)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py", line 740, in _fit_transform_one
>     res = transformer.fit_transform(X, y, **fit_params)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\compose\_column_transformer.py", line 529, in fit_transform
>     self._validate_remainder(X)
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\compose\_column_transformer.py", line 327, in _validate_remainder
>     cols.extend(_get_column_indices(X, columns))
>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\__init__.py", line 427, in _get_column_indices
>     raise ValueError("Specifying the columns using strings is only "
> ValueError: Specifying the columns using strings is only supported for pandas DataFrames
>
> Thanks
>
> On Fri, May 29, 2020 at 7:33 PM Thomas J Fan <thomasjpfan at gmail.com>
> wrote:
>
>> VotingClassifier also needs names:
>>
>> ens = VotingClassifier(estimators=[('pipe1', pipe_phy), ('pipe2',
>> pipe_fa)])
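
For the archive, here is a self-contained sketch of the named form applied to two pipelines that each select their own columns. The data is random, the column names 'A' to 'F' follow my original post, and SVC stands in for the SVM object in my script, which I did not show; treat all three as assumptions. As far as I can tell, a Pipeline passed without a name gets unpacked into its steps, which would explain why ColumnTransformer objects ended up in the names list.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in data (assumption); X stays a DataFrame so that the
# column transformers can select columns by name.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(60, 6)), columns=list("ABCDEF"))
y = (X["A"] + X["D"] > 0).astype(int).to_numpy()


def make_branch(name, features):
    """Impute and scale the given columns, then classify with an SVC."""
    numeric = Pipeline(steps=[("imputer", SimpleImputer(strategy="median")),
                              ("scaler", StandardScaler())])
    selector = ColumnTransformer(transformers=[(name, numeric, features)])
    return Pipeline(steps=[("preprocessor", selector),
                           ("classifier", SVC())])


pipe_phy = make_branch("phy", ["A", "B", "C"])
pipe_fa = make_branch("fa", ["D", "E", "F"])

# Each estimator is given as a (name, estimator) tuple.
ens = VotingClassifier(estimators=[("pipe1", pipe_phy), ("pipe2", pipe_fa)])
ens.fit(X, y)
print(ens.score(X, y))
```

The helper make_branch is hypothetical and only there to avoid repeating the same pipeline definition twice.
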
>>
>> Thomas
>>
>> On Friday, May 29, 2020 at 2:33 AM, Chamila Wijayarathna <
>> cdwijayarathna at gmail.com> wrote:
>> Hi all,
>>
>> I did manage to get the code to run using a workaround, which is a bit ugly.
>>
>> The following is the complete stack trace of the error I was receiving.
>>
>> Traceback (most recent call last):
>>   File "<input>", line 1, in <module>
>>   File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
>>     pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
>>   File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
>>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>>   File "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py", line 112, in <module>
>>     ens.fit(x_train,y_train)
>>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 265, in fit
>>     return super().fit(X, transformed_y, sample_weight)
>>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 65, in fit
>>     names, clfs = self._validate_estimators()
>>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py", line 228, in _validate_estimators
>>     self._validate_names(names)
>>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in _validate_names
>>     invalid_names = [name for name in names if '__' in name]
>>   File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in <listcomp>
>>     invalid_names = [name for name in names if '__' in name]
>> TypeError: argument of type 'ColumnTransformer' is not iterable
>>
>> The following are the entries in the 'names' list at the time of the error.
>>
>> 1- ColumnTransformer(transformers=[('phy', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['HR', 'RMSSD', 'SCL'])])
>> 2- ColumnTransformer(transformers=[('fa', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['Squality', 'Sneutral', 'Shappy'])])
>>
>> It seems the library is attempting to search for the '__' substring in
>> the ColumnTransformer object, which it is unable to do.
>>
>> Since this name check doesn't have a significant effect on my
>> functionality, I commented out the following snippet in
>> sklearn\utils\metaestimators.py:
>>
>> invalid_names = [name for name in names if '__' in name]
>> if invalid_names:
>>     raise ValueError('Estimator names must not contain __: got '
>>                      '{0!r}'.format(invalid_names))
>>
>> Please let me know if there is a better workaround, or if there are any
>> issues with commenting out this code.
>>
>> Thanks
>>
>> On Fri, May 29, 2020 at 10:33 AM Chamila Wijayarathna <
>> cdwijayarathna at gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I hope I am writing to the correct mailing list about this issue that I
>>> am having. My apologies if I am not.
>>>
>>> I am attempting to use a pipeline to feed an ensemble voting classifier
>>> as I want the ensemble learner to use models that train on different
>>> feature sets. For this purpose, I followed the tutorial available at [1].
>>>
>>> Following is the code that I could develop so far.
>>>
>>> y = df1.index
>>> x = preprocessing.scale(df1)
>>>
>>> phy_features = ['A', 'B', 'C']
>>> phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')),
>>>                                   ('scaler', StandardScaler())])
>>> phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])
>>>
>>> fa_features = ['D', 'E', 'F']
>>> fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')),
>>>                                  ('scaler', StandardScaler())])
>>> fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])
>>>
>>> pipe_phy = Pipeline(steps=[('preprocessor', phy_processer), ('classifier', SVM)])
>>> pipe_fa = Pipeline(steps=[('preprocessor', fa_processer), ('classifier', SVM)])
>>>
>>> ens = VotingClassifier(estimators=[pipe_phy, pipe_fa])
>>>
>>> cv = KFold(n_splits=10, random_state=None, shuffle=True)
>>> for train_index, test_index in cv.split(x):
>>>     x_train, x_test = x[train_index], x[test_index]
>>>     y_train, y_test = y[train_index], y[test_index]
>>>     ens.fit(x_train,y_train)
>>>     print(ens.score(x_test, y_test))
>>>
>>> However, when running the code, I am getting an error saying *TypeError:
>>> argument of type 'ColumnTransformer' is not iterable*, at the line
>>> *ens.fit(x_train,y_train).*
>>>
>>> What is the reason for this and how can I fix it?
>>>
>>> Thank you,
>>> Chamila
>>>
>>
>>
>> --
>> Chamila Dilshan Wijayarathna,
>> PhD Research Student
>> The University of New South Wales (UNSW Canberra)
>> Australian Centre for Cyber Security
>> Australian Defence Force Academy
>> PO Box 7916, Canberra BA ACT 2610
>> Australia
>> Mobile:(+61)416895795
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
Chamila Dilshan Wijayarathna,
PhD Research Student
The University of New South Wales (UNSW Canberra)
Australian Centre for Cyber Security
Australian Defence Force Academy
PO Box 7916, Canberra BA ACT 2610
Australia
Mobile:(+61)416895795