<div dir="ltr">Thank you both for your input.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 29, 2020 at 9:57 PM Nicolas Hug <<a href="mailto:niourf@gmail.com">niourf@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Also, you should not scale your input before computing
cross-validation scores. Doing so biases your results, because
the scaling statistics are computed from all of the data, so each
train / test pair shares information (even though it is not
target data).<br>
</p>
<p>The scaler should be fitted on the training part of each
(train / test) pair and then applied to its test part.</p>
<p>This can be done through a pipeline:
<a href="https://scikit-learn.org/stable/modules/compose.html" target="_blank">https://scikit-learn.org/stable/modules/compose.html</a></p>
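<p>A minimal sketch of that idea, assuming synthetic data from
make_classification and a plain SVC as stand-ins for the original
data and model. Because the scaler is a step of the pipeline,
cross_val_score refits it on the training part of every fold:</p>

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data stands in for the real feature table (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# The scaler is a pipeline step, so it is fitted on each training fold only
# and then applied to the matching test fold.
model = make_pipeline(StandardScaler(), SVC())

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(scores.mean())
```

<p>The unscaled X is passed in directly; no call to
preprocessing.scale is needed beforehand.</p>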
<p><br>
</p>
<div>On 5/29/20 11:52 AM, Thomas J Fan
wrote:<br>
</div>
<blockquote type="cite">
<div id="gmail-m_-8566659625984776669CanaryBody">
<div> Once </div>
<div><br>
</div>
<div><i>x = preprocessing.scale(df1)</i><br>
</div>
<div><i><br>
</i></div>
<div>is called, the input to your estimator is no longer a
DataFrame but a NumPy array, so the column transformer cannot use
strings to select columns.</div>
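<div>As a rough illustration, assuming a small random DataFrame
whose column names are borrowed from the traceback, the sketch below
shows string selectors working on a DataFrame, failing on the
equivalent NumPy array, and a fold being selected with .iloc so that
the data stays a DataFrame:</div>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame; the values are random and only the column names matter.
df = pd.DataFrame(np.random.RandomState(0).rand(20, 3),
                  columns=["HR", "RMSSD", "SCL"])

# Works: a DataFrame carries the column names the strings refer to.
ct = ColumnTransformer([("phy", StandardScaler(), ["HR", "SCL"])])
out = ct.fit_transform(df)

# Fails: a NumPy array has no column names to match the strings against.
try:
    ColumnTransformer([("phy", StandardScaler(), ["HR", "SCL"])]).fit_transform(
        df.to_numpy())
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Selecting the rows of a fold with .iloc keeps the DataFrame type.
train_index = np.arange(15)
x_train = df.iloc[train_index]
print(type(x_train).__name__)
```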
<div><br>
</div>
</div>
<div id="gmail-m_-8566659625984776669CanarySig">
<div>
<div style="font-family:Helvetica">Thomas</div>
<div><br>
</div>
</div>
</div>
<div id="gmail-m_-8566659625984776669CanaryDropbox"> </div>
<blockquote id="gmail-m_-8566659625984776669CanaryBlockquote">
<div>
<div>On Friday, May 29, 2020 at 11:46 AM, Chamila Wijayarathna
<<a href="mailto:cdwijayarathna@gmail.com" target="_blank">cdwijayarathna@gmail.com</a>>
wrote:<br>
</div>
<div>
<div dir="ltr">Hi,
<div><br>
</div>
<div>Thanks, this solution fixed the issue. However, it
introduces a new error, which was not there before.</div>
<div><br>
</div>
<div>Traceback (most recent call last):<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\__init__.py",
line 425, in _get_column_indices<br>
all_columns = X.columns<br>
AttributeError: 'numpy.ndarray' object has no attribute
'columns'<br>
During handling of the above exception, another
exception occurred:<br>
Traceback (most recent call last):<br>
File "<input>", line 1, in <module><br>
File "C:\Program Files\JetBrains\PyCharm
2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py",
line 197, in runfile<br>
pydev_imports.execfile(filename, global_vars,
local_vars) # execute the script<br>
File "C:\Program Files\JetBrains\PyCharm
2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py",
line 18, in execfile<br>
exec(compile(contents+"\n", file, 'exec'), glob,
loc)<br>
File
"C:/Users/ASUS/PycharmProjects/swelltest/enemble.py",
line 127, in <module><br>
ens.fit(x_train,y_train)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
line 265, in fit<br>
return super().fit(X, transformed_y, sample_weight)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
line 81, in fit<br>
for idx, clf in enumerate(clfs) if clf not in (None,
'drop')<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
line 1029, in __call__<br>
if self.dispatch_one_batch(iterator):<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
line 847, in dispatch_one_batch<br>
self._dispatch(tasks)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
line 765, in _dispatch<br>
job = self._backend.apply_async(batch, callback=cb)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\_parallel_backends.py",
line 206, in apply_async<br>
result = ImmediateResult(func)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\_parallel_backends.py",
line 570, in __init__<br>
self.results = batch()<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
line 253, in __call__<br>
for func, args, kwargs in self.items]<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
line 253, in <listcomp><br>
for func, args, kwargs in self.items]<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py",
line 40, in _fit_single_estimator<br>
estimator.fit(X, y)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py",
line 330, in fit<br>
Xt = self._fit(X, y, **fit_params_steps)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py",
line 296, in _fit<br>
**fit_params_steps[name])<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\memory.py",
line 352, in __call__<br>
return self.func(*args, **kwargs)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py",
line 740, in _fit_transform_one<br>
res = transformer.fit_transform(X, y, **fit_params)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\compose\_column_transformer.py",
line 529, in fit_transform<br>
self._validate_remainder(X)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\compose\_column_transformer.py",
line 327, in _validate_remainder<br>
cols.extend(_get_column_indices(X, columns))<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\__init__.py",
line 427, in _get_column_indices<br>
raise ValueError("Specifying the columns using
strings is only "<br>
ValueError: Specifying the columns using strings is only
supported for pandas DataFrames<br>
</div>
<div><br>
</div>
<div>Thanks</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, May 29, 2020 at
7:33 PM Thomas J Fan <<a href="mailto:thomasjpfan@gmail.com" target="_blank">thomasjpfan@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="font-family:Helvetica;color:rgb(0,0,0);font-size:13px">
<div id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334">
<div>VotingClassifier also needs names:</div>
<div><br>
</div>
<div>ens = VotingClassifier(estimators=[('pipe1',
pipe_phy), ('pipe2', pipe_fa)])</div>
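<div>For context, a self-contained sketch of the whole setup with
named estimators might look like the following. The random data, the
helper make_branch and the choice of SVC() as the classifier are
assumptions made for illustration; the feature names and step names
follow the original post:</div>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in data with the column names used in the original post.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.rand(60, 6), columns=list("ABCDEF"))
y = (X["A"] + X["D"]).gt(1).astype(int)


def make_branch(name, features):
    """Hypothetical helper: one pipeline that only sees `features`."""
    prep = Pipeline(steps=[("imputer", SimpleImputer(strategy="median")),
                           ("scaler", StandardScaler())])
    processor = ColumnTransformer(transformers=[(name, prep, features)])
    return Pipeline(steps=[("preprocessor", processor),
                           ("classifier", SVC())])


pipe_phy = make_branch("phy", ["A", "B", "C"])
pipe_fa = make_branch("fa", ["D", "E", "F"])

# Each estimator is passed as a (name, estimator) tuple.
ens = VotingClassifier(estimators=[("pipe1", pipe_phy), ("pipe2", pipe_fa)])
ens.fit(X, y)
print(ens.score(X, y))
```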
<div><br>
</div>
</div>
<div id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334">
<div>
<div style="font-family:Helvetica">Thomas</div>
<div><br>
</div>
</div>
</div>
<div id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334CanaryDropbox"> </div>
<blockquote id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334">
<div>
<div>On Friday, May 29, 2020 at 2:33 AM, Chamila
Wijayarathna <<a href="mailto:cdwijayarathna@gmail.com" target="_blank">cdwijayarathna@gmail.com</a>>
wrote:<br>
</div>
<div>
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I did manage to get the code to run using
a workaround, which is a bit ugly.</div>
<div><br>
</div>
<div>The following is the complete stack trace of
the error I was receiving. </div>
<div><br>
</div>
<div><i>Traceback (most recent call last):<br>
File "<input>", line 1, in
<module><br>
File "C:\Program Files\JetBrains\PyCharm
2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line
197, in runfile<br>
pydev_imports.execfile(filename,
global_vars, local_vars) # execute the
script<br>
File "C:\Program Files\JetBrains\PyCharm
2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py",
line 18, in execfile<br>
exec(compile(contents+"\n", file,
'exec'), glob, loc)<br>
File
"C:/Users/ASUS/PycharmProjects/swelltest/enemble.py",
line 112, in <module><br>
ens.fit(x_train,y_train)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
line 265, in fit<br>
return super().fit(X, transformed_y,
sample_weight)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
line 65, in fit<br>
names, clfs =
self._validate_estimators()<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py",
line 228, in _validate_estimators<br>
self._validate_names(names)<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py",
line 77, in _validate_names<br>
invalid_names = [name for name in
names if '__' in name]<br>
File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py",
line 77, in <listcomp><br>
invalid_names = [name for name in
names if '__' in name]<br>
TypeError: argument of type
'ColumnTransformer' is not iterable</i><br>
</div>
<div><i><br>
</i></div>
<div>The following are the entries in the 'names' list
at the time of the error.</div>
<div><br>
</div>
<div>1- <i>ColumnTransformer(transformers=[('phy',
Pipeline(steps=[('imputer',
SimpleImputer(strategy='median')),
('scaler', StandardScaler())]), ['HR',
'RMSSD', 'SCL'])])<br>
2-
ColumnTransformer(transformers=[('fa',Pipeline(steps=[('imputer',SimpleImputer(strategy='median')),('scaler',
StandardScaler())]),['Squality',
'Sneutral', 'Shappy'])])</i><br>
</div>
<div><br>
</div>
<div>It seems that the library is checking whether
each entry of 'names' contains the substring '__',
which it cannot do for a ColumnTransformer
object.</div>
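<div>The behaviour can be reproduced in plain Python. In this
sketch, Transformer is a hypothetical stand-in class rather than
scikit-learn code; the list comprehension is the one shown in the
traceback. The membership test only works when each name is a
string (or another container):</div>

```python
class Transformer:
    """Hypothetical stand-in for an estimator object; not scikit-learn code."""


# An estimator object where a string name is expected: the `in` test fails.
names = [Transformer()]
try:
    invalid_names = [name for name in names if "__" in name]
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# With string names the same check runs and flags names containing "__".
names = ["pipe1", "bad__name"]
invalid_names = [name for name in names if "__" in name]
print(invalid_names)
```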
<div><br>
</div>
<div>Since this name check doesn't have a
significant effect on my functionality, I
commented out the following snippet in <i>sklearn\utils\metaestimators.py.</i></div>
<div><br>
</div>
<div><i>invalid_names = [name for name in
names if '__' in name]<br>
if invalid_names:<br>
raise ValueError('Estimator names must
not contain __: got '<br>
'{0!r}'.format(invalid_names))</i><br>
</div>
<div><br>
</div>
<div>Please let me know if there is a better
workaround, or whether there are any issues with
commenting out this code.</div>
<div><br>
</div>
<div>Thanks</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, May
29, 2020 at 10:33 AM Chamila Wijayarathna
<<a href="mailto:cdwijayarathna@gmail.com" target="_blank">cdwijayarathna@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Hello all,
<div><br>
</div>
<div>I hope I am writing to the correct
mailing list about this issue that I am
having. My apologies if I am not.</div>
<div><br>
</div>
<div>I am attempting to use a pipeline to
feed an ensemble voting classifier as I
want the ensemble learner to use models
that train on different feature sets.
For this purpose, I followed the
tutorial available at [1].<br>
<br>
The following is the code that I have
developed so far.<br>
<br>
<i>y = df1.index<br>
x = preprocessing.scale(df1)<br>
<br>
phy_features = ['A', 'B', 'C']<br>
phy_transformer =
Pipeline(steps=[('imputer',
SimpleImputer(strategy='median')),
('scaler', StandardScaler())])<br>
phy_processer =
ColumnTransformer(transformers=[('phy',
phy_transformer, phy_features)])<br>
<br>
fa_features = ['D', 'E', 'F']<br>
fa_transformer =
Pipeline(steps=[('imputer',
SimpleImputer(strategy='median')),
('scaler', StandardScaler())])<br>
fa_processer =
ColumnTransformer(transformers=[('fa',
fa_transformer, fa_features)])<br>
<br>
<br>
pipe_phy =
Pipeline(steps=[('preprocessor',
phy_processer ),('classifier', SVM)])<br>
pipe_fa =
Pipeline(steps=[('preprocessor',
fa_processer ),('classifier', SVM)])<br>
<br>
ens =
VotingClassifier(estimators=[pipe_phy,
pipe_fa])<br>
<br>
cv = KFold(n_splits=10,
random_state=None, shuffle=True)<br>
for train_index, test_index in
cv.split(x):<br>
x_train, x_test = x[train_index],
x[test_index]<br>
y_train, y_test = y[train_index],
y[test_index]<br>
ens.fit(x_train,y_train)<br>
print(ens.score(x_test, y_test))</i></div>
<div><i><br>
</i>However, when running the code, I am
getting an error saying <i>TypeError:
argument of type 'ColumnTransformer'
is not iterable</i>, at the line <i>ens.fit(x_train,y_train).</i><br>
<br>
What is the reason for this and how can
I fix it?<br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>Thank you,</div>
<div>Chamila</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>
<div>Chamila Dilshan
Wijayarathna,</div>
<div>PhD Research
Student</div>
<div>The University of
New South Wales
(UNSW Canberra)</div>
<div>Australian Centre
for Cyber Security</div>
<div>Australian
Defence Force
Academy</div>
<div>PO Box 7916,
Canberra BA ACT 2610</div>
<div>Australia</div>
<div>Mobile:(+61)416895795</div>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
_______________________________________________
<br>
scikit-learn mailing list <br>
<a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a>
<br>
<a href="https://mail.python.org/mailman/listinfo/scikit-learn" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a>
<br>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</blockquote>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div>Chamila Dilshan Wijayarathna,</div><div>PhD Research Student</div><div>The University of New South Wales (UNSW Canberra)</div><div>Australian Centre for Cyber Security</div><div>Australian Defence Force Academy</div><div>PO Box 7916, Canberra BA ACT 2610</div><div>Australia</div><div>Mobile:(+61)416895795</div></div><div><br></div></div></div></div></div></div></div></div></div></div></div></div></div>