<div dir="ltr">Thank you both for your inputs.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 29, 2020 at 9:57 PM Nicolas Hug <<a href="mailto:niourf@gmail.com">niourf@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p>Also, you should not scale your input before computing
      cross-validation scores. Doing so biases your results, because
      the scaling statistics are computed on the full dataset, so each
      test set leaks information into training (even though it is not
      target data).<br>
    </p>
    <p>The scaler should be fit on the training part of each (train /
      test) pair and then applied to that pair's test part.</p>
    <p>This can be done through a pipeline:
      <a href="https://scikit-learn.org/stable/modules/compose.html" target="_blank">https://scikit-learn.org/stable/modules/compose.html</a></p>
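    <p>As a minimal sketch of the pipeline approach: the data below is
      synthetic and the choice of SVC is an assumption, not taken from
      the original code. Because the scaler is a pipeline step, it is
      refit on the training part of every fold.</p>
<pre>
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not the poster's dataset).
X, y = make_classification(n_samples=100, n_features=6, random_state=0)

# The scaler lives inside the pipeline, so cross_val_score fits it on the
# training part of each fold and only applies it to the held-out part.
pipe = make_pipeline(StandardScaler(), SVC())

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv)
print(scores.mean())
```
</pre>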
    <p><br>
    </p>
    <div>On 5/29/20 11:52 AM, Thomas J Fan
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      
      
      <div id="gmail-m_-8566659625984776669CanaryBody">
        <div> Once </div>
        <div><br>
        </div>
        <div><i>x = preprocessing.scale(df1)</i><br>
        </div>
        <div><i><br>
          </i></div>
        <div>is called, the input to your estimator is no longer a
          DataFrame but a NumPy array, so the column transformer cannot
          use strings to select columns.</div>
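        <div>A small sketch of the difference, using a hypothetical
          six-column frame (the column names follow the placeholders in
          the question, and the random data is an assumption). It also
          shows positional indexing with iloc, which keeps each fold a
          DataFrame inside a KFold loop.</div>
<pre>
```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame standing in for df1.
rng = np.random.RandomState(0)
df1 = pd.DataFrame(rng.normal(size=(20, 6)), columns=list("ABCDEF"))

# preprocessing.scale returns a NumPy array, so the column names are lost.
scaled = preprocessing.scale(df1)
print(type(scaled).__name__)

ct = ColumnTransformer(transformers=[("phy", StandardScaler(), ["A", "B", "C"])])

# Passing the DataFrame itself keeps the column names available.
out = ct.fit_transform(df1)
print(out.shape)

# Passing the scaled array triggers the error shown in the traceback.
error_message = None
try:
    ct.fit_transform(scaled)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Positional indexing keeps the training fold a DataFrame.
train_index = np.arange(15)
x_train = df1.iloc[train_index]
print(type(x_train).__name__)
```
</pre>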
        <div><br>
        </div>
      </div>
      <div id="gmail-m_-8566659625984776669CanarySig">
        <div>
          <div style="font-family:Helvetica">Thomas</div>
          <div><br>
          </div>
        </div>
      </div>
      <div id="gmail-m_-8566659625984776669CanaryDropbox"> </div>
      <blockquote id="gmail-m_-8566659625984776669CanaryBlockquote">
        <div>
          <div>On Friday, May 29, 2020 at 11:46 AM, Chamila Wijayarathna
            <<a href="mailto:cdwijayarathna@gmail.com" target="_blank">cdwijayarathna@gmail.com</a>>
            wrote:<br>
          </div>
          <div>
            <div dir="ltr">Hi,
              <div><br>
              </div>
              <div>Thanks, this solution fixed the issue. However, it
                introduces a new error, which was not there before.</div>
              <div><br>
              </div>
              <div>Traceback (most recent call last):<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\__init__.py",
                line 425, in _get_column_indices<br>
                    all_columns = X.columns<br>
                AttributeError: 'numpy.ndarray' object has no attribute
                'columns'<br>
                During handling of the above exception, another
                exception occurred:<br>
                Traceback (most recent call last):<br>
                  File "<input>", line 1, in <module><br>
                  File "C:\Program Files\JetBrains\PyCharm
                2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py",
                line 197, in runfile<br>
                    pydev_imports.execfile(filename, global_vars,
                local_vars)  # execute the script<br>
                  File "C:\Program Files\JetBrains\PyCharm
                2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py",
                line 18, in execfile<br>
                    exec(compile(contents+"\n", file, 'exec'), glob,
                loc)<br>
                  File
                "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py",
                line 127, in <module><br>
                    ens.fit(x_train,y_train)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
                line 265, in fit<br>
                    return super().fit(X, transformed_y, sample_weight)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
                line 81, in fit<br>
                    for idx, clf in enumerate(clfs) if clf not in (None,
                'drop')<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
                line 1029, in __call__<br>
                    if self.dispatch_one_batch(iterator):<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
                line 847, in dispatch_one_batch<br>
                    self._dispatch(tasks)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
                line 765, in _dispatch<br>
                    job = self._backend.apply_async(batch, callback=cb)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\_parallel_backends.py",
                line 206, in apply_async<br>
                    result = ImmediateResult(func)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\_parallel_backends.py",
                line 570, in __init__<br>
                    self.results = batch()<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
                line 253, in __call__<br>
                    for func, args, kwargs in self.items]<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\parallel.py",
                line 253, in <listcomp><br>
                    for func, args, kwargs in self.items]<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py",
                line 40, in _fit_single_estimator<br>
                    estimator.fit(X, y)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py",
                line 330, in fit<br>
                    Xt = self._fit(X, y, **fit_params_steps)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py",
                line 296, in _fit<br>
                    **fit_params_steps[name])<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\joblib\memory.py",
                line 352, in __call__<br>
                    return self.func(*args, **kwargs)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\pipeline.py",
                line 740, in _fit_transform_one<br>
                    res = transformer.fit_transform(X, y, **fit_params)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\compose\_column_transformer.py",
                line 529, in fit_transform<br>
                    self._validate_remainder(X)<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\compose\_column_transformer.py",
                line 327, in _validate_remainder<br>
                    cols.extend(_get_column_indices(X, columns))<br>
                  File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\__init__.py",
                line 427, in _get_column_indices<br>
                    raise ValueError("Specifying the columns using
                strings is only "<br>
                ValueError: Specifying the columns using strings is only
                supported for pandas DataFrames<br>
              </div>
              <div><br>
              </div>
              <div>Thanks</div>
            </div>
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Fri, May 29, 2020 at
                7:33 PM Thomas J Fan <<a href="mailto:thomasjpfan@gmail.com" target="_blank">thomasjpfan@gmail.com</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                <div style="font-family:Helvetica;color:rgb(0,0,0);font-size:13px">
                  <div id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334">
                    <div>VotingClassifier also needs a name for each estimator, given as (name, estimator) tuples:</div>
                    <div><br>
                    </div>
                    <div>ens = VotingClassifier(estimators=[('pipe1',
                      pipe_phy), ('pipe2', pipe_fa)])</div>
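                    <div>For reference, a self-contained sketch of the
                      named form. The data is random, the make_branch
                      helper is hypothetical, and SVC stands in for the
                      SVM object from the question, which is not shown
                      in the thread.</div>
<pre>
```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in data (an assumption); X stays a DataFrame so that
# string column selection works.
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.normal(size=(60, 6)), columns=list("ABCDEF"))
y = rng.randint(0, 2, size=60)


def make_branch(name, features):
    """Hypothetical helper: preprocess one feature subset, then classify."""
    transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    processor = ColumnTransformer(transformers=[(name, transformer, features)])
    return Pipeline(steps=[("preprocessor", processor), ("classifier", SVC())])


pipe_phy = make_branch("phy", ["A", "B", "C"])
pipe_fa = make_branch("fa", ["D", "E", "F"])

# Each estimator is passed as a (name, estimator) tuple.
ens = VotingClassifier(estimators=[("pipe1", pipe_phy), ("pipe2", pipe_fa)])
ens.fit(X, y)
predictions = ens.predict(X.iloc[:5])
print(predictions)
```
</pre>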
                    <div><br>
                    </div>
                  </div>
                  <div id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334">
                    <div>
                      <div style="font-family:Helvetica">Thomas</div>
                      <div><br>
                      </div>
                    </div>
                  </div>
                  <div id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334CanaryDropbox"> </div>
                  <blockquote id="gmail-m_-8566659625984776669gmail-m_-5715342925610960334">
                    <div>
                      <div>On Friday, May 29, 2020 at 2:33 AM, Chamila
                        Wijayarathna <<a href="mailto:cdwijayarathna@gmail.com" target="_blank">cdwijayarathna@gmail.com</a>>
                        wrote:<br>
                      </div>
                      <div>
                        <div dir="ltr">Hi all, 
                          <div><br>
                          </div>
                          <div>I did manage to get the code to run using
                            a workaround, which is a bit ugly.</div>
                          <div><br>
                          </div>
                          <div>The following is the complete stack trace of
                            the error I was receiving. </div>
                          <div><br>
                          </div>
                          <div><i>Traceback (most recent call last):<br>
                                File "<input>", line 1, in
                              <module><br>
                                File "C:\Program Files\JetBrains\PyCharm
2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line
                              197, in runfile<br>
                                  pydev_imports.execfile(filename,
                              global_vars, local_vars)  # execute the
                              script<br>
                                File "C:\Program Files\JetBrains\PyCharm
2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py",
                              line 18, in execfile<br>
                                  exec(compile(contents+"\n", file,
                              'exec'), glob, loc)<br>
                                File
                              "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py",
                              line 112, in <module><br>
                                  ens.fit(x_train,y_train)<br>
                                File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
                              line 265, in fit<br>
                                  return super().fit(X, transformed_y,
                              sample_weight)<br>
                                File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py",
                              line 65, in fit<br>
                                  names, clfs =
                              self._validate_estimators()<br>
                                File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py",
                              line 228, in _validate_estimators<br>
                                  self._validate_names(names)<br>
                                File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py",
                              line 77, in _validate_names<br>
                                  invalid_names = [name for name in
                              names if '__' in name]<br>
                                File
"C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py",
                              line 77, in <listcomp><br>
                                  invalid_names = [name for name in
                              names if '__' in name]<br>
                              TypeError: argument of type
                              'ColumnTransformer' is not iterable</i><br>
                          </div>
                          <div><i><br>
                            </i></div>
                          <div>The following are the entries in the 'names'
                            list at the time of the error.</div>
                          <div><br>
                          </div>
                          <div>1- <i>ColumnTransformer(transformers=[('phy',
                              Pipeline(steps=[('imputer',
                              SimpleImputer(strategy='median')),
                              ('scaler', StandardScaler())]), ['HR',
                              'RMSSD', 'SCL'])])<br>
                              2-
ColumnTransformer(transformers=[('fa',Pipeline(steps=[('imputer',SimpleImputer(strategy='median')),('scaler',
                              StandardScaler())]),['Squality',
                              'Sneutral', 'Shappy'])])</i><br>
                          </div>
                          <div><br>
                          </div>
                          <div>It seems that the library is checking
                            whether each name contains the substring
                            '__', which it cannot do on a
                            ColumnTransformer object.</div>
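                          <div>A possible explanation, sketched under the
                            assumption that the estimators list is
                            unpacked as (name, estimator) pairs: iterating
                            a bare Pipeline yields its step estimators, so
                            the first "name" ends up being the
                            ColumnTransformer. The make_pipe helper is
                            hypothetical and SVC stands in for the
                            unspecified SVM object.</div>
<pre>
```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_pipe(name, features):
    """Hypothetical helper building a two-step pipeline like those in the question."""
    processor = ColumnTransformer(transformers=[(name, StandardScaler(), features)])
    return Pipeline(steps=[("preprocessor", processor), ("classifier", SVC())])


pipe_phy = make_pipe("phy", ["A", "B", "C"])
pipe_fa = make_pipe("fa", ["D", "E", "F"])

# Unpacking bare pipelines as if they were (name, estimator) pairs:
# each Pipeline is iterated step by step, so the "names" are the
# ColumnTransformer objects rather than strings.
bad_names, bad_clfs = zip(*[pipe_phy, pipe_fa])
print([type(name).__name__ for name in bad_names])

# With explicit (name, estimator) tuples the names are plain strings,
# and a substring check such as '__' in name is well defined.
good_names, good_clfs = zip(*[("pipe1", pipe_phy), ("pipe2", pipe_fa)])
print(good_names)
print(["__" in name for name in good_names])
```
</pre>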
                          <div><br>
                          </div>
                          <div>Since this name check doesn't have a
                            significant effect on my functionality, I
                            commented out the following snippet in  <i>sklearn\utils\metaestimators.py.</i></div>
                          <div><br>
                          </div>
                          <div><i>invalid_names = [name for name in
                              names if '__' in name]<br>
                              if invalid_names:<br>
                                  raise ValueError('Estimator names must
                              not contain __: got '<br>
                                                 
                              '{0!r}'.format(invalid_names))</i><br>
                          </div>
                          <div><br>
                          </div>
                          <div>Please let me know if there is a better
                            workaround, or whether there are any issues
                            with commenting out this code.</div>
                          <div><br>
                          </div>
                          <div>Thanks</div>
                        </div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Fri, May
                            29, 2020 at 10:33 AM Chamila Wijayarathna
                            <<a href="mailto:cdwijayarathna@gmail.com" target="_blank">cdwijayarathna@gmail.com</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div dir="ltr">Hello all,
                              <div><br>
                              </div>
                              <div>I hope I am writing to the correct
                                mailing list about the issue that I am
                                having. My apologies if I am not.</div>
                              <div><br>
                              </div>
                              <div>I am attempting to use a pipeline to
                                feed an ensemble voting classifier as I
                                want the ensemble learner to use models
                                that train on different feature sets.
                                For this purpose, I followed the
                                tutorial available at [1].<br>
                                <br>
                                The following is the code I have
                                developed so far.<br>
                                <br>
                                <i>y = df1.index<br>
                                  x = preprocessing.scale(df1)<br>
                                  <br>
                                  phy_features = ['A', 'B', 'C']<br>
                                  phy_transformer =
                                  Pipeline(steps=[('imputer',
                                  SimpleImputer(strategy='median')),
                                  ('scaler', StandardScaler())])<br>
                                  phy_processer =
                                  ColumnTransformer(transformers=[('phy',
                                  phy_transformer, phy_features)])<br>
                                  <br>
                                  fa_features = ['D', 'E', 'F']<br>
                                  fa_transformer =
                                  Pipeline(steps=[('imputer',
                                  SimpleImputer(strategy='median')),
                                  ('scaler', StandardScaler())])<br>
                                  fa_processer =
                                  ColumnTransformer(transformers=[('fa',
                                  fa_transformer, fa_features)])<br>
                                  <br>
                                  <br>
                                  pipe_phy =
                                  Pipeline(steps=[('preprocessor',
                                  phy_processer ),('classifier', SVM)])<br>
                                  pipe_fa =
                                  Pipeline(steps=[('preprocessor',
                                  fa_processer ),('classifier', SVM)])<br>
                                  <br>
                                  ens =
                                  VotingClassifier(estimators=[pipe_phy,
                                  pipe_fa])<br>
                                  <br>
                                  cv = KFold(n_splits=10,
                                  random_state=None, shuffle=True)<br>
                                  for train_index, test_index in
                                  cv.split(x):<br>
                                      x_train, x_test = x[train_index],
                                  x[test_index]<br>
                                      y_train, y_test = y[train_index],
                                  y[test_index]<br>
                                      ens.fit(x_train,y_train)<br>
                                      print(ens.score(x_test, y_test))</i></div>
                              <div><i><br>
                                </i>However, when running the code, I am
                                getting an error saying <i>TypeError:
                                  argument of type 'ColumnTransformer'
                                  is not iterable</i>, at the line <i>ens.fit(x_train,y_train).</i><br>
                                <br>
                                What is the reason for this and how can
                                I fix it?<br>
                              </div>
                              <div>
                                <div dir="ltr">
                                  <div dir="ltr">
                                    <div>
                                      <div dir="ltr">
                                        <div>
                                          <div dir="ltr">
                                            <div>
                                              <div dir="ltr">
                                                <div>
                                                  <div dir="ltr">
                                                    <div dir="ltr">
                                                      <div dir="ltr">
                                                        <div><br>
                                                        </div>
                                                        <div>Thank you,</div>
                                                        <div>Chamila</div>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                        <br clear="all">
                        <div><br>
                        </div>
                        -- <br>
                        <div dir="ltr">
                          <div dir="ltr">
                            <div>
                              <div dir="ltr">
                                <div>
                                  <div dir="ltr">
                                    <div>
                                      <div dir="ltr">
                                        <div>
                                          <div dir="ltr">
                                            <div dir="ltr">
                                              <div dir="ltr">
                                                <div>
                                                  <div>Chamila Dilshan
                                                    Wijayarathna,</div>
                                                  <div>PhD Research
                                                    Student</div>
                                                  <div>The University of
                                                    New South Wales
                                                    (UNSW Canberra)</div>
                                                  <div>Australian Centre
                                                    for Cyber Security</div>
                                                  <div>Australian
                                                    Defence Force
                                                    Academy</div>
                                                  <div>PO Box 7916,
                                                    Canberra BA ACT 2610</div>
                                                  <div>Australia</div>
                                                  <div>Mobile:(+61)416895795</div>
                                                </div>
                                                <div><br>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                        _______________________________________________
                        <br>
                        scikit-learn mailing list <br>
                        <a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a>
                        <br>
                        <a href="https://mail.python.org/mailman/listinfo/scikit-learn" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a>
                        <br>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
            </div>
          </div>
        </div>
      </blockquote>
      <br>
    </blockquote>
  </div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><div>Chamila Dilshan Wijayarathna,</div><div>PhD Research Student</div><div>The University of New South Wales (UNSW Canberra)</div><div>Australian Centre for Cyber Security</div><div>Australian Defence Force Academy</div><div>PO Box 7916, Canberra BA ACT 2610</div><div>Australia</div><div>Mobile:(+61)416895795</div></div><div><br></div></div></div></div></div></div></div></div></div></div></div></div></div>