[scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer

Andreas Mueller t3kcit at gmail.com
Fri Aug 4 10:50:40 EDT 2017


Yes, that's totally fine. The error is unrelated: it just means you need
to call ``check_is_fitted`` in your predict method, so that calling
predict before fit gives a nicer error message (one that mentions 'fit').
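
As a rough sketch of what that could look like (``WrappedSVC`` and its
parameters are made-up names for illustration, not the code from your
notebook, and a plain 'rbf' kernel stands in for the custom one). It assumes
the usual scikit-learn convention that fit() stores the inner model on an
attribute ending in an underscore and returns self:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.exceptions import NotFittedError
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted


class WrappedSVC(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that fits an inner SVC during fit()."""

    def __init__(self, C=1.0, gamma=0.1):
        # Only store the parameters here; no fitting logic in __init__.
        self.C = C
        self.gamma = gamma

    def fit(self, X, y):
        # Build and fit the inner SVC, keep it on a trailing-underscore
        # attribute, and return self (not the SVC instance).
        self.svc_ = SVC(C=self.C, kernel="rbf", gamma=self.gamma)
        self.svc_.fit(X, y)
        self.classes_ = self.svc_.classes_
        return self

    def predict(self, X):
        # Raises NotFittedError (with a message mentioning 'fit') if
        # fit() has not been called yet.
        check_is_fitted(self, "svc_")
        return self.svc_.predict(X)


X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

try:
    WrappedSVC().predict(X)
except NotFittedError as exc:
    unfitted_message = str(exc)

fitted = WrappedSVC().fit(X, y)
predictions = fitted.predict(X)
```

Delegating to ``self.svc_`` keeps get_params/set_params working on the outer
estimator, which is what GridSearchCV relies on.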


On 08/04/2017 06:29 AM, Sam Barnett wrote:
> Hi Andy,
> I have since been able to resolve the pickling issue, though I am now 
> getting an error message saying that an error message does not include 
> the expected string 'fit'. In general, I am trying to use the fit() 
> method of my classifier to instantiate a separate SVC() classifier 
> with a custom kernel, fit THAT to the data, then return this instance 
> as the fitted version of the new classifier. Is this possible in 
> theory? If so, what is the best way to implement it?
>
> As before, the requisite code and a .ipynb file are attached.
>
> Best,
> Sam
>
> On Thu, Aug 3, 2017 at 6:35 PM, Andreas Mueller <t3kcit at gmail.com> wrote:
>
>     Hi Sam.
>     You need to put these into a reachable namespace (possibly as
>     private functions) so that they can be pickled.
>     Please stay on the sklearn mailing list, I might not have time to
>     reply.
>
>     Andy
>
>
>     On 08/03/2017 01:24 PM, Sam Barnett wrote:
>>     Hi Andy,
>>
>>     I've since tried a different solution: instead of a pipeline,
>>     I've simply created a classifier that is for the most part like
>>     svm.SVC, though it takes a few extra inputs for the
>>     sequentialisation step. I've used a Python function that can
>>     compute the Gram matrix between two datasets of any shape to pass
>>     into SVC(), though I'm now having trouble with pickling on the
>>     check_estimator test. It appears that SeqSVC.fit() doesn't like
>>     to have methods defined within it. Can you see how to pass this
>>     test? (the .ipynb file shows the error).
>>
>>     Best,
>>     Sam
>>
>>     On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett
>>     <sambarnett95 at gmail.com> wrote:
>>
>>         You're right: it does fail without GridSearchCV when I change
>>         the size of seq_test. I will look at the transform tomorrow
>>         to see if I can work this out. Thank you for your help so far!
>>
>>         On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller
>>         <t3kcit at gmail.com> wrote:
>>
>>             Change the size of seq_test in your notebook and you'll
>>             see the failure without GridSearchCV.
>>             I haven't looked at your code in detail, but transform is
>>             supposed to work on arbitrary new data with the same
>>             number of features.
>>             Your code requires the test data to have the same shape
>>             as the training data.
>>             Cross-validation will lead to training data and test data
>>             having different sizes. But I feel like something is
>>             already wrong if your test data size depends on your
>>             training data size.
>>
>>
>>
>>             On 08/02/2017 03:08 PM, Sam Barnett wrote:
>>>             Hi Andy,
>>>
>>>             The purpose of the transformer is to take an ordinary
>>>             kernel (in this case I have taken 'rbf' as a default)
>>>             and return a 'sequentialised' kernel using a few extra
>>>             parameters. Hence, the transformer takes an ordinary
>>>             data-target pair X, y as its input, and the
>>>             fit_transform(X, y) method will output the Gram matrix
>>>             for X that is associated with this sequentialised
>>>             kernel. In the pipeline, this Gram matrix is passed into
>>>             an SVC classifier with the kernel parameter set to
>>>             'precomputed'.
>>>
>>>             Therefore, I do not think your hacky solution would be
>>>             possible. However, I am still unsure how to implement
>>>             your first solution: won't the Gram matrix from the
>>>             transformer contain all the necessary kernel values?
>>>             Could you elaborate further?
>>>
>>>
>>>             Best,
>>>             Sam
>>>
>>>             On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller
>>>             <t3kcit at gmail.com> wrote:
>>>
>>>                 Hi Sam.
>>>                 GridSearchCV will do cross-validation, which
>>>                 requires "transforming" the test data.
>>>                 The shape of the test-data will be different from
>>>                 the shape of the training data.
>>>                 You need to have the ability to compute the kernel
>>>                 between the training data and new test data.
>>>
>>>                 A more hacky solution would be to compute the full
>>>                 kernel matrix in advance and pass that to GridSearchCV.
>>>
>>>                 You probably don't need it here, but you should also
>>>                 check out what the _pairwise attribute does in
>>>                 cross-validation, because that is likely to come up
>>>                 when playing with kernels.
>>>
>>>                 Hth,
>>>                 Andy
>>>
>>>
>>>                 On 08/02/2017 08:38 AM, Sam Barnett wrote:
>>>>                 Dear all,
>>>>
>>>>                 I have created a 2-step pipeline with a custom
>>>>                 transformer followed by a simple SVC classifier,
>>>>                 and I wish to run a grid-search over it. I am able
>>>>                 to successfully create the transformer and the
>>>>                 pipeline, and each of these elements works fine.
>>>>                 However, when I try to use the fit() method on my
>>>>                 GridSearchCV object, I get the following error:
>>>>
>>>>                      57         # during fit.
>>>>                      58         if X.shape != self.input_shape_:
>>>>                 ---> 59             raise ValueError('Shape of input is different from what was seen '
>>>>                      60                              'in `fit`')
>>>>                      61
>>>>
>>>>                 ValueError: Shape of input is different from what
>>>>                 was seen in `fit`
>>>>
>>>>                 For a full breakdown of the problem, I have written
>>>>                 a Jupyter notebook showing exactly how the error
>>>>                 occurs (this also contains all .py files necessary
>>>>                 to run the notebook). Can anybody see how to work
>>>>                 through this?
>>>>
>>>>                 Many thanks,
>>>>                 Sam Barnett
>>>>
>>>>
>>>>
>>>>                 _______________________________________________
>>>>                 scikit-learn mailing list
>>>>                 scikit-learn at python.org
>>>>                 https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>>
>>
>
>
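
To illustrate two points from the quoted thread above (the kernel has to be
computable between the training data and new test data of a different size,
and the kernel function should live in a reachable namespace), here is a
minimal sketch. ``rbf_gram`` is a hypothetical stand-in for the
sequentialised kernel, which is not reproduced in this thread, and the data
are random. It relies on SVC accepting a callable as its kernel argument:

```python
import numpy as np
from sklearn.svm import SVC


def rbf_gram(X, Y, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2).

    Defined at module level (not inside fit()), so pickle can look it
    up by name. Works for X and Y with different numbers of rows.
    """
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


rng = np.random.RandomState(0)
X_train = rng.randn(20, 3)
# Split on the median so both classes are guaranteed to be present.
y_train = (X_train[:, 0] > np.median(X_train[:, 0])).astype(int)
X_test = rng.randn(7, 3)  # deliberately a different number of rows

# SVC calls rbf_gram(X_train, X_train) in fit and
# rbf_gram(X_test, X_train) in predict.
clf = SVC(kernel=rbf_gram).fit(X_train, y_train)
pred = clf.predict(X_test)

K_test = rbf_gram(X_test, X_train)
```

Because the kernel is evaluated between the test rows and the stored training
rows, nothing in this sketch depends on the test set having the same shape as
the training set, which is what cross-validation needs.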
