[scikit-learn] Problems with running GridSearchCV on a pipeline with a custom transformer
Andreas Mueller
t3kcit at gmail.com
Fri Aug 4 10:50:40 EDT 2017
Yes, that's totally fine. The error is unrelated and just means you need
to call ``check_is_fitted`` in your predict method
to give a nicer error message.
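
A minimal sketch of that pattern, assuming a hypothetical wrapper class named WrappedSVC with made-up parameters C and gamma (not the code from the attachment). It builds the inner SVC inside fit(), stores it on a trailing-underscore attribute, and guards predict() with check_is_fitted:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.exceptions import NotFittedError
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted


class WrappedSVC(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that delegates to an inner SVC built in fit()."""

    def __init__(self, C=1.0, gamma=0.5):
        # Only store constructor arguments here; no fitting logic.
        self.C = C
        self.gamma = gamma

    def fit(self, X, y):
        # Build and fit the inner estimator. The trailing underscore marks
        # svc_ as an attribute that exists only after fitting.
        self.svc_ = SVC(C=self.C, kernel="rbf", gamma=self.gamma).fit(X, y)
        self.classes_ = self.svc_.classes_
        return self

    def predict(self, X):
        # Raises NotFittedError (its message mentions 'fit') if svc_ is missing.
        check_is_fitted(self, "svc_")
        return self.svc_.predict(X)
```

In this sketch fit() returns self and keeps the fitted SVC as an attribute rather than returning the SVC itself, which follows the usual scikit-learn convention for fit().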
On 08/04/2017 06:29 AM, Sam Barnett wrote:
> Hi Andy,
> I have since been able to resolve the pickling issue, though I am now
> getting an error message saying that an error message does not include
> the expected string 'fit'. In general, I am trying to use the fit()
> method of my classifier to instantiate a separate SVC() classifier
> with a custom kernel, fit THAT to the data, then return this instance
> as the fitted version of the new classifier. Is this possible in
> theory? If so, what is the best way to implement it?
>
> As before, the requisite code and a .ipynb file are attached.
>
> Best,
> Sam
>
> On Thu, Aug 3, 2017 at 6:35 PM, Andreas Mueller
> <t3kcit at gmail.com> wrote:
>
> Hi Sam.
> You need to put these into a reachable namespace (possibly as
> private functions) so that they can be pickled.
> Please stay on the sklearn mailing list, I might not have time to
> reply.
>
> Andy
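
A small standard-library sketch of the pickling point, assuming two hypothetical kernel helpers (linear_kernel_value and nested_kernel are illustrative names only). A function defined at module level can be pickled by reference, while one defined inside another function cannot:

```python
import pickle


def linear_kernel_value(a, b):
    """Module-level function: pickle stores it by its importable name."""
    return sum(x * y for x, y in zip(a, b))


def make_nested_kernel():
    # A function defined inside another function has no importable name,
    # so pickle cannot locate it when serialising.
    def nested_kernel(a, b):
        return sum(x * y for x, y in zip(a, b))

    return nested_kernel


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print("module-level picklable:", can_pickle(linear_kernel_value))
print("nested picklable:", can_pickle(make_nested_kernel()))
```

Both helpers compute the same value; only where they are defined differs, and that is what decides whether an estimator holding a reference to them can be pickled.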
>
>
> On 08/03/2017 01:24 PM, Sam Barnett wrote:
>> Hi Andy,
>>
>> I've since tried a different solution: instead of a pipeline,
>> I've simply created a classifier that is for the most part like
>> svm.SVC, though it takes a few extra inputs for the
>> sequentialisation step. I've used a Python function that can
>> compute the Gram matrix between two datasets of any shape to pass
>> into SVC(), though I'm now having trouble with pickling on the
>> check_estimator test. It appears that SeqSVC.fit() doesn't like
>> to have methods defined within it. Can you see how to pass this
>> test? (the .ipynb file shows the error).
>>
>> Best,
>> Sam
>>
>> On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett
>> <sambarnett95 at gmail.com> wrote:
>>
>> You're right: it does fail without GridSearchCV when I change
>> the size of seq_test. I will look at the transform tomorrow
>> to see if I can work this out. Thank you for your help so far!
>>
>> On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller
>> <t3kcit at gmail.com> wrote:
>>
>> Change the size of seq_test in your notebook and you'll
>> see the failure without GridSearchCV.
>> I haven't looked at your code in detail, but transform is
>> supposed to work on arbitrary new data with the same
>> number of features.
>> Your code requires the test data to have the same shape
>> as the training data.
>> Cross-validation will lead to training data and test data
>> having different sizes. But I feel like something is
>> already wrong if your test data size depends on your
>> training data size.
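
A rough sketch of a transformer that behaves this way, assuming a plain RBF kernel stands in for the sequentialised one (GramTransformer and its gamma parameter are hypothetical names, not the code from the notebook). fit() remembers the training rows, and transform() accepts any number of new rows as long as the feature count matches:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics.pairwise import rbf_kernel


class GramTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: kernel values of new rows against training rows."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma

    def fit(self, X, y=None):
        # Keep the training samples; only the feature count must match later.
        self.X_fit_ = np.asarray(X, dtype=float)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if X.shape[1] != self.X_fit_.shape[1]:
            raise ValueError(
                "Number of features is different from what was seen in `fit`"
            )
        # Result has shape (n_samples_new, n_samples_train), so the number
        # of rows is free to differ from the training set.
        return rbf_kernel(X, self.X_fit_, gamma=self.gamma)
```

The output of transform() on held-out rows has the (n_test, n_train) layout that SVC(kernel='precomputed') expects at predict time, so a step like this can sit in front of it in a pipeline.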
>>
>>
>>
>> On 08/02/2017 03:08 PM, Sam Barnett wrote:
>>> Hi Andy,
>>>
>>> The purpose of the transformer is to take an ordinary
>>> kernel (in this case I have taken 'rbf' as a default)
>>> and return a 'sequentialised' kernel using a few extra
>>> parameters. Hence, the transformer takes an ordinary
>>> data-target pair X, y as its input, and the
>>> fit_transform(X, y) method will output the Gram matrix
>>> for X that is associated with this sequentialised
>>> kernel. In the pipeline, this Gram matrix is passed into
>>> an SVC classifier with the kernel parameter set to
>>> 'precomputed'.
>>>
>>> Therefore, I do not think your hacky solution would be
>>> possible. However, I am still unsure how to implement
>>> your first solution: won't the Gram matrix from the
>>> transformer contain all the necessary kernel values?
>>> Could you elaborate further?
>>>
>>>
>>> Best,
>>> Sam
>>>
>>> On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller
>>> <t3kcit at gmail.com> wrote:
>>>
>>> Hi Sam.
>>> GridSearchCV will do cross-validation, which
>>> requires "transforming" the test data.
>>> The shape of the test-data will be different from
>>> the shape of the training data.
>>> You need to have the ability to compute the kernel
>>> between the training data and new test data.
>>>
>>> A more hacky solution would be to compute the full
>>> kernel matrix in advance and pass that to GridSearchCV.
>>>
>>> You probably don't need it here, but you should also
>>> check out what the _pairwise attribute does in
>>> cross-validation, because that is likely to come up
>>> when playing with kernels.
>>>
>>> Hth,
>>> Andy
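
The "hacky" variant could look roughly like the sketch below, assuming toy data and an ordinary RBF kernel in place of the sequentialised one (the data, gamma value and C grid are all made up for illustration). The full Gram matrix is computed once and handed to GridSearchCV together with an SVC using kernel="precomputed":

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(40, 3)          # toy features, for illustration only
y = np.arange(40) % 2        # arbitrary alternating labels

# Full Gram matrix over all samples, computed once up front.
K = rbf_kernel(X, X, gamma=1.0)

# With kernel="precomputed", cross-validation slices K on both axes, so each
# split fits on K[train, train] and scores on K[test, train].
search = GridSearchCV(SVC(kernel="precomputed"), {"C": [0.1, 1.0, 10.0]}, cv=4)
search.fit(K, y)
print(search.best_params_)
```

One consequence of this approach is that the kernel's own parameters are fixed when K is computed, so the grid search can only tune the SVC's parameters such as C.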
>>>
>>>
>>> On 08/02/2017 08:38 AM, Sam Barnett wrote:
>>>> Dear all,
>>>>
>>>> I have created a 2-step pipeline with a custom
>>>> transformer followed by a simple SVC classifier,
>>>> and I wish to run a grid-search over it. I am able
>>>> to successfully create the transformer and the
>>>> pipeline, and each of these elements works fine.
>>>> However, when I try to use the fit() method on my
>>>> GridSearchCV object, I get the following error:
>>>>
>>>> 57 # during fit.
>>>> 58 if X.shape != self.input_shape_:
>>>> ---> 59 raise ValueError('Shape of input is different from what was seen '
>>>> 60 'in `fit`')
>>>> 61
>>>>
>>>> ValueError: Shape of input is different from what
>>>> was seen in `fit`
>>>>
>>>> For a full breakdown of the problem, I have written
>>>> a Jupyter notebook showing exactly how the error
>>>> occurs (this also contains all .py files necessary
>>>> to run the notebook). Can anybody see how to work
>>>> through this?
>>>>
>>>> Many thanks,
>>>> Sam Barnett
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>>
>>
>
>