Fwd: inconsistency between libsvm and scikit-learn.svc results
I have a project that is based on the SVM algorithm implemented by libsvm <https://www.csie.ntu.edu.tw/~cjlin/libsvm/>. Recently I decided to try several other classification algorithms, and this is where scikit-learn <http://scikit-learn.org/> comes into the picture.

Connecting to scikit-learn was pretty straightforward: it supports the libsvm format through the load_svmlight_file routine, and its SVM implementation is based on the same libsvm.

When everything was done, I decided to check the consistency of the results by running libsvm directly and via scikit-learn, and the results were different. Of the 18 measurements on the learning curve, 7 were different, and the differences are concentrated at the small-size steps of the learning curve. The libsvm results seem much more stable, while the scikit-learn results show some drastic fluctuations.

Of course, the classifiers have exactly the same parameters. I tried to check which libsvm version the scikit-learn implementation uses, but I didn't find it; the only thing I found was the libsvm.so file.

I am currently using libsvm 3.21 and scikit-learn 0.17.1.

I would appreciate any help in addressing this issue.

size  libsvm               scikit-learn
1     0.1336239435355727   0.1336239435355727
2     0.08699516468193455  0.08699516468193455
3     0.32928301642777424  0.2117238289550198   #different
4     0.2835688734876902   0.2835688734876902
5     0.27846766962743097  0.26651875338163966  #different
6     0.2853854654662907   0.18898048915599963  #different
7     0.28196058132165136  0.28196058132165136
8     0.31473956032575623  0.1958710201604552   #different
9     0.33588303670653136  0.2101641630182972   #different
10    0.4075242509025311   0.2997807499800962   #different
15    0.4391771087975972   0.4391771087975972
20    0.3837789445609818   0.2713167833345173   #different
25    0.4252154334940311   0.4252154334940311
30    0.4256407777477492   0.4256407777477492
35    0.45314944605858387  0.45314944605858387
40    0.4278633233755064   0.4278633233755064
45    0.46174762022239796  0.46174762022239796
50    0.45370452524846866  0.45370452524846866
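To rule out a parameter mismatch, it can help to spell out every hyperparameter on the scikit-learn side instead of relying on defaults. The sketch below assumes an RBF C-SVC; `fit_like_svm_train` is a hypothetical helper, the four-point dataset is made up, and the svm-train flags in the docstring are my reading of libsvm's README rather than a verified one-to-one mapping. One thing worth checking: as far as I know, svm-train's default gamma is 1/num_features taken from the largest feature index in the file it reads, while scikit-learn 0.17's `gamma='auto'` uses `1/X.shape[1]` (newer releases default to `gamma='scale'`). If the rows for a small learning-curve step are sliced from a matrix loaded once in Python, the two values can differ.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file
from sklearn.svm import SVC


def fit_like_svm_train(path, C=1.0, gamma=None, tol=1e-3):
    """Hypothetical helper: fit an RBF C-SVC on a libsvm-format file.

    Assumed svm-train counterpart (check against libsvm's README):
        svm-train -s 0 -t 2 -c <C> -g <gamma> -e <tol> -h 1 <path>
    """
    X, y = load_svmlight_file(path)  # X is a scipy.sparse CSR matrix
    if gamma is None:
        # Compute the "1/num_features" value explicitly so it can be
        # printed and passed verbatim to svm-train with -g.
        gamma = 1.0 / X.shape[1]
    clf = SVC(C=C, kernel="rbf", gamma=gamma, tol=tol, shrinking=True)
    clf.fit(X, y)
    return clf, X, y, gamma


# Made-up four-point dataset, written to a temporary libsvm-format file.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y_demo = np.array([1, 1, -1, -1])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.txt")
    # One-based feature indices, as in files written for libsvm.
    dump_svmlight_file(X_demo, y_demo, path, zero_based=False)
    clf, X, y, gamma = fit_like_svm_train(path)

print("gamma:", gamma)
print("training predictions:", clf.predict(X))
```

Printing gamma and passing that exact number to both tools with `-g` / `gamma=` removes one variable from the comparison.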
On 08/27/2016 12:33 PM, elgesto@gmail.com wrote:
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
This might be because the libsvm version currently used in scikit-learn is 3.10, from 2011, with some patches imported from upstream.
Can I update the libsvm version by myself?
2016-08-27 12:49 GMT+03:00 olologin <olologin@gmail.com>:
On 08/27/2016 02:19 PM, elgesto@gmail.com wrote:
Can I update the libsvm version by myself?
I don't think it is that easy: the version used in scikit-learn has many additional modifications. From the header of svm.cpp:

/*
 Modified 2010:
 - Support for dense data by Ming-Fang Weng
 - Return indices for support vectors, Fabian Pedregosa <fabian.pedregosa@inria.fr>
 - Fixes to avoid name collision, Fabian Pedregosa
 - Add support for instance weights, Fabian Pedregosa based on work by Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu, <http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances>.
 - Make labels sorted in svm_group_classes, Fabian Pedregosa.
*/
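The last item in that svm.cpp header changes how classes are ordered internally. As far as I know, upstream libsvm's svm_group_classes collects labels in order of first appearance in the training data, while the scikit-learn copy sorts them; check svm_group_classes in the libsvm version you build against before relying on this. The pure-Python sketch below only illustrates the two orderings; it is not libsvm code, and both helper names and label lists are hypothetical.

```python
def labels_first_occurrence(y):
    """Distinct labels in order of first appearance (assumed upstream libsvm behaviour)."""
    order = []
    for label in y:
        if label not in order:
            order.append(label)
    return order


def labels_sorted(y):
    """Distinct labels in ascending order, as the scikit-learn copy's header describes."""
    return sorted(set(y))


# Two hypothetical small training subsets with the same labels in a different order.
subset_a = [1, -1, 1, -1]
subset_b = [-1, 1, -1, 1]
for ys in (subset_a, subset_b):
    print(labels_first_occurrence(ys), labels_sorted(ys))
# [1, -1] [-1, 1]
# [-1, 1] [-1, 1]
```

With small learning-curve subsets, the first label can change from one subset to the next under first-occurrence ordering but not under sorted ordering, so comparing predicted labels is safer than comparing raw decision values across the two tools.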
So there is no way to achieve consistency?
2016-08-27 15:36 GMT+03:00 olologin <olologin@gmail.com>:
I don't think we should assume that this is the only possible reason for inconsistency. Could you give us a small snippet of data and code on which you find this inconsistency?
On 27 August 2016 at 23:42, elgesto@gmail.com <elgesto@gmail.com> wrote:
On 08/27/2016 09:48 AM, Joel Nothman wrote:
I don't think we should assume that this is the only possible reason for inconsistency. Could you give us a small snippet of data and code on which you find this inconsistency?
I would also expect different settings or random states or data preparation to be more likely culprits.
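For a learning curve, one way to take random states and data preparation out of the comparison is to draw the subsets once, from a fixed seed, and hand exactly the same rows to libsvm and scikit-learn. The original post does not say how its subsets were drawn, so the stdlib-only sketch below is just one option: `nested_subsets`, the sizes, and the seed are illustrative, and taking prefixes of a single shuffle is a common learning-curve design rather than the poster's confirmed method.

```python
import random


def nested_subsets(n_samples, sizes, seed=0):
    """Hypothetical helper: row indices for each learning-curve size.

    One seeded shuffle is drawn and every size takes a prefix of it, so the
    same seed always yields the same subsets, and each smaller training set
    is contained in the larger ones.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # local RNG, unaffected by other code
    return {size: sorted(order[:size]) for size in sizes}


sizes = [1, 2, 3, 4, 5, 10, 20, 50]
subsets = nested_subsets(n_samples=100, sizes=sizes, seed=42)
for size in sizes[:3]:
    print(size, subsets[size])
```

Writing each subset's rows to its own libsvm-format file (for example with scikit-learn's dump_svmlight_file) and feeding that same file to both svm-train and load_svmlight_file keeps the data preparation identical on both sides.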
Any chance it's related to the seed issue in the "Decoding Differences Between SKL SVM and Matlab Libsvm Even When Parameters the Same" thread?
Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
*Web:* http://www.bommaritollc.com
*Mobile:* +1 (646) 450-3387
On Sun, Aug 28, 2016 at 12:20 PM, Andy <t3kcit@gmail.com> wrote:
One logical possibility would be for libsvm to accept the scikit-learn changes upstream.
On 8/27/16 6:42 AM, elgesto@gmail.com wrote:
So there is no way to achieve consistency?
participants (6)
- Andy
- Bill Ross
- elgesto@gmail.com
- Joel Nothman
- Michael Bommarito
- olologin