[scikit-learn] RE: RE: question about using sklearn.neural_network.MLPClassifier?

Sebastian Raschka se.raschka at gmail.com
Fri Nov 25 13:57:43 EST 2016


> Many of them need the number of outliers and a distance as input parameters in advance. Is there an algorithm that works more intelligently?

By ‘intelligently’, do you mean ‘more automatic’ (fewer hyperparameters to define manually)? In my opinion, “outlier” is a highly context-specific definition, so it’s really up to you to decide what counts as an outlier in your application.

E.g., a simple non-parametric approach would be to say that point P is an outlier if

P > Q3 + 1.5 * IQR,
or P < Q1 - 1.5 * IQR

where Q1 and Q3 are the first and third quartiles of the dataset, respectively, and IQR = Q3 - Q1 is the interquartile range. Similarly, you could use thresholds based on the variance, standard deviation, etc., so that you don’t need to specify the number of outliers if that’s not what you want.
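A minimal sketch of both rules on the example dataset from the question below, assuming NumPy. The multiplier k=1.5 and the cutoff of 3 standard deviations are conventional choices, not the only reasonable ones. Note that the mean and standard deviation are themselves pulled toward the outlier, which is one reason the quartile-based rule tends to be more robust.

```py
import numpy as np

# Example dataset from the original question; 999 is the "obvious" outlier.
data = np.array([2, 3, 2, 4, 2, 3, 1, 2, 3, 1, 2, 999, 2, 3, 2, 1, 2, 3])


def iqr_outliers(x, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR (k=1.5 is a convention)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    return x[mask]


def zscore_outliers(x, threshold=3.0):
    """Return values more than `threshold` standard deviations away from the mean."""
    z = (x - x.mean()) / x.std()
    return x[np.abs(z) > threshold]


print(iqr_outliers(data))     # [999]
print(zscore_outliers(data))  # [999]
```

Neither function needs the number of outliers up front; the threshold is expressed in terms of the data's own spread instead.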

> On Nov 25, 2016, at 6:38 AM, linjia at ruijie.com.cn wrote:
> 
> Hello everyone, 
>      Today I used IsolationForest to pick out outliers, and I noticed that IsolationForest has a 'contamination' parameter whose default value is 0.1 (10%).
>      So is there a way to pick out the outliers without specifying the proportion of outliers in the dataset?
>      For example, in the dataset [2,3,2,4,2,3,1,2,3,1,2, 999, 2,3,2,1,2,3], we can intuitively pick out '999' as an outlier.
>      I have also read some papers about outlier detection recently, and many of the methods need the number of outliers and a distance as input parameters in advance. Is there an algorithm that works more intelligently?
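A sketch of the two ways to configure IsolationForest on this dataset, assuming a recent scikit-learn release. In newer versions `contamination` also accepts "auto" (now the default), which uses the fixed score threshold from the original Isolation Forest paper instead of an assumed proportion. The value 1/18 below simply encodes "one outlier among 18 points" for illustration. With "auto", 999 is flagged, but on such a tiny, heavily tied dataset other points may be flagged as well, so no exact output is claimed for that case.

```py
import numpy as np
from sklearn.ensemble import IsolationForest

data = np.array([2, 3, 2, 4, 2, 3, 1, 2, 3, 1, 2, 999, 2, 3, 2, 1, 2, 3])
X = data.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

# Option 1: state the expected fraction of outliers explicitly (1 of 18 points).
iso_fixed = IsolationForest(contamination=1 / 18, random_state=0)
labels_fixed = iso_fixed.fit_predict(X)  # -1 = outlier, 1 = inlier

# Option 2: no assumed fraction; use the paper's fixed anomaly-score threshold.
iso_auto = IsolationForest(contamination="auto", random_state=0)
labels_auto = iso_auto.fit_predict(X)

print(data[labels_fixed == -1])  # [999]
print(data[labels_auto == -1])
```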
> 
> 
> 
> 
> 
> -----Original Message-----
> From: scikit-learn [mailto:scikit-learn-bounces+linjia=ruijie.com.cn at python.org] On Behalf Of Sebastian Raschka
> Sent: 2016-11-25 10:51
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] RE: question about using sklearn.neural_network.MLPClassifier?
> 
>> Here is another question: when I use the neural network library routine, can I save the trained network for use next time?
> 
> 
> Maybe have a look at the model persistence section at http://scikit-learn.org/stable/modules/model_persistence.html or http://cmry.github.io/notes/serialize
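A minimal sketch of the save/load workflow from the question, assuming plain `pickle` from the standard library; `joblib.dump` / `joblib.load` work the same way and are often preferred for estimators holding large NumPy arrays. The file name `mlp.pkl` is hypothetical. The persistence docs also caution that pickles should only be loaded from trusted sources and with the same scikit-learn version that created them.

```py
import os
import pickle
import tempfile

from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# "Foo1.py": fit once, then write the fitted estimator to disk.
clf = MLPClassifier(max_iter=1000, random_state=0)
clf.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mlp.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(clf, f)

    # "Foo2.py": load the fitted estimator and predict without refitting.
    with open(path, "rb") as f:
        clf_loaded = pickle.load(f)
    res = clf_loaded.predict([[0, 0], [0, 1], [1, 0], [1, 1]])

print(res)
```

The loaded object is a fully fitted `MLPClassifier`, so `predict` can be called on it directly.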
> 
> Cheers,
> Sebastian
> 
> 
>> On Nov 24, 2016, at 8:08 PM, linjia at ruijie.com.cn wrote:
>> 
>> @Sebastian Raschka
>> Thanks for your analysis.
>> Here is another question: when I use the neural network library routine, can I save the trained network for use next time?
>> Something like the following:
>> 
>> Foo1.py
>>     clf.fit(x, y)
>>     result_network = clf.save()
>> 
>> Foo2.py
>>     clf = Load(result_network)
>>     res = clf.predict(newsample)
>> 
>> So I don't need to fit the training set every time.
>> 
>> From: scikit-learn [mailto:scikit-learn-bounces+linjia=ruijie.com.cn at python.org] On Behalf Of Sebastian Raschka
>> Sent: 2016-11-24 3:06
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] question about using sklearn.neural_network.MLPClassifier?
>> 
>> If you keep everything at their default values, it seems to work -
>> 
>> ```py
>> from sklearn.neural_network import MLPClassifier
>> 
>> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
>> y = [0, 1, 1, 0]
>> clf = MLPClassifier(max_iter=1000)
>> clf.fit(X, y)
>> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
>> print(res)
>> ```
>> 
>> The default is 100 units in the hidden layer, but in theory it should work with 2 hidden logistic units (I think that’s the typical textbook/class example). I think what happens is that the optimizer gets stuck in a local minimum, depending on the random weight initialization. E.g., the following works just fine:
>> 
>> from sklearn.neural_network import MLPClassifier
>> 
>> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
>> y = [0, 1, 1, 0]
>> clf = MLPClassifier(solver='lbfgs',
>>                     activation='logistic',
>>                     alpha=0.0,
>>                     hidden_layer_sizes=(2,),  # one hidden layer with 2 units
>>                     learning_rate_init=0.1,   # only used by the 'sgd' and 'adam' solvers
>>                     max_iter=1000,
>>                     random_state=20)          # seed for the weight initialization
>> clf.fit(X, y)
>> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
>> print(res)
>> print(clf.loss_)
>> 
>> 
>> but changing the random seed to 1 leads to:
>> 
>> [0 1 1 1]
>> 0.34660921283
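To see the dependence on the initialization directly, here is a sketch that refits the same 2-hidden-unit network for several `random_state` values and records the training accuracy and final loss of each. The range of 10 seeds is arbitrary, and which seeds succeed will likely differ across scikit-learn versions, so no specific outcome is claimed. `learning_rate_init` is dropped because the lbfgs solver ignores it. Keeping the restart with the lowest loss is a common, simple workaround for local minima.

```py
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

results = {}  # seed -> (training accuracy, final loss)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for seed in range(10):
        clf = MLPClassifier(solver="lbfgs", activation="logistic", alpha=0.0,
                            hidden_layer_sizes=(2,), max_iter=1000,
                            random_state=seed)
        clf.fit(X, y)
        results[seed] = (clf.score(X, y), clf.loss_)

for seed, (acc, loss) in results.items():
    print(f"random_state={seed}: accuracy={acc:.2f}, loss={loss:.4f}")

# Keep the restart that reached the lowest training loss.
best_seed = min(results, key=lambda s: results[s][1])
print("best seed:", best_seed)
```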
>> 
>> For comparison, I used a more vanilla MLP (1 hidden layer with 2 units and logistic activation as well; https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch12/ch12.ipynb), essentially resulting in the same problem:
>> 
>> 
>> [image001.png, image002.png: attached figures not preserved in the archive]
>> 
>> 
>> 
>> 
>> On Nov 23, 2016, at 6:26 AM, linjia at ruijie.com.cn wrote:
>> 
>> Yes, you are right, @Raghav R V, thanks!
>> 
>> However, I found that the key parameter is ‘hidden_layer_sizes=[2]’. Do I misunderstand the meaning of the hidden_layer_sizes parameter?
>> 
>> Is it related to this topic:
>> http://stackoverflow.com/questions/36819287/mlp-classifier-of-scikit-neuralnetwork-not-working-for-xor
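`hidden_layer_sizes` is a sequence whose i-th entry is the number of units in the i-th hidden layer; the input and output layers are sized from the data. So `(2,)` and `[2]` both mean one hidden layer with 2 units, and the default `(100,)` means one hidden layer with 100 units. A small sketch that makes this visible through the shapes of the fitted weight matrices in `coefs_` (the helper `layer_shapes` is hypothetical, and `max_iter=5` is only there to make fitting quick):

```py
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # 2 input features
y = [0, 1, 1, 0]                      # binary target -> 1 output unit


def layer_shapes(hidden_layer_sizes):
    """Fit briefly and return the shape of each weight matrix (one per layer transition)."""
    clf = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=5, random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        clf.fit(X, y)
    return [w.shape for w in clf.coefs_]


print(layer_shapes((100,)))  # [(2, 100), (100, 1)]: default, one hidden layer of 100
print(layer_shapes((2,)))    # [(2, 2), (2, 1)]: one hidden layer of 2, same as [2]
print(layer_shapes((3, 4)))  # [(2, 3), (3, 4), (4, 1)]: two hidden layers of 3 and 4
```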
>> 
>> 
>> From: scikit-learn [mailto:scikit-learn-bounces+linjia=ruijie.com.cn at python.org] On Behalf Of Raghav R V
>> Sent: 2016-11-23 19:04
>> To: Scikit-learn user and developer mailing list
>> Subject: Re: [scikit-learn] question about using sklearn.neural_network.MLPClassifier?
>> 
>> Hi,
>> 
>> If you keep everything at their default values, it seems to work -
>> 
>> ```py
>> from sklearn.neural_network import MLPClassifier
>> 
>> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
>> y = [0, 1, 1, 0]
>> clf = MLPClassifier(max_iter=1000)
>> clf.fit(X, y)
>> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
>> print(res)
>> ```
>> 
>> On Wed, Nov 23, 2016 at 10:27 AM, <linjia at ruijie.com.cn> wrote:
>> Hi everyone
>> 
>>      I am trying to use sklearn.neural_network.MLPClassifier to learn the XOR operation, but the result is not satisfactory. The code is below; can you tell me whether I am using the library incorrectly?
>> 
>> from sklearn.neural_network import MLPClassifier
>> 
>> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
>> y = [0, 1, 1, 0]
>> clf = MLPClassifier(solver='adam', activation='logistic', alpha=1e-3,
>>                     hidden_layer_sizes=(2,), max_iter=1000)
>> clf.fit(X, y)
>> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
>> print(res)
>> 
>> 
>> #result is [0 0 0 0], score is 0.5
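A score of 0.5 here is exactly what a constant prediction earns on XOR: 2 of the 4 labels happen to be 0, so predicting 0 everywhere is right half the time, i.e. chance level, meaning the network collapsed to a single class. A quick check with `accuracy_score`, which computes the same mean accuracy that a classifier's `score` method reports:

```py
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0]  # XOR targets
y_pred = [0, 0, 0, 0]  # the constant prediction reported above

print(accuracy_score(y_true, y_pred))  # 0.5
```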
>> 
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>> 
>> 
>> 
>> 
>> --
>> Raghav RV
>> https://github.com/raghavrv
>> 
> 


