[scikit-learn] R user trying to learn Python

Sebastian Raschka se.raschka at gmail.com
Sun Jun 18 12:53:03 EDT 2017


Hi,

> I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
> 
> code 1:
> y_sin = np.sin(x)
> y_cos = np.cos(x)
> 
> I know you can import the entire package without the "as np", but I see np.something as the standard. Why?

Because it makes it clear where each function is coming from. Sure, you could do

from numpy import *

but this is NOT recommended, because it clutters up your main namespace. For instance, NumPy has its own sum function. If you do from numpy import *, the name `sum` in your main namespace no longer refers to Python's built-in `sum` but to NumPy's sum, which behaves differently. This is confusing and should be avoided.
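
As a small sketch of why the shadowing matters (the input and the -1 argument are just illustrative choices), the two functions interpret the second positional argument differently. Here both are called through their explicit names; after from numpy import *, a bare `sum` would mean the NumPy one:

```python
import builtins

import numpy as np

# Built-in sum: the second argument is the start value, so 0+1+2+3+4 + (-1).
builtin_result = builtins.sum(range(5), -1)

# NumPy's sum: the second argument is the axis, so -1 means "last axis".
numpy_result = np.sum(range(5), -1)

print(builtin_result)  # 9
print(numpy_result)    # 10
```

With the np. prefix it is always obvious which of the two you are calling.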

> In the code above, sklearn > linear_model > Ridge, one lives inside the other, it feels that there are multiple layer, how deep do I have to dig in?
> 
> Can someone explain the mentality behind this setup?

This is one way to organize your code and package. Sklearn contains many things, and organizing it into subpackages (linear_model, svm, ...) only makes sense; otherwise, you would end up with code files of more than 100,000 lines or so, which would make life really hard for package developers.
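
To illustrate that the nesting is only a path to a name, here is a minimal sketch using a nested package from the standard library (xml > etree > ElementTree > Element stands in for sklearn > linear_model > Ridge, so it runs without scikit-learn installed). However deep the hierarchy is, you can import the name once and use it directly:

```python
# Three ways to reach the same class inside a nested package.
import xml.etree.ElementTree                    # full dotted path
from xml.etree import ElementTree               # import the submodule
from xml.etree.ElementTree import Element       # import the class itself

# All three spellings refer to the very same class object.
same_class = xml.etree.ElementTree.Element is ElementTree.Element is Element
print(same_class)  # True
```

The same mechanics apply to from sklearn.linear_model import Ridge, after which you can simply write Ridge(alpha=.5).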

Here, scikit-learn tries to follow the core principles of good object-oriented program design, for instance, abstraction, encapsulation, modularity, hierarchy, ...
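
As a rough sketch of what encapsulation means here, consider a toy class (MeanRegressor is a hypothetical name made up for this example, not part of scikit-learn). The fitted state lives on the object, which is why the methods come "after a dot":

```python
class MeanRegressor:
    """Hypothetical toy estimator that always predicts the training mean."""

    def fit(self, X, y):
        # The learned state is stored on the object itself (encapsulation).
        self.mean_ = sum(y) / len(y)
        # Returning self mirrors the convention scikit-learn estimators follow.
        return self

    def predict(self, X):
        # One prediction per input row, all equal to the stored mean.
        return [self.mean_ for _ in X]


model = MeanRegressor()
model.fit([[0], [1], [2]], [1.0, 2.0, 3.0])
predictions = model.predict([[5], [6]])
print(predictions)  # [2.0, 2.0]
```

Calling model.fit(...) changes model itself, so there is nothing extra to save to a separate variable.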

> What are some good ways and resources to learn Python for data analysis?

Based on your questions, I think a good resource would be an introductory programming book or course. The sections on object-oriented programming in particular would make the rationale, design, and API of scikit-learn, and of Python classes as a whole, more accessible and would address your concerns and questions.

Best,
Sebastian

> On Jun 18, 2017, at 12:02 PM, C W <tmrsg11 at gmail.com> wrote:
> 
> Dear Scikit-learn,
> 
> What are some good ways and resources to learn Python for data analysis?
> 
> I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
> 
> code 1:
> y_sin = np.sin(x)
> y_cos = np.cos(x)
> 
> I know you can import the entire package without the "as np", but I see np.something as the standard. Why?
> 
> Code 2:
> model = LogisticRegression()
> model.fit(X_train, y_train)
> model.score(X_test, y_test)
> 
> In R, everything is saved to a variable. In the code above, what if I accidentally ran model.fit(), I would not know.
> 
> Code 3:
> from sklearn import linear_model
> reg = linear_model.Ridge(alpha=.5)
> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
> 
> In the code above, sklearn > linear_model > Ridge, one lives inside the other, it feels that there are multiple layer, how deep do I have to dig in?
> 
> Can someone explain the mentality behind this setup?
> 
> Thank you very much!
> 
> M
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn


