R user trying to learn Python
Dear Scikit-learn,
What are some good ways and resources to learn Python for data analysis?
I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
Code 1:
y_sin = np.sin(x)
y_cos = np.cos(x)
I know you can import the entire package without the "as np", but I see np.something as the standard. Why?
Code 2:
model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
In R, everything is saved to a variable. In the code above, if I accidentally ran model.fit(), I would not know.
Code 3:
from sklearn import linear_model
reg = linear_model.Ridge(alpha=.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
In the code above, sklearn > linear_model > Ridge: one lives inside the other. It feels like there are multiple layers. How deep do I have to dig?
Can someone explain the mentality behind this setup?
Thank you very much!
M
Hi,
> I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
> code 1:
> y_sin = np.sin(x)
> y_cos = np.cos(x)
> I know you can import the entire package without the "as np", but I see np.something as the standard. Why?
Because it makes it clear where each function comes from. Sure, you could do
from numpy import *
but this is NOT recommended, because it clutters up your main namespace. For instance, NumPy has its own sum function. If you do from numpy import *, the name `sum` in your main namespace no longer refers to Python's built-in `sum` but to NumPy's sum. This is confusing and should be avoided.
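A minimal sketch of that name clash. The star import is run inside a scratch dictionary via exec only so that the demonstration does not disturb the surrounding script; that isolation is an illustration device, not something you would do in real code.

```python
import builtins

import numpy as np

# Run the star import in a scratch namespace so this script's own names stay clean.
scratch = {}
exec("from numpy import *", scratch)

# After the star import, the plain name `sum` refers to NumPy's function.
star_sum_is_numpy = scratch["sum"] is np.sum

# The two functions interpret a second positional argument differently:
# the built-in treats it as a start value, NumPy treats it as an axis.
builtin_result = builtins.sum(range(5), -1)  # 0+1+2+3+4, starting from -1
numpy_result = int(np.sum(range(5), -1))     # sum along the last axis

print(star_sum_is_numpy, builtin_result, numpy_result)
```

With `import numpy as np`, both functions stay available under distinct names (`sum` and `np.sum`), so the same call cannot silently change meaning.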
> In the code above, sklearn > linear_model > Ridge, one lives inside the other, it feels that there are multiple layers, how deep do I have to dig in?
> Can someone explain the mentality behind this setup?
This is one way to organize code in a package. Scikit-learn contains many things, and organizing it into subpackages (linear_model, svm, ...) only makes sense; otherwise, you would end up with code files of more than 100,000 lines or so, which would make life really hard for package developers. Here, scikit-learn tries to follow the core principles of good object-oriented program design, for instance abstraction, encapsulation, modularity, hierarchy, ...
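The nesting only has to be spelled out once, at import time. Here is a small sketch using the standard library's xml.etree.ElementTree as a stand-in for sklearn > linear_model > Ridge (the stand-in is chosen only so the example runs without scikit-learn installed). The three spellings all reach the same class:

```python
# Three ways to reach the same class inside a nested package.
import xml.etree.ElementTree                 # full dotted path
from xml.etree import ElementTree as ET      # import the submodule under a short alias
from xml.etree.ElementTree import Element    # import just the class

full_path = xml.etree.ElementTree.Element
via_alias = ET.Element

# All three names point at one and the same object.
same_object = full_path is via_alias is Element

# The analogous spellings for the example in the question would be
#   from sklearn import linear_model        -> linear_model.Ridge
#   from sklearn.linear_model import Ridge  -> Ridge
node = Element("model", {"alpha": "0.5"})
print(same_object, node.tag, node.attrib["alpha"])
```

So you rarely dig deeper than one import line; after that you use whichever short name you chose.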
> What are some good ways and resources to learn Python for data analysis?
Based on your questions, I think a good resource would be an introduction-to-programming book or course. Sections on object-oriented programming would make the rationale, design, and API of scikit-learn, and of Python classes as a whole, more accessible, and would address your concerns and questions.
Best,
Sebastian
On Jun 18, 2017, at 12:02 PM, C W <tmrsg11@gmail.com> wrote:
M _______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Hi Sebastian,
I looked through your book. I think it is great if you already know Python and are looking to learn machine learning. For me, I have some sense of machine learning, but none of Python.
Unlike R, which is specifically for statistical analysis, Python is broad!
Maybe some expert here with R can tell me how to go about this. :)
On Sun, Jun 18, 2017 at 12:53 PM, Sebastian Raschka <se.raschka@gmail.com> wrote:
Hi C W,
Yeah, I'd say that Python is a programming language with lots of packages for scientific computing, whereas R is more of a toolbox for stats. Thus, Python may be a bit weird at first for people who come from the R/stats field and are new to programming. I am not sure whether a person who is primarily interested in stats and ML needs to learn programming and computer science basics, but since so many tools are Python-based and require some basic programming to fit the pieces together, it's maybe not a bad idea :).
There's probably an overabundance of Python intro books out there. However, I'd recommend an introduction to computer science book that uses Python as a teaching language rather than a book that is just about the Python language. Maybe check out https://www.udacity.com/course/intro-to-computer-science--cs101, which is a Python-based computer science course (and should be free).
Best,
Sebastian
On Jun 18, 2017, at 4:18 PM, C W <tmrsg11@gmail.com> wrote:
CW, you might want to read http://greenteapress.com/wp/think-python/ (available as a free PDF) for the basics of programming and Python, and "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython" (O'Reilly) for the data analysis libraries (pandas, NumPy, IPython, ...).
On Sun, Jun 18, 2017 at 10:18 PM, C W <tmrsg11@gmail.com> wrote:
Hello,
The concepts behind R and Python are entirely different. Python is meant to be as explicit as possible, and it makes heavy use of namespaces, which R code typically does not. While Python code can seem more verbose, it is very clear when reading it which functions come from which module and submodule (this relates to your code 1 and code 3 examples).
For example 2, R indeed saves everything to a variable, while Python does not. The advantage is that Python is much more time and memory efficient than R. The tradeoff is that you do not keep intermediate results.
Hope that explains,
N
On 18 June 2017 at 13:18, C W <tmrsg11@gmail.com> wrote:
Thank you all for the love!
Sean, I think your recommendation is perfect! It covers everything and is very concise and to the point.
Sebastian, I will certainly invest time in that course when I have time.
Nelle, I agree! And from what I read, head(), tail(), and data.frame() in Python actually came from R by request. Hence, I came to think they are similar.
For anyone else in the world reading, I think the pandas documentation is also good: http://pandas.pydata.org/pandas-docs/stable/pandas.pdf
Mike
On Sun, Jun 18, 2017 at 4:37 PM, Nelle Varoquaux <nelle.varoquaux@gmail.com> wrote:
Another reference that I like a lot for people who already know a programming language and are trying to learn Python is "Python Essential Reference" by David Beazley. It gives a good understanding of how Python works, though it does not talk about numerical computing libraries.
Gaël
Hi,
along with all the great tips you received, perhaps you may find this useful: http://www.cert.org/flocon/2011/matlab-python-xref.pdf
I know it is not quite on topic for your question, but I found it very useful when I started to use Python (coming from R), so I thought it was worth posting here. It is very old, but those basic functions are pretty stable.
The Python code assumes a
from numpy import *
which, as others have already explained, is good practice to avoid.
Massimo
Hi M,
I think what you describe can be summarized as the difference between a domain-specific language (R) and a general-purpose language (Python). Most of what you describe is related to namespaces, "one honking great" feature of Python. Namespaces are less needed in R because R is domain specific. But if you write your web server's frontend, database access, prediction engine, user authentication, and whatnot all in Python (or at least a large part of it), then namespaces help a lot in keeping those domains apart.
I also added a couple of more specific answers to your points below, but I somehow can't make them appear as "not reply". I hope you all find them.
Hope that helps,
Ingo
> I am extremely frustrated using this thing. Everything comes after a dot! Why would you type the same thing at the beginning of every line? It's not efficient.
This is mostly the Python way to do namespaces. Although it may not be efficient when you type, it is efficient when you debug: you always get both the function/method *and* the context in which it was executed.
> code 1:
> y_sin = np.sin(x)
> y_cos = np.cos(x)
> I know you can import the entire package without the "as np", but I see np.something as the standard. Why?
Imagine you were doing an analysis for the Catholic church. Obviously sins would play an important role. So there might be a function called "sin" somewhere that does something entirely different from a trigonometric function. OK, maybe this is a bad example, but you get the idea. In this case it might even be a real issue, because math.sin and numpy.sin do similar but different things. That could be difficult to debug, and it's handy to mark which one you are using where.
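To make the math.sin versus numpy.sin point concrete, here is a small sketch (the variable names are made up for illustration):

```python
import math

import numpy as np

angles = [0.0, math.pi / 2]

# math.sin works on one number at a time and returns a plain float.
scalar_result = math.sin(angles[1])

# numpy.sin works element-wise on whole sequences and returns an array.
array_result = np.sin(angles)

# Passing a list to math.sin fails, which is why it matters which `sin` you have.
try:
    math.sin(angles)
    math_accepts_list = True
except TypeError:
    math_accepts_list = False

print(type(scalar_result).__name__, type(array_result).__name__, math_accepts_list)
```

With the prefix written out, a reader can tell at a glance whether a line expects a single number or a whole array.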
> Code 2:
> model = LogisticRegression()
> model.fit(X_train, y_train)
> model.score(X_test, y_test)
> In R, everything is saved to a variable. In the code above, what if I accidentally ran model.fit(), I would not know.
That's right. You would not know. This is a design decision in sklearn, with advantages and disadvantages. Sklearn uses stateful objects here, and you would expect calling their methods to change them. Note, though, that the methods you call on your model also return values, and these are likely what you expect them to return.
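A rough sketch of the stateful-object pattern, using a made-up toy class (MeanModel is hypothetical and not part of scikit-learn). It imitates two scikit-learn conventions: fit returns the estimator itself, and learned attributes get a trailing underscore, so you can check whether an object has been fitted.

```python
class MeanModel:
    """Hypothetical toy estimator, not a scikit-learn class."""

    def fit(self, values):
        # Fitting stores the learned state on the object itself ...
        self.mean_ = sum(values) / len(values)
        # ... and returns the object, so the call still has a return value.
        return self

    def predict(self):
        return self.mean_


model = MeanModel()
fitted_before = hasattr(model, "mean_")   # no learned state yet

returned = model.fit([1.0, 2.0, 3.0])
fitted_after = hasattr(model, "mean_")    # state appeared after fit

# Calling fit with no data is an error rather than a silent change.
try:
    MeanModel().fit()
    bare_fit_allowed = True
except TypeError:
    bare_fit_allowed = False

print(fitted_before, fitted_after, returned is model, model.predict(), bare_fit_allowed)
```

If you prefer the R habit of saving everything, writing `model = MeanModel().fit(data)` works precisely because fit hands the object back.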
> Code 3:
> from sklearn import linear_model
> reg = linear_model.Ridge(alpha=.5)
> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
> In the code above, sklearn > linear_model > Ridge, one lives inside the other, it feels that there are multiple layers, how deep do I have to dig in?
Again, this is the namespace idea. Python allows you to group functions, classes, and even namespaces themselves in namespaces. For larger packages, this can be very useful because you can structure your code accordingly.
Hi,
You may find these R/Python comparison sheets useful for understanding the syntax and concepts of both languages:
* https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
* http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html
Gaël
On 18/06/2017 at 18:02, C W wrote:
--
Makina Corpus <http://makina-corpus.com>
Newsletters <http://makina-corpus.com/formulaires/newsletters> | Training <http://makina-corpus.com/formation> | Twitter <https://twitter.com/makina_corpus>
Gaël Pegliasco, Project Manager
Tel: 02 51 79 80 84, Mobile: 06 41 69 16 09
11 rue du Marchix, FR-44000 Nantes
@GPegliasco <https://twitter.com/GPegliasco>
Discover Talend Data Integration <http://makina-corpus.com/formation/etl-talend-open-studio>, THE open source data integration solution
And, to answer your last question: in my opinion, a good way to learn data science with Python is the "Python Data Science Handbook", which you can read as Jupyter notebooks:

https://github.com/jakevdp/PythonDataScienceHandbook

(In reply to Gaël Pegliasco via scikit-learn, message of 20/06/2017, 06:28.)
I am catching up on all the replies; apologies for the delay. (Replied in reverse order.)

@ Gaël, Thanks for your comments. I actually started with 1) Data Camp courses and 2) the Python for Data Science book.

Here's my review:
1) The course: it is fantastic! But it only gives you a flavor of A FEW things.
2) The book: it is a quick crash course, but not enough for you to take off. See the code below.

    # Toy Python Code
    import numpy as np
    import pandas as pd

    N = 100
    df = pd.DataFrame({
        'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
        'x': np.linspace(0, stop=N-1, num=N),
        'y': np.random.rand(N),
        'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
        'D': np.random.normal(100, 10, size=(N)).tolist()
    })
    df.x
    len(dir(df))
    # end of Python code

My confusion:
a) df.x gives you column x, but why? I thought things after a dot are actions, or more like verbs performed on the object, namely df in this case.
b) len(dir(df)) gives 431. I only created a dataframe; where did all these 431 things come from? Is there documentation about this? It scares me because I only asked for a dataframe.

@ Gael, This is a pretty solid reference. It explains methods among other things, which is awesome! I think methods are the barrier to entry for R users.

@ Mail, Thanks for the details. I will try to pick up this computer science terminology. It has been a brutal week.

@ Massimo, Yes, I have used that. It is indeed great as a one-to-one equivalence reference.

Thanks!

(In reply to Gaël Pegliasco via scikit-learn, message of Tue, Jun 20, 2017 at 12:32 AM.)
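On points a) and b) above, a minimal sketch may help. It does not use pandas; `TinyFrame` and `mean_of` are hypothetical names invented for illustration. The assumption it illustrates is that a dot can reach either data (an attribute, a noun) or an action (a method, a verb), and that pandas exposes column names as attributes for convenience, so `df.x` is shorthand for `df['x']`. It also shows that `dir()` reports every name reachable on an object, including everything its class defines or inherits, which is why the count is much larger than what you created yourself.

```python
class TinyFrame:
    """Hypothetical stand-in for a data frame; this is not pandas code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so real methods such as mean_of() are always found first.
        if name == "_columns":
            # Guard against recursion if _columns has not been set yet.
            raise AttributeError(name)
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None

    def mean_of(self, name):
        values = self._columns[name]
        return sum(values) / len(values)


tf = TinyFrame({"x": [0.0, 1.0, 2.0]})

column = tf.x               # data: no parentheses, nothing is "performed"
average = tf.mean_of("x")   # method: an action, called with parentheses

# dir() lists names from the instance, its class, and the base class object.
names = dir(tf)
methods = [n for n in names if callable(getattr(tf, n))]
public = [n for n in names if not n.startswith("_")]

print(len(names), len(methods), public)
```

In this toy class only one public name was written by hand; the rest of what `dir()` reports is inherited machinery. The same filtering (`callable`, leading underscore) can be applied to `dir(df)` to see how the 431 names split up.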
You can also have a look at "Effective Computation in Physics" by Anthony Scopatz and Kathryn D. Huff. It gives a very good overview of Python/numpy/pandas...

Albert Thomas

(In reply to C W <tmrsg11@gmail.com>, message of Tue, 20 Jun 2017 at 07:25.)
participants (9)
- Albert Thomas
- C W
- Gael Varoquaux
- Gaël Pegliasco
- mail
- massimo di stefano
- Nelle Varoquaux
- Sean Violante
- Sebastian Raschka