From stephen_malcolm at hotmail.com Sun Oct 4 17:14:23 2020
From: stephen_malcolm at hotmail.com (Stephen Malcolm)
Date: Sun, 4 Oct 2020 21:14:23 +0000
Subject: [scikit-learn] Rerunning Kmeans with Python

Hello all,

I've written some code to run Kmeans on a data set (please see below), and I've plotted the results, with my two clusters/centroids. However, I've to re-run Kmeans several times and pull up different plots (showing the different centroid positions). Can someone point me in the right direction on how to write this extra code to perform this task? Then I've to conclude whether Kmeans is stable. I believe this is the lowest sum of squared errors?

Thanking you in advance.

# pandas used to read dataset and return the data
# numpy and matplotlib to represent and visualize the data
# sklearn to implement kmeans algorithm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# import the data
data = pd.read_csv('file.csv')

# extract values (named V1/V2 so they match the names used below)
V1 = data['V1']
V2 = data['V2']
V1_V2 = np.column_stack((V1, V2))

km_res = KMeans(n_clusters=2).fit(V1_V2)
y_kmeans = km_res.predict(V1_V2)

plt.scatter(V1, V2, c=y_kmeans, cmap='viridis', s=50, alpha=0.5)
plt.xlabel('V1')
plt.ylabel('V2')
plt.title('Visualization of raw data');
clusters = km_res.cluster_centers_
plt.scatter(clusters[:,0], clusters[:,1], c='blue', s=150)

From stephen_malcolm at hotmail.com Mon Oct 5 02:56:30 2020
From: stephen_malcolm at hotmail.com (Stephen Malcolm)
Date: Mon, 5 Oct 2020 06:56:30 +0000
Subject: [scikit-learn] Rerunning Kmeans with Python

Hello again,

I've added some extra code to plot 6 smaller graphs, which should change the initialization of the centroids.
The title of the plots should be the sum of the squared distance of each initialization (which should tell me when they converge). The smaller graphs appear (unlabelled); however, my data is not being plotted. Can someone tell me where I'm going wrong?

n_iter = 9
fig, ax = plt.subplots(3, 3, figsize=(16, 16))
ax = np.ravel(ax)
centers = []
for i in range(n_iter):
    # Run local implementation of kmeans
    max_iter=3, random_state=np.random.randint(0, 1000, size=1)
    clusters = km.clusters
    ax[i].scatter([km.labels == 0, 0], [km.labels == 0, 1], c='green', label='cluster 1')
    ax[i].scatter([km.labels == 1, 0], [km.labels == 1, 1], c='blue', label='cluster 2')
    ax[i].scatter(clusters[:, 0], clusters[:, 1], c='r', marker='*', s=300, label='centroid')
    ax[i].set_xlim([-2, 2])
    ax[i].set_ylim([-2, 2])
    ax[i].legend(loc='lower right')
    ax[i].set_title(f'{km.error:.4f}')
    ax[i].set_aspect('equal')
plt.tight_layout();

Thanks...

________________________________
From: Stephen Malcolm
Sent: 04 October 2020 21:14
To: scikit-learn at python.org
Subject: Rerunning Kmeans with Python

Hello all,

I've written some code to run Kmeans on a data set (please see below), and I've plotted the results, with my two clusters/centroids. However, I've to re-run Kmeans several times and pull up different plots (showing the different centroid positions). Can someone point me in the right direction on how to write this extra code to perform this task? Then I've to conclude whether Kmeans is stable. I believe this is the lowest sum of squared errors?

Thanking you in advance.
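
One way to approach the re-run question above is to fit scikit-learn's KMeans several times with a different seed each time and compare the resulting centroids and sums of squared errors (`inertia_`). The following is only a sketch under stated assumptions: `rerun_kmeans` is a hypothetical helper, the data is a synthetic stand-in generated with `make_blobs` rather than the poster's `file.csv`, and `init="random"` with `n_init=1` is chosen so that each run really starts from a single random initialization.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def rerun_kmeans(X, n_runs=9, n_clusters=2, base_seed=0):
    """Fit KMeans n_runs times with different seeds (hypothetical helper).

    Returns an array of centroids with shape (n_runs, n_clusters, n_features)
    and an array with the inertia (sum of squared errors) of each run.
    """
    centers, inertias = [], []
    for i in range(n_runs):
        km = KMeans(n_clusters=n_clusters, init="random", n_init=1,
                    random_state=base_seed + i).fit(X)
        # Sort centroids by their first coordinate so that runs can be
        # compared even when the cluster labels are permuted between runs.
        order = np.argsort(km.cluster_centers_[:, 0])
        centers.append(km.cluster_centers_[order])
        inertias.append(km.inertia_)
    return np.array(centers), np.array(inertias)


# Synthetic stand-in for the V1/V2 columns (an assumption, not the real data).
X, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.6, random_state=42)

centers, inertias = rerun_kmeans(X)
print("inertia per run:", np.round(inertias, 3))
print("largest centroid spread across runs:", centers.std(axis=0).max())
```

Each run could then be drawn on its own subplot, for example with `ax[i].scatter(X[:, 0], X[:, 1])`, the centroids from `centers[i]`, and `f"{inertias[i]:.4f}"` as the title. Under this reading, stability is judged by how little the centroids and inertias vary from run to run; the run with the lowest inertia is simply the best fit found, which is a related but separate statement.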

From olivier.grisel at ensta.org Tue Oct 13 05:59:52 2020
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Tue, 13 Oct 2020 11:59:52 +0200
Subject: [scikit-learn] About the Boston housing prices dataset

Hi all,

Thanks to the sustained effort of several contributors (thanks Maria and Lucy in particular), the Boston housing price dataset is no longer used in the examples of scikit-learn (nor in the test suite) in the master branch.

To give some context on why this dataset is problematic, please have a look at this discussion and the blog post linked in it:

https://github.com/scikit-learn/scikit-learn/issues/16155

Now that we no longer use sklearn.datasets.load_boston internally, we have to make a decision about what to do with the loader function itself: deprecate it? just silently hide it from our documentation (probably a bad idea)? keep it but educate our users about its ethical problem?

Personally, I would be slightly in favor of the last option, and I drafted a short paragraph here:

https://github.com/scikit-learn/scikit-learn/pull/18594#issuecomment-707601448

Please feel free to share your thoughts so that we can hopefully reach a consensus decision before the 0.24 release.

Regards,

--
Olivier

From jni at fastmail.com Tue Oct 13 06:37:46 2020
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Tue, 13 Oct 2020 21:37:46 +1100
Subject: [scikit-learn] About the Boston housing prices dataset

I very much like your paragraph, Olivier. I might recommend additionally raising it as a warning when calling the data creation function. For reference, in scikit-image when we removed Lena we raised a warning and returned an alternative (the now-famous `data.astronaut()`) for two versions, before removing the image altogether. I think that was a good approach for us, but I like your preference of using the dataset as an educational opportunity in this case. I lean in that direction also, but with the caveat that I think the message should be included in a warning, not just in the docstring.

Juan.

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

From olivier.grisel at ensta.org Tue Oct 13 07:59:31 2020
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Tue, 13 Oct 2020 13:59:31 +0200
Subject: [scikit-learn] About the Boston housing prices dataset

Thanks for your input, this is also an extension I was thinking of.

From adrin.jalali at gmail.com Tue Oct 13 10:17:16 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Tue, 13 Oct 2020 16:17:16 +0200
Subject: [scikit-learn] About the Boston housing prices dataset

Isn't the Boston dataset available through openml? Maybe here: https://www.openml.org/d/531

I'm happy to have the dataset out there on openml, and for any material that addresses some of the issues with it. But for educational purposes, we don't need to have the dataset in the package as long as users can still download it with a one-liner using fetch_openml.

Cheers,
Adrin
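
The two suggestions made so far in this thread (emit a warning when the loader is called, and point users to fetch_openml) could be combined roughly as follows. This is only a sketch: `load_legacy_dataset`, its message text, and its placeholder return value are hypothetical and are not scikit-learn's actual code. FutureWarning is used here because it is the category Python documents for deprecations aimed at end users.

```python
import warnings


def load_legacy_dataset():
    """Hypothetical loader that still works but warns on every call."""
    warnings.warn(
        "This dataset has a known ethical problem and is scheduled for "
        "removal; see the documentation for alternatives, or fetch the "
        "original explicitly with fetch_openml.",
        FutureWarning,
        stacklevel=2,  # report the caller's line, not this function's
    )
    # Placeholder payload; a real loader would return the actual data.
    return {"data": [], "target": []}


# Capture the warning to show what a caller would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dataset = load_legacy_dataset()

print(caught[0].category.__name__)
print(caught[0].message)
```

Because the function keeps returning data, existing tutorials keep running during the deprecation period, while every call surfaces the reason for the change.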

From olivier.grisel at ensta.org Wed Oct 14 04:10:33 2020
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 14 Oct 2020 10:10:33 +0200
Subject: [scikit-learn] About the Boston housing prices dataset

On Tue, 13 Oct 2020 at 16:19, Adrin wrote:
>
> Isn't the Boston dataset available through openml? Maybe here: https://www.openml.org/d/531
>
> I'm happy to have the dataset out there on openml, and for any material that addresses some of the issues with it.
> But for educational purposes, we don't need to have the dataset in the package as long as users can still download it
> with a one-liner using fetch_openml.

That would be an argument in favor of a deprecation warning with a message stating the motivation for deprecation and pointing to fetch_openml.

However, it's going to break examples written in slow-to-update tutorials or books once the deprecation period is over. But one could argue that this is also the case for any other deprecation in scikit-learn. It's just that sklearn.datasets.load_boston is used A LOT: https://github.com/search?q=load_boston&type=code

--
Olivier

From adrin.jalali at gmail.com Wed Oct 14 04:34:22 2020
From: adrin.jalali at gmail.com (Adrin)
Date: Wed, 14 Oct 2020 10:34:22 +0200
Subject: [scikit-learn] About the Boston housing prices dataset

Most of those are not talking about the ethical issues of the dataset. Let's talk about the alternatives we have:

Keep the loader, but raise a warning:
- This will result in most people not changing their code/material and, IMO, mostly ignoring the warning. Some people may see the warning and care about it.

Deprecate, and point them to an alternative dataset, and if they really really want the same dataset, point them to the openml ID:
- People will have to change something, and if we give them a nice copy/paste-able alternative which is not boston, they'll use that instead.
- Some people will keep using boston from openml, and not care about the ethical implications.

As an addition, we can keep load_boston in the docs only, and point users to alternatives even after removing the loader.

From lorentzen.ch at gmail.com Wed Oct 14 06:01:58 2020
From: lorentzen.ch at gmail.com (Christian Lorentzen)
Date: Wed, 14 Oct 2020 12:01:58 +0200
Subject: [scikit-learn] About the Boston housing prices dataset

Hi

As was recently mentioned in PR #18594, the problem with the Boston housing dataset does not go away just because we remove it from scikit-learn.
On the contrary, it is a valuable dataset to show and teach bias and discrimination (issue #16715 is still waiting for someone to write an example), in particular because we have access to the variable "B".

Most, if not all, of the datasets in scikit-learn are available elsewhere, even in Python. So I don't think this is a good argument for removal either.

As we've now removed it from tests and examples, the question for me is: what do we want to achieve beyond that? The answers I can think of go down a political road...

I'm fine with Olivier's suggestion: https://github.com/scikit-learn/scikit-learn/pull/18594#issuecomment-707626543

All the best,
Christian

From marmochiaskl at gmail.com Tue Oct 20 12:42:14 2020
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Tue, 20 Oct 2020 18:42:14 +0200
Subject: [scikit-learn] scikit-learn monthly meeting October 26th 2020

Dear list,

The next scikit-learn monthly meeting will take place on Monday October 26th at 11AM UTC:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2020&month=10&day=26&hour=11&min=0&sec=0&p1=179&p2=240&p3=195&p4=224

While these meetings are mainly for core-devs to discuss the current topics, we are also happy to welcome non-core devs and other project maintainers.

Feel free to join, using the following link:

https://meet.google.com/xhq-yoga-rtf

If you plan to attend and you would like to discuss something specific about your contribution, please add your name (or GitHub pseudo) in the "Contributors" section of the public pad:

https://hackmd.io/47swZeY3STqn5rjtkS0mDg

Best

Chiara

From myabakhova at gmail.com Wed Oct 21 13:48:19 2020
From: myabakhova at gmail.com (Maiia Bakhova)
Date: Wed, 21 Oct 2020 10:48:19 -0700
Subject: [scikit-learn] Numeric version of Elbow method for finding an optimal cluster number

Hello,

Sorry, this is not about scikit-learn. It is my method, which I once offered here and was told that it is too easy to code. Since then I have been talking to other people and learned that linguists do not understand enough math to work it out, and even people who are good at math would prefer a ready script. I wrote one and would like to share it.

I did a numeric implementation of the Elbow method for calculating the optimal cluster number. I will appreciate it greatly if you show my work here and people give me feedback on it.

https://github.com/Mathemilda/Numeric_ElbowMethod_For_K-means

Best,
Mya
--
Maiia Bakhova
Mathematician in Data Science
http://myabakhova.blogspot.com
https://www.linkedin.com/in/myabakhova

From julio at esbet.es Wed Oct 21 13:53:08 2020
From: julio at esbet.es (Julio Antonio Soto)
Date: Wed, 21 Oct 2020 19:53:08 +0200
Subject: [scikit-learn] Numeric version of Elbow method for finding an optimal cluster number
Message-ID: <7A9722F4-43EB-40C2-8CFC-E7236F6ECBE3@esbet.es>

Hi Mya,

To me, it looks very similar to the kneed library: https://github.com/arvkevi/kneed

Best regards

From myabakhova at gmail.com Thu Oct 22 13:59:44 2020
From: myabakhova at gmail.com (Maiia Bakhova)
Date: Thu, 22 Oct 2020 10:59:44 -0700
Subject: [scikit-learn] scikit-learn Digest, Vol 55, Issue 5

Hello Julio,

Thanks for your feedback! The Elbow method may produce plots with concavity changing from one point to the next. It means that some points must be removed from consideration, and my method takes care of this. In addition, the math is adapted for the clustering procedure and is much simpler. This helps with efficiency and maintenance.

Best regards,
Mya

--
Maiia Bakhova
Mathematician in Data Science
http://myabakhova.blogspot.com
https://www.linkedin.com/in/myabakhova

From marmochiaskl at gmail.com Fri Oct 23 05:00:11 2020
From: marmochiaskl at gmail.com (Chiara Marmo)
Date: Fri, 23 Oct 2020 11:00:11 +0200
Subject: [scikit-learn] scikit-learn monthly meeting October 26th 2020

Dear all,

This is a reminder for the Monday core-dev meeting: please note that the meeting will take place at 7AM for our colleagues in New York, so it would be nice not to wait until the European Monday morning to fill the agenda... :)

Thanks for your collaboration.

Chiara

From helmrp at yahoo.com Sat Oct 31 12:51:29 2020
From: helmrp at yahoo.com (The Helmbolds)
Date: Sat, 31 Oct 2020 16:51:29 +0000 (UTC)
Subject: [scikit-learn] Question RE: skLearn Logistic Regression
References: <1071563725.605939.1604163089727.ref@mail.yahoo.com>
Message-ID: <1071563725.605939.1604163089727@mail.yahoo.com>

I have a case with binary results and 1-D features, like:

    X = np.array(-3,-2,-1,0,1,2,3,)

and

    y = np.array(0, 0, 0, 1, 1, 1, 1)

only longer arrays (about 180 entries in each array) of this general type. So this should be the "simplest" case.

Although I've tried several variations of the Logistic input formats, in

    LogisticRegression.fit(X, y)

they keep being rejected, with the most common error message being

    Missing argument y

I assure you I do indeed have an array "y" that is passed to "fit".

So, what do I have to do to get Logistic Regression to accept 1-D features?

From seralouk at hotmail.com Sat Oct 31 13:21:08 2020
From: seralouk at hotmail.com (serafim loukas)
Date: Sat, 31 Oct 2020 17:21:08 +0000
Subject: [scikit-learn] Question RE: skLearn Logistic Regression
In-Reply-To: <1071563725.605939.1604163089727@mail.yahoo.com>
References: <1071563725.605939.1604163089727.ref@mail.yahoo.com> <1071563725.605939.1604163089727@mail.yahoo.com>
Message-ID: <0E057EC7-9B81-470D-8FB4-E723362D2728@hotmail.com>

These are not numpy arrays.
Try:

    X = np.array([-3,-2,-1,0,1,2,3]).reshape(-1,1)

And

    y = np.array([0, 0, 0, 1, 1, 1, 1]).reshape(-1,1)

Makis
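
A complete version of the suggestion above might look like the sketch below. Two points in it are assumptions about the original problem rather than anything confirmed in the thread. First, an error about a missing argument `y` is what Python raises when `fit` is called on the `LogisticRegression` class itself instead of on an instance, so the sketch creates an instance first. Second, only `X` needs to be 2-D with shape `(n_samples, 1)`; the sketch leaves `y` 1-D, which scikit-learn also accepts.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature per sample: X must be 2-D with shape (n_samples, 1).
X = np.array([-3, -2, -1, 0, 1, 2, 3]).reshape(-1, 1)
# Targets can stay 1-D with shape (n_samples,).
y = np.array([0, 0, 0, 1, 1, 1, 1])

# fit is called on an instance, not on the class itself.
clf = LogisticRegression()
clf.fit(X, y)

# New samples must be 2-D as well, one row per sample.
pred = clf.predict(np.array([[-2.5], [2.5]]))
print(pred)  # [0 1]
```

The same pattern applies to the longer arrays of about 180 entries: wrap the values in a list before passing them to `np.array`, then reshape `X` with `reshape(-1, 1)`.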