[scikit-learn] urgent help in scikit-learn
Sebastian Raschka
se.raschka at gmail.com
Mon Apr 3 10:35:08 EDT 2017
Don’t get me wrong, but you’d have to label them manually yourself, ask domain experts, or use a platform like Amazon Mechanical Turk (or collect the labels in some other way).
> On Apr 3, 2017, at 7:38 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
>
> How can I get ground truth labels of the training examples in my dataset?
>
> With Best Regards,
> Shuchi Mala
> Research Scholar
> Department of Civil Engineering
> MNIT Jaipur
>
>
> On Fri, Mar 31, 2017 at 8:17 PM, Sebastian Raschka <se.raschka at gmail.com> wrote:
> Hi, Shuchi,
>
> regarding labels_true: you’d only be able to compute the Rand index adjusted for chance if you have the ground truth labels of the training examples in your dataset.
>
> The second parameter, labels_pred, takes in the predicted cluster labels (indices) that you got from the clustering. E.g.,
>
> from sklearn.cluster import DBSCAN
> dbscn = DBSCAN()
> labels_pred = dbscn.fit_predict(X)  # DBSCAN has no predict(); fit_predict returns the labels
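
To make the roles of labels_true and labels_pred concrete, here is a minimal sketch. The toy points, the hand-assigned labels, and the eps / min_samples values are all assumptions made up for illustration; they are not from the original data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy data: two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Hypothetical ground truth, e.g. assigned by hand or by a domain expert.
labels_true = np.array([1, 1, 1, 0, 0, 0])

# fit_predict returns one cluster index per row of X; noise points get -1.
labels_pred = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# The score compares groupings, not label names, so the swapped
# numbering between labels_true and labels_pred does not matter.
score = adjusted_rand_score(labels_true, labels_pred)
print(labels_pred, score)
```

Without some source of labels_true, this score cannot be computed at all, whatever the clustering returns.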
>
> Best,
> Sebastian
>
>
> > On Mar 31, 2017, at 12:02 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
> >
> > Thank you so much for your quick reply. I have one more question. The statement below is used to calculate the Rand score.
> >
> > metrics.adjusted_rand_score(labels_true, labels_pred)
> > In my case, what will labels_true and labels_pred be, and how do I calculate labels_pred?
> >
> > With Best Regards,
> > Shuchi Mala
> > Research Scholar
> > Department of Civil Engineering
> > MNIT Jaipur
> >
> >
> > On Thu, Mar 30, 2017 at 8:38 PM, Shane Grigsby <shane.grigsby at colorado.edu> wrote:
> > Since you're using lat / long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e.:
> >
> > import numpy as np
> > coords = np.vstack([lats.ravel(),longs.ravel()]).T
> > coords *= np.pi / 180. # to radians
> >
> > ...and:
> >
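> > from sklearn.cluster import DBSCAN
> > # with the haversine metric, eps is measured in radians;
> > # replace eps and min_samples as appropriate
> > db = DBSCAN(eps=0.3, min_samples=10, metric='haversine')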
> > db.fit(coords)
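
Because haversine distances are angles in radians, it can help to pick eps as a ground distance and convert it. The following is a sketch only: the Earth radius constant, the 100 m radius, min_samples=2, and the explicit algorithm='ball_tree' are assumptions chosen for illustration, applied to the five sample rows from the original post.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

# Sample rows from the original post, as (latitude, longitude) in degrees.
coords_deg = np.array([
    [37.76901, -122.429299],
    [37.76904, -122.42913],
    [37.76878, -122.429092],
    [37.7763, -122.424249],
    [37.77627, -122.424657],
])
coords = np.radians(coords_deg)  # haversine expects [lat, lon] in radians

eps_km = 0.1  # hypothetical neighbourhood radius of 100 m
db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,  # convert kilometres to radians
    min_samples=2,                 # small only because the sample is tiny
    metric='haversine',
    algorithm='ball_tree',         # a tree type that supports haversine
)
labels = db.fit_predict(coords)
print(labels)
```

On these five rows the first three points and the last two end up in separate clusters, since the two groups lie several hundred metres apart.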
> >
> > Cheers,
> > Shane
> >
> >
> > On 03/30, Sebastian Raschka wrote:
> > Hi, Shuchi,
> >
> > 1. How can I add data to the data set of the package?
> >
> > You don’t need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g.,
> >
> > import pandas as pd
> > df = pd.read_csv('your_data.txt', delimiter=r"\s+")
> > X = df.values
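
As a self-contained variant of the loading step, the sketch below reads the sample rows from the original post out of an in-memory string instead of a file. The use of io.StringIO and the column selection are assumptions for illustration; with a real file you would pass its path to read_csv.

```python
import io

import pandas as pd

# Sample rows from the original post, whitespace-delimited with a header line.
raw = """Latitude Longitude
37.76901 -122.429299
37.76904 -122.42913
37.76878 -122.429092
37.7763 -122.424249
37.77627 -122.424657
"""

# sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Select the two columns explicitly so their order in X is unambiguous.
X = df[["Latitude", "Longitude"]].to_numpy()
print(X.shape)
```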
> >
> > 2. How can I calculate the Rand index for my data?
> >
> > After you have run the clustering, you can use the “adjusted_rand_score” function; see, e.g.,
> > http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score
> >
> > 3. How do I use the make_blobs command for my data?
> >
> > The make_blobs command is just a utility function to create toy datasets; you wouldn’t need it in your case since you already have “real” data.
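
For contrast, this is roughly what make_blobs is for: it generates synthetic points together with their true cluster membership, which is why examples built on it can report a Rand score directly. The centers, spread, and DBSCAN settings below are arbitrary assumptions for the sketch.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Toy data with known membership: three well-separated blobs.
X, labels_true = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],  # arbitrary, far apart
    cluster_std=0.5,
    random_state=0,
)

# Cluster the toy data and compare against the generated labels.
labels_pred = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)
score = adjusted_rand_score(labels_true, labels_pred)
print(round(score, 3))
```

With real data such as the latitude / longitude table, there is no generated labels_true, so the labels have to come from somewhere else.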
> >
> > Best,
> > Sebastian
> >
> >
> > On Mar 30, 2017, at 4:51 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I have data with the following attributes: (Latitude, Longitude). I am clustering this data using DBSCAN and have the following questions:
> >
> > 1. How can I add data to the data set of the package?
> > 2. How can I calculate the Rand index for my data?
> > 3. How do I use the make_blobs command for my data?
> >
> > Sample of my data is :
> > Latitude Longitude
> > 37.76901 -122.429299
> > 37.76904 -122.42913
> > 37.76878 -122.429092
> > 37.7763 -122.424249
> > 37.77627 -122.424657
> >
> >
> > With Best Regards,
> > Shuchi Mala
> > Research Scholar
> > Department of Civil Engineering
> > MNIT Jaipur
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
> > --
> > *PhD candidate & Research Assistant*
> > *Cooperative Institute for Research in Environmental Sciences (CIRES)*
> > *University of Colorado at Boulder*