[scikit-learn] urgent help in scikit-learn

Sebastian Raschka se.raschka at gmail.com
Fri Mar 31 10:47:55 EDT 2017


Hi, Shuchi,

Regarding labels_true: you’d only be able to compute the Rand index adjusted for chance if you have the ground-truth labels of the training examples in your dataset.

The second parameter, labels_pred, takes the predicted cluster labels (indices) that you got from the clustering. E.g.,

from sklearn.cluster import DBSCAN
dbscn = DBSCAN()
labels_pred = dbscn.fit_predict(X)  # DBSCAN has no predict(); this returns dbscn.labels_
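A minimal, self-contained sketch of the workflow described above. The toy points and labels_true are hypothetical stand-ins; real ground-truth labels would have to come from your own data. Note that DBSCAN does not provide a separate predict() method: fit_predict(X) fits the model and returns its labels_ attribute.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy data: two tight, well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Hypothetical ground-truth labels; ARI cannot be computed without them.
labels_true = np.array([0, 0, 0, 1, 1, 1])

# fit_predict fits DBSCAN and returns the cluster index of every point.
labels_pred = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

ari = adjusted_rand_score(labels_true, labels_pred)
print(labels_pred, ari)
```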

Best,
Sebastian


> On Mar 31, 2017, at 12:02 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
> 
> Thank you so much for your quick reply. I have one more question. The statement below is used to calculate the Rand score.
>  
> metrics.adjusted_rand_score(labels_true, labels_pred) 
> In my case, what will labels_true and labels_pred be, and how will I calculate labels_pred?
> 
> With Best Regards,
> Shuchi  Mala
> Research Scholar
> Department of Civil Engineering
> MNIT Jaipur
> 
> 
> On Thu, Mar 30, 2017 at 8:38 PM, Shane Grigsby <shane.grigsby at colorado.edu> wrote:
> Since you're using lat / long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e. :
> 
>    import numpy as np
>    coords = np.vstack([lats.ravel(), longs.ravel()]).T  # [lat, lon] order
>    coords *= np.pi / 180.  # to radians
> 
> ...and:
> 
>    from sklearn.cluster import DBSCAN
>    db = DBSCAN(eps=0.3, min_samples=10, metric='haversine')
>    # eps is an angle in radians here; replace eps and min_samples as appropriate
>    db.fit(coords)
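The eps value is the tricky part: with metric='haversine', scikit-learn measures distances as angles on the unit sphere, in radians, so eps=0.3 corresponds to roughly 1,900 km on Earth. A sketch, assuming a mean Earth radius of 6371.0088 km and a hypothetical 100 m neighborhood, that converts a distance in kilometers to radians and clusters the five sample points from the original question:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Sample coordinates from the original question, in degrees.
lats = np.array([37.76901, 37.76904, 37.76878, 37.7763, 37.77627])
longs = np.array([-122.429299, -122.42913, -122.429092, -122.424249, -122.424657])

# The haversine metric expects [latitude, longitude] pairs in radians.
coords = np.radians(np.column_stack([lats, longs]))

# Assumption: mean Earth radius in km; eps_km is a hypothetical 100 m radius.
EARTH_RADIUS_KM = 6371.0088
eps_km = 0.1
eps_rad = eps_km / EARTH_RADIUS_KM  # convert arc length to an angle

db = DBSCAN(eps=eps_rad, min_samples=2, metric='haversine', algorithm='ball_tree')
labels = db.fit_predict(coords)
print(labels)
```

With these settings the first three points (within about 30 m of each other) and the last two points form two separate clusters; min_samples=2 is used only because the sample is tiny.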
> 
> Cheers,
> Shane
> 
> 
> On 03/30, Sebastian Raschka wrote:
> Hi, Shuchi,
> 
> 1. How can I add data to the data set of the package?
> 
> You don’t need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g.,
> 
> import pandas as pd
> df = pd.read_csv('your_data.txt', delimiter=r"\s+")
> X = df.values
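A runnable variant of the snippet above, using the sample rows posted in the question in place of 'your_data.txt' (io.StringIO stands in for the file). The regex delimiter r"\s+" treats any run of spaces or tabs as a single separator, which copes with the uneven spacing in the sample:

```python
import io
import pandas as pd

# Stand-in for 'your_data.txt': the sample rows from the original question.
raw = """Latitude        Longitude
37.76901        -122.429299
37.76904        -122.42913
37.76878        -122.429092
37.7763 -122.424249
37.77627        -122.424657
"""

# Any run of whitespace separates columns; the header row names them.
df = pd.read_csv(io.StringIO(raw), delimiter=r"\s+")

# Select the coordinate columns explicitly and convert to a NumPy array.
X = df[["Latitude", "Longitude"]].to_numpy()
print(X.shape)
```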
> 
> 2. How can I calculate the Rand index for my data?
> 
> After you have run the clustering, you can use the “adjusted_rand_score” function, e.g., see
> http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score
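Two properties of adjusted_rand_score are worth knowing when scoring DBSCAN output: it depends only on how points are grouped, not on the integer IDs assigned to the groups, and DBSCAN's noise label -1 is treated like any other cluster ID. A small sketch with hypothetical labels:

```python
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 1, 1, 2, 2]

# Same grouping with different cluster IDs: ARI depends only on the grouping.
ari_permuted = adjusted_rand_score(labels_true, [1, 1, 2, 2, 0, 0])

# DBSCAN labels noise points -1; ARI treats -1 as an ordinary cluster ID,
# so all noise points together count as one group.
ari_noise = adjusted_rand_score(labels_true, [0, 0, 1, 1, -1, -1])

# Merging and splitting the true groups lowers the score.
ari_worse = adjusted_rand_score(labels_true, [0, 0, 0, 1, 1, 1])

print(ari_permuted, ari_noise, ari_worse)
```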
> 
> 3. How can I use the make_blobs command for my data?
> 
> The make_blobs command is just a utility function to create toy datasets; you wouldn’t need it in your case since you already have “real” data.
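Where make_blobs can still help is in sanity-checking the pipeline: it returns both the points and the blob each point was drawn from, so you get a labels_true for free. A sketch with arbitrary, hypothetical parameter values (the resulting score depends on how well eps and min_samples suit the generated data):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# make_blobs returns synthetic points X and the blob index y of each point.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# Hypothetical DBSCAN settings; tune them for your own data.
labels_pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# y plays the role of labels_true here.
ari = adjusted_rand_score(y, labels_pred)
print(round(ari, 3))
```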
> 
> Best,
> Sebastian
> 
> 
> On Mar 30, 2017, at 4:51 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
> 
> Hi everyone,
> 
> I have data with the following attributes: (Latitude, Longitude). I am clustering my data using DBSCAN and have the following questions:
> 
> 1. How can I add data to the data set of the package?
> 2. How can I calculate the Rand index for my data?
> 3. How can I use the make_blobs command for my data?
> 
> A sample of my data:
> Latitude      Longitude
> 37.76901      -122.429299
> 37.76904      -122.42913
> 37.76878      -122.429092
> 37.7763       -122.424249
> 37.77627      -122.424657
> 
> 
> With Best Regards,
> Shuchi  Mala
> Research Scholar
> Department of Civil Engineering
> MNIT Jaipur
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> -- 
> *PhD candidate & Research Assistant*
> *Cooperative Institute for Research in Environmental Sciences (CIRES)*
> *University of Colorado at Boulder*


