urgent help in scikit-learn
Hi everyone,

I have data with the following attributes: (Latitude, Longitude). I am clustering this data using DBSCAN, and I have the following questions:

1. How can I add data to the data set of the package?
2. How can I calculate the Rand index for my data?
3. How do I use the make_blobs command for my data?

A sample of my data:

    Latitude    Longitude
    37.76901    -122.429299
    37.76904    -122.42913
    37.76878    -122.429092
    37.7763     -122.424249
    37.77627    -122.424657

With Best Regards,
Shuchi Mala
Research Scholar
Department of Civil Engineering
MNIT Jaipur
Hi, Shuchi,
> 1. How can I add data to the data set of the package?

You don't need to add your dataset to the datasets module to run your analysis. A convenient way to load it into a NumPy array would be via pandas, e.g.,

    import pandas as pd

    # whitespace-delimited text file with a header row
    df = pd.read_csv('your_data.txt', delimiter=r"\s+")
    X = df.values
> 2. How can I calculate the Rand index for my data?

After you have run the clustering, you can use the "adjusted_rand_score" function; see http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score
> 3. How do I use the make_blobs command for my data?

The make_blobs command is just a utility function for creating toy datasets; you wouldn't need it in your case since you already have "real" data.

Best,
Sebastian
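To make the difference between toy data and real data concrete, here is a minimal sketch of what make_blobs is typically used for. It returns points together with the labels that generated them, so the adjusted Rand index can be computed directly. The centers, cluster_std, eps and min_samples values below are arbitrary choices for illustration, not recommendations.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Toy data: three well-separated blobs. y_true is the index of the blob
# each point was drawn from, i.e. a known ground truth.
centers = [(-5.0, -5.0), (0.0, 0.0), (5.0, 5.0)]
X_toy, y_true = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.4, random_state=0
)

# Cluster the points without using y_true.
labels_pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_toy)

# Since the true labels are known here, the adjusted Rand index is defined.
ari = adjusted_rand_score(y_true, labels_pred)
n_clusters = len(set(labels_pred) - {-1})  # -1 is DBSCAN's noise label
print(f"clusters found: {n_clusters}, adjusted Rand index: {ari:.3f}")
```

With real latitude/longitude data there is no y_true unless someone supplies it, which is why make_blobs does not apply there.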
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Since you're using lat/long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e.:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # lats, longs: arrays of latitudes and longitudes in degrees
    coords = np.vstack([lats.ravel(), longs.ravel()]).T
    coords *= np.pi / 180.  # to radians

...and:

    # with metric='haversine', eps is an angle in radians;
    # replace eps and min_samples as appropriate
    db = DBSCAN(eps=0.3, min_samples=10, metric='haversine')
    db.fit(coords)

Cheers,
Shane
-- *PhD candidate & Research Assistant* *Cooperative Institute for Research in Environmental Sciences (CIRES)* *University of Colorado at Boulder*
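A minimal sketch of the haversine suggestion applied to the five sample rows from the original question. With the haversine metric, distances are angles in radians on the unit sphere, so a radius in kilometres can be converted by dividing by the Earth's radius. The 100 m neighbourhood radius, min_samples=2 and the mean Earth radius of 6371 km are assumptions chosen only so that this tiny sample produces clusters.

```python
import io

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# The five sample rows from the question, in degrees.
sample = io.StringIO(
    "Latitude Longitude\n"
    "37.76901 -122.429299\n"
    "37.76904 -122.42913\n"
    "37.76878 -122.429092\n"
    "37.7763 -122.424249\n"
    "37.77627 -122.424657\n"
)
df = pd.read_csv(sample, sep=r"\s+")

# Haversine expects (latitude, longitude) pairs in radians.
coords_rad = np.radians(df[["Latitude", "Longitude"]].to_numpy())

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation
eps_km = 0.1              # hypothetical neighbourhood radius of 100 m

db = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,  # convert km to radians
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
)
labels = db.fit_predict(coords_rad)
print(labels)
```

The first three points lie within a few tens of metres of each other, as do the last two, while the two groups are roughly 900 m apart, so a 100 m radius separates them.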
Thank you so much for your quick reply. I have one more question. The statement below is used to calculate the Rand score:

    metrics.adjusted_rand_score(labels_true, labels_pred)

In my case, what will labels_true and labels_pred be, and how do I calculate labels_pred?

With Best Regards,
Shuchi Mala
Research Scholar
Department of Civil Engineering
MNIT Jaipur
Hi, Shuchi,

regarding labels_true: you'd only be able to compute the Rand index adjusted for chance if you have the ground truth labels of the training examples in your dataset.

The second parameter, labels_pred, takes the predicted cluster labels (indices) that you got from the clustering. E.g.,

    from sklearn.cluster import DBSCAN

    dbscn = DBSCAN()
    # DBSCAN has no predict method; fit_predict returns the cluster labels
    labels_pred = dbscn.fit_predict(X)

Best,
Sebastian
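A small sketch of how labels_pred and labels_true fit together. The data and the labels_true array are made up for illustration. Note that DBSCAN marks noise points with -1, and that the adjusted Rand index only compares groupings, so the label names themselves do not need to match.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Small synthetic dataset: two tight groups plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [20.0, 20.0],
])

model = DBSCAN(eps=0.5, min_samples=2)
labels_pred = model.fit_predict(X)  # the same values are stored in model.labels_

# Hypothetical ground truth, e.g. assigned by hand. The label values differ
# from DBSCAN's, but the grouping of points is the same.
labels_true = np.array([1, 1, 1, 0, 0, 0, 2])

ari = adjusted_rand_score(labels_true, labels_pred)
print(labels_pred, ari)
```

Here the isolated point is labelled -1 (noise) by DBSCAN and forms its own group in labels_true, so the two partitions agree exactly.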
How can I get ground truth labels of the training examples in my dataset?

With Best Regards,
Shuchi Mala
Research Scholar
Department of Civil Engineering
MNIT Jaipur
Don't get me wrong, but you'd have to either label them manually yourself, ask domain experts, or use platforms like Amazon Mechanical Turk (or collect them in some other way).
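If only some of the points can be labelled by hand, one possible approach (an assumption here, not something proposed in the thread) is to compute the adjusted Rand index on just that labelled subset. The predicted labels and the manual_labels dictionary below are hypothetical.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Predicted cluster labels for 8 points, e.g. from DBSCAN (-1 = noise).
labels_pred = np.array([0, 0, 0, 1, 1, 1, -1, 1])

# Hypothetical hand-assigned labels for a subset of the points:
# row index -> manually chosen group id.
manual_labels = {0: 0, 1: 0, 3: 1, 4: 1, 7: 0}

# Restrict both labelings to the rows that were labelled by hand.
idx = np.array(sorted(manual_labels))
labels_true_subset = np.array([manual_labels[i] for i in idx])
labels_pred_subset = labels_pred[idx]

ari_subset = adjusted_rand_score(labels_true_subset, labels_pred_subset)
print(f"adjusted Rand index on {len(idx)} labelled points: {ari_subset:.3f}")
```

A score from a handful of points is only a rough indication; the more points are labelled, the more the score says about the clustering as a whole.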
Hi Raschka,

I want to know how to use cross-validation when a regression model other than linear regression, such as Poisson regression, is used. Kindly help.

With Best Regards,
Shuchi Mala
Research Scholar
Department of Civil Engineering
MNIT Jaipur
Hi Raschka,

I need urgent help. How can I use the Statsmodels Poisson function (statsmodels.genmod.families.Poisson) with scikit-learn's cross-validation utilities (cross_val_score, ShuffleSplit, cross_val_predict)?

With Best Regards,
Shuchi Mala
Research Scholar
Department of Civil Engineering
MNIT Jaipur
Hi Raschka, I need an urgent help. how I can use Statsmodels Poisson function function (statsmodels.genmod.families.Poisson) with Sci-Kit Learn's cross validation metrics (cross_val_score, ShuffleSplit, cross_val_predict)? With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur On Tue, Apr 4, 2017 at 2:05 PM, Shuchi Mala <shuchi.23@gmail.com> wrote:
Hi Raschka,
I need an urgent help. how I can use Statsmodels Poisson function function (statsmodels.genmod.families.Poisson) with Sci-Kit Learn's cross validation metrics (cross_val_score, ShuffleSplit, cross_val_predict)?
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Tue, Apr 4, 2017 at 9:15 AM, Shuchi Mala <shuchi.23@gmail.com> wrote:
Hi Raschka,
I want to know how to use cross validation when other regression model such as poisson is used in place of linear?
Kindly help.
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Mon, Apr 3, 2017 at 8:05 PM, Sebastian Raschka <se.raschka@gmail.com> wrote:
Don’t get me wrong, but you’d have to either manually label them yourself, asking domain experts, or use platforms like Amazon Turk (or collect them in some other way).
On Apr 3, 2017, at 7:38 AM, Shuchi Mala <shuchi.23@gmail.com> wrote:
How can I get ground truth labels of the training examples in my dataset?
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Fri, Mar 31, 2017 at 8:17 PM, Sebastian Raschka < se.raschka@gmail.com> wrote: Hi, Shuchi,
regarding labels_true: you’d only be able to compute the rand index adjusted for chance if you have the ground truth labels iof the training examples in your dataset.
The second parameter, labels_pred, takes in the predicted cluster labels (indices) that you got from the clustering. E.g,
dbscn = DBSCAN() labels_pred = dbscn.fit(X).predict(X)
Best, Sebastian
On Mar 31, 2017, at 12:02 AM, Shuchi Mala <shuchi.23@gmail.com> wrote:
Thank you so much for your quick reply. I have one more doubt. The below statement is used to calculate rand score.
metrics.adjusted_rand_score(labels_true, labels_pred) In my case what will be labels_true and labels_pred and how I will calculate labels_pred?
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Thu, Mar 30, 2017 at 8:38 PM, Shane Grigsby < shane.grigsby@colorado.edu> wrote: Since you're using lat / long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e. :
coords = np.vstack([lats.ravel(),longs.ravel()]).T coords *= np.pi / 180. # to radians
...and:
db = DBSCAN(eps=0.3, min_samples=10, metric='haversine') # replace eps and min_samples as appropriate db.fit(coords)
Cheers, Shane
On 03/30, Sebastian Raschka wrote: Hi, Shuchi,
1. How can I add data to the data set of the package?
You don’t need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g.,
import pandas as pd df = pd.read_csv(‘your_data.txt', delimiter=r"\s+”) X = df.values
2. How I can calculate Rand index for my data?
After you ran the clustering, you can use the “adjusted_rand_score” function, e.g., see http://scikit-learn.org/stable/modules/clustering.html#adjus ted-rand-score
3. How to use make_blobs command for my data?
The make_blobs command is just a utility function to create toydatasets, you wouldn’t need it in your case since you already have “real” data.
Best, Sebastian
On Mar 30, 2017, at 4:51 AM, Shuchi Mala <shuchi.23@gmail.com> wrote:
Hi everyone,
I have the data with following attributes: (Latitude, Longitude). Now I am performing clustering using DBSCAN for my data. I have following doubts:
1. How can I add data to the data set of the package? 2. How I can calculate Rand index for my data? 3. How to use make_blobs command for my data?
Sample of my data is : Latitude Longitude 37.76901 -122.429299 37.76904 -122.42913 37.76878 -122.429092 37.7763 -122.424249 37.77627 -122.424657
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
-- *PhD candidate & Research Assistant* *Cooperative Institute for Research in Environmental Sciences (CIRES)* *University of Colorado at Boulder*
Hi Shuchi,
You probably want to query the Statsmodels community for this; they have a google groups board here:
https://groups.google.com/forum/#!forum/pystatsmodels
Cheers, Shane
On 04/05, Shuchi Mala wrote:
Hi Raschka,
I need urgent help. How can I use the Statsmodels Poisson function (statsmodels.genmod.families.Poisson) with scikit-learn's cross-validation metrics (cross_val_score, ShuffleSplit, cross_val_predict)?
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Tue, Apr 4, 2017 at 2:05 PM, Shuchi Mala <shuchi.23@gmail.com> wrote:
Hi Raschka,
I need urgent help. How can I use the Statsmodels Poisson function (statsmodels.genmod.families.Poisson) with scikit-learn's cross-validation metrics (cross_val_score, ShuffleSplit, cross_val_predict)?
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Tue, Apr 4, 2017 at 9:15 AM, Shuchi Mala <shuchi.23@gmail.com> wrote:
Hi Raschka,
I want to know how to use cross-validation when another regression model, such as Poisson, is used in place of linear regression.
Kindly help.
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Mon, Apr 3, 2017 at 8:05 PM, Sebastian Raschka <se.raschka@gmail.com> wrote:
Don’t get me wrong, but you’d have to either label them manually yourself, ask domain experts, or use platforms like Amazon Mechanical Turk (or collect them in some other way).
On Apr 3, 2017, at 7:38 AM, Shuchi Mala <shuchi.23@gmail.com> wrote:
How can I get ground truth labels of the training examples in my dataset?
With Best Regards, Shuchi Mala Research Scholar Department of Civil Engineering MNIT Jaipur
On Fri, Mar 31, 2017 at 8:17 PM, Sebastian Raschka <se.raschka@gmail.com> wrote:
Hi, Shuchi,
Regarding labels_true: you’d only be able to compute the Rand index adjusted for chance if you have the ground truth labels of the training examples in your dataset.
The second parameter, labels_pred, takes in the predicted cluster labels (indices) that you got from the clustering. E.g.,
    dbscn = DBSCAN()
    labels_pred = dbscn.fit_predict(X)  # one cluster label per sample; -1 marks noise
Best, Sebastian
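To make the labels_true / labels_pred distinction concrete, here is a minimal sketch using the five sample points from the original post. The labels_true values are invented purely for illustration (real ground truth has to come from somewhere else), and eps (about 100 m) and min_samples=2 are assumptions chosen to suit this tiny sample rather than recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score

# Sample coordinates from the original post: (latitude, longitude) in degrees.
X_deg = np.array([
    [37.76901, -122.429299],
    [37.76904, -122.42913],
    [37.76878, -122.429092],
    [37.7763, -122.424249],
    [37.77627, -122.424657],
])

# The haversine metric expects [latitude, longitude] in radians.
X = np.radians(X_deg)

# Assumption: points within ~100 m of each other are neighbours.
# 6371.0 is the approximate mean Earth radius in km.
eps = 0.1 / 6371.0

# labels_pred comes from the clustering itself.
labels_pred = DBSCAN(eps=eps, min_samples=2, metric="haversine").fit_predict(X)

# Hypothetical ground-truth labels, made up for this example only.
labels_true = [0, 0, 0, 1, 1]

score = adjusted_rand_score(labels_true, labels_pred)
print(labels_pred, score)
```

Without ground-truth labels of some kind, there is nothing to pass as labels_true, so the adjusted Rand index cannot be computed from the clustering output alone.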
Also, in general it's not appropriate to repeatedly ping someone on this mailing list for 'urgent help.'
participants (4): Jacob Schreiber, Sebastian Raschka, Shane Grigsby, Shuchi Mala