[scikit-learn] urgent help in scikit-learn
Shane Grigsby
shane.grigsby at colorado.edu
Wed Apr 5 11:30:30 EDT 2017
Hi Shuchi,
You probably want to ask the Statsmodels community about this; they have
a Google Groups board here:
https://groups.google.com/forum/#!forum/pystatsmodels
Cheers,
Shane
On 04/05, Shuchi Mala wrote:
>Hi Raschka,
>
>I need urgent help. How can I use the Statsmodels Poisson function
>(statsmodels.genmod.families.Poisson) with scikit-learn's
>cross-validation tools (cross_val_score, ShuffleSplit, cross_val_predict)?
>
>With Best Regards,
>Shuchi Mala
>Research Scholar
>Department of Civil Engineering
>MNIT Jaipur
>
>
>On Tue, Apr 4, 2017 at 2:05 PM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
>
>> Hi Raschka,
>>
>> I need urgent help. How can I use the Statsmodels Poisson function
>> (statsmodels.genmod.families.Poisson) with scikit-learn's
>> cross-validation tools (cross_val_score, ShuffleSplit, cross_val_predict)?
>>
>> With Best Regards,
>> Shuchi Mala
>> Research Scholar
>> Department of Civil Engineering
>> MNIT Jaipur
>>
>>
>> On Tue, Apr 4, 2017 at 9:15 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
>>
>>> Hi Raschka,
>>>
>>> I want to know how to use cross-validation when another regression
>>> model, such as Poisson, is used in place of linear regression.
>>>
>>> Kindly help.
>>>
>>> With Best Regards,
>>> Shuchi Mala
>>> Research Scholar
>>> Department of Civil Engineering
>>> MNIT Jaipur
>>>
>>>
>>> On Mon, Apr 3, 2017 at 8:05 PM, Sebastian Raschka <se.raschka at gmail.com>
>>> wrote:
>>>
>>>> Don’t get me wrong, but you’d have to either label them manually
>>>> yourself, ask domain experts, or use a platform like Amazon Mechanical
>>>> Turk (or collect them in some other way).
>>>>
>>>> > On Apr 3, 2017, at 7:38 AM, Shuchi Mala <shuchi.23 at gmail.com> wrote:
>>>> >
>>>> > How can I get ground truth labels of the training examples in my
>>>> dataset?
>>>> >
>>>> > With Best Regards,
>>>> > Shuchi Mala
>>>> > Research Scholar
>>>> > Department of Civil Engineering
>>>> > MNIT Jaipur
>>>> >
>>>> >
>>>> > On Fri, Mar 31, 2017 at 8:17 PM, Sebastian Raschka <
>>>> se.raschka at gmail.com> wrote:
>>>> > Hi, Shuchi,
>>>> >
>>>> > regarding labels_true: you’d only be able to compute the Rand index
>>>> adjusted for chance if you have the ground truth labels of the training
>>>> examples in your dataset.
>>>> >
>>>> > The second parameter, labels_pred, takes the predicted cluster
>>>> labels (indices) that you got from the clustering. E.g.,
>>>> >
>>>> > dbscn = DBSCAN()
>>>> > labels_pred = dbscn.fit_predict(X)  # DBSCAN has no predict(); use fit_predict()
>>>> >
>>>> > Best,
>>>> > Sebastian
>>>> >
>>>> >
>>>> > > On Mar 31, 2017, at 12:02 AM, Shuchi Mala <shuchi.23 at gmail.com>
>>>> wrote:
>>>> > >
>>>> > > Thank you so much for your quick reply. I have one more doubt. The
>>>> below statement is used to calculate rand score.
>>>> > >
>>>> > > metrics.adjusted_rand_score(labels_true, labels_pred)
>>>> > > In my case what will be labels_true and labels_pred and how I will
>>>> calculate labels_pred?
>>>> > >
>>>> > > With Best Regards,
>>>> > > Shuchi Mala
>>>> > > Research Scholar
>>>> > > Department of Civil Engineering
>>>> > > MNIT Jaipur
>>>> > >
>>>> > >
>>>> > > On Thu, Mar 30, 2017 at 8:38 PM, Shane Grigsby <
>>>> shane.grigsby at colorado.edu> wrote:
>>>> > > Since you're using lat / long coords, you'll also want to convert
>>>> them to radians and specify 'haversine' as your distance metric; i.e. :
>>>> > >
>>>> > > coords = np.vstack([lats.ravel(),longs.ravel()]).T
>>>> > > coords *= np.pi / 180. # to radians
>>>> > >
>>>> > > ...and:
>>>> > >
>>>> > > db = DBSCAN(eps=0.3, min_samples=10, metric='haversine')
>>>> > > # replace eps and min_samples as appropriate (with haversine, eps is in radians)
>>>> > > db.fit(coords)
>>>> > >
>>>> > > Cheers,
>>>> > > Shane
>>>> > >
>>>> > >
>>>> > > On 03/30, Sebastian Raschka wrote:
>>>> > > Hi, Shuchi,
>>>> > >
>>>> > > 1. How can I add data to the data set of the package?
>>>> > >
>>>> > > You don’t need to add your dataset to the dataset module to run your
>>>> analysis. A convenient way to load it into a numpy array would be via
>>>> pandas. E.g.,
>>>> > >
>>>> > > import pandas as pd
>>>> > > df = pd.read_csv('your_data.txt', delimiter=r"\s+")
>>>> > > X = df.values
>>>> > >
>>>> > > 2. How I can calculate Rand index for my data?
>>>> > >
>>>> > > After you have run the clustering, you can use the “adjusted_rand_score”
>>>> function; see, e.g.,
>>>> > > http://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-score
>>>> > >
>>>> > > 3. How to use make_blobs command for my data?
>>>> > >
>>>> > > The make_blobs command is just a utility function for creating toy
>>>> datasets; you wouldn’t need it in your case, since you already have
>>>> “real” data.
>>>> > >
>>>> > > Best,
>>>> > > Sebastian
>>>> > >
>>>> > >
>>>> > > On Mar 30, 2017, at 4:51 AM, Shuchi Mala <shuchi.23 at gmail.com>
>>>> wrote:
>>>> > >
>>>> > > Hi everyone,
>>>> > >
>>>> > > I have the data with following attributes: (Latitude, Longitude).
>>>> Now I am performing clustering using DBSCAN for my data. I have following
>>>> doubts:
>>>> > >
>>>> > > 1. How can I add data to the data set of the package?
>>>> > > 2. How I can calculate Rand index for my data?
>>>> > > 3. How to use make_blobs command for my data?
>>>> > >
>>>> > > Sample of my data is :
>>>> > > Latitude Longitude
>>>> > > 37.76901 -122.429299
>>>> > > 37.76904 -122.42913
>>>> > > 37.76878 -122.429092
>>>> > > 37.7763 -122.424249
>>>> > > 37.77627 -122.424657
>>>> > >
>>>> > >
>>>> > > With Best Regards,
>>>> > > Shuchi Mala
>>>> > > Research Scholar
>>>> > > Department of Civil Engineering
>>>> > > MNIT Jaipur
>>>> > >
>>>> > > _______________________________________________
>>>> > > scikit-learn mailing list
>>>> > > scikit-learn at python.org
>>>> > > https://mail.python.org/mailman/listinfo/scikit-learn
>>>> > >
>>>> > > --
>>>> > > *PhD candidate & Research Assistant*
>>>> > > *Cooperative Institute for Research in Environmental Sciences
>>>> (CIRES)*
>>>> > > *University of Colorado at Boulder*
>>>> > >
>>>> >
>>>>
>>>>
>>>
>>>
>>
--
*PhD candidate & Research Assistant*
*Cooperative Institute for Research in Environmental Sciences (CIRES)*
*University of Colorado at Boulder*