[scikit-learn] Fwd: Loading file in libsvm format

Thu Sep 8 14:43:53 EDT 2016

The features are sparse.
The third row says 5765159:1, meaning there is a feature with index 5765159, so X must have at least that many columns.
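
A minimal sketch of the point above, using the three sample lines quoted in the question. The `BytesIO` wrapper only stands in for the real `libsim.txt`. `load_svmlight_file` uses each index as a column number, so the width of `X` follows the largest index, not the number of distinct ones. By the same reasoning, the 6092168 columns reported in the question presumably mean that the largest term_id in the file is around 6.09M.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# The three sample lines from the question, as bytes (the loader
# expects binary input).
data = (b"6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1\n"
        b"2259 1669:1 5711528:6\n"
        b"2822 5765159:1\n")

# Default zero_based="auto": no index 0 appears, so the file is read as
# one-based and every index is shifted down by 1. The number of columns
# is then (largest shifted index + 1), i.e. the largest index in the file.
X, y = load_svmlight_file(BytesIO(data))

print(X.shape)  # (3, 5765159): only 11 distinct indices, but 5.7M columns
print(X.nnz)    # 11 stored entries
```

Only the 11 stored entries take memory; the unused columns are empty, which is what "sparse" means here.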

On 09/08/2016 02:40 PM, klo uo wrote:
>
> ---------- Forwarded message ----------
> From: *klo uo* <klonuo at gmail.com>
> Date: Thu, Sep 8, 2016 at 8:25 PM
> Subject: Loading file in libsvm format
> To: scikit-learn-general at lists.sourceforge.net
>
>
> Hi,
>
> I produced a file in libsvm format:
>
> <label> <index1>:<value1> <index2>:<value2> ...
>
> with this content:
>
>     6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1
>     2259 1669:1 5711528:6
>     2822 5765159:1
>     ...
>
> The label is the document_id, and each index:value pair is a term_id and its term count.
>
> This file has 83K labels with 40K unique terms (and overall 1.2M 
> index:value pairs).
>
> When I load this file in sklearn:
>
>     from sklearn.datasets import load_svmlight_file
>     X, y = load_svmlight_file('libsim.txt')
>
> I get X with shape (82448, 6092168).
>
> I don't see why I am getting 6M features.
> Can someone explain?
>
>
> Thanks
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
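
If the goal is a matrix with one column per term that actually occurs (about 40K, per the question), one option is to renumber the columns after loading. This is a sketch, not a scikit-learn feature: `compact_columns` is a hypothetical helper, and it assumes `X` is the CSR matrix returned by `load_svmlight_file`. It relies only on `numpy.unique` and the `scipy.sparse.csr_matrix` constructor.

```python
from io import BytesIO

import numpy as np
import scipy.sparse as sp
from sklearn.datasets import load_svmlight_file


def compact_columns(X):
    """Keep only the columns of X that hold at least one stored entry.

    Returns (X_compact, used), where used[j] is the original column
    index of column j in X_compact. Hypothetical helper, not part of
    scikit-learn.
    """
    X = sp.csr_matrix(X)
    # Sorted distinct column ids, plus each entry's position in that list.
    used, new_indices = np.unique(X.indices, return_inverse=True)
    # The remapping is monotonic, so the row layout (indptr) is unchanged.
    X_compact = sp.csr_matrix(
        (X.data, new_indices.ravel(), X.indptr),
        shape=(X.shape[0], len(used)),
    )
    return X_compact, used


# Same three sample lines as in the question.
data = (b"6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1\n"
        b"2259 1669:1 5711528:6\n"
        b"2822 5765159:1\n")
X, y = load_svmlight_file(BytesIO(data))
X_small, used = compact_columns(X)
print(X.shape, "->", X_small.shape)
```

The `used` array holds zero-based column ids (term_id - 1 here, because the file was read as one-based). Because it records the original column of each new column, a second file loaded with the same `n_features` could be sliced as `X_other[:, used]` to line up with `X_small`.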
