[scikit-learn] Fwd: Loading file in libsvm format
klo uo
klonuo at gmail.com
Thu Sep 8 14:40:26 EDT 2016
---------- Forwarded message ----------
From: klo uo <klonuo at gmail.com>
Date: Thu, Sep 8, 2016 at 8:25 PM
Subject: Loading file in libsvm format
To: scikit-learn-general at lists.sourceforge.net
Hi,
I produced a file in libsvm format:
<label> <index1>:<value1> <index2>:<value2> ...
with this content:
6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1
2259 1669:1 5711528:6
2822 5765159:1
...
The label is the document_id, and each index:value pair is a term_id and its term count.
This file has 83K labels with 40K unique terms (and overall 1.2M
index:value pairs).
When I load this file in sklearn:
from sklearn.datasets import load_svmlight_file
X, y = load_svmlight_file('libsim.txt')
I get X with shape (82448, 6092168).
I can't see any reason why I am getting 6M features when the file has only 40K unique terms.
Can someone explain?
Thanks
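
A possible explanation, offered as a sketch rather than a confirmed diagnosis: when n_features is not passed, load_svmlight_file infers the number of columns from the largest feature index it sees, not from the number of distinct indices. The three sample lines above already contain the index 5765159, so raw term_ids used as column indices would give a matrix millions of columns wide even with only 40K distinct terms. The snippet below uses only the standard library and the three sample lines from this message; parse_line and remap are hypothetical helper names, and renumbering the term_ids to a contiguous 1..N range is one assumed fix, not the only one.

```python
# Sample lines copied from the message above (the real file is much larger).
sample = [
    "6284 576:1 884:1 2482:1 4279:1 5765:1 184552:1 661512:1 699842:1",
    "2259 1669:1 5711528:6",
    "2822 5765159:1",
]


def parse_line(line):
    """Split one libsvm-format line into (label, [(index, value), ...]).

    Hypothetical helper for illustration; values are kept as strings
    so they are written back unchanged.
    """
    label, *pairs = line.split()
    features = []
    for pair in pairs:
        index, value = pair.split(":")
        features.append((int(index), value))
    return label, features


rows = [parse_line(line) for line in sample]

# Distinct term_ids versus the largest term_id.
indices = sorted({index for _, features in rows for index, _ in features})
n_unique = len(indices)
max_index = indices[-1]
print("distinct indices:", n_unique)
print("largest index:   ", max_index)

# Assumed fix: map each original term_id to a contiguous 1-based column index.
remap = {old: new for new, old in enumerate(indices, start=1)}

remapped_lines = [
    " ".join([label] + [f"{remap[index]}:{value}" for index, value in features])
    for label, features in rows
]
for line in remapped_lines:
    print(line)
```

Writing the remapped lines to a new file and loading that file should give a matrix whose width is close to the number of distinct terms. The remap dictionary has to be kept in order to translate column indices back to the original term_ids.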