[scikit-learn] help-Renaming features in scikit-learn's CountVectorizer()

Ranjana Girish ranjanagirish30 at gmail.com
Mon Mar 5 09:18:26 EST 2018


Hi all,

I have a very large pandas DataFrame. Below is a sample:

    Id       description
    1        switvch for air conditioner transformer..............
    2        control tfrmr...........
    3        coling pad.................
    4        DRLG machine
    5        hair smothing kit...............

For further processing, I will construct a document-term matrix from the
above data using scikit-learn's CountVectorizer:

    from sklearn.feature_extraction.text import CountVectorizer

    countvec = CountVectorizer()
    documenttermmatrix = countvec.fit_transform(dataset['description'])

I have to correct the misspelled words in the descriptions. Replacing each
misspelled word with its correct spelling across such a large dataset takes
too much time.

So I thought of correcting the features instead, using the feature list
that CountVectorizer provides:

    features_names = countvec.get_feature_names()

Is it possible to rename features using the above list and then use the
result for the classification process?
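
To make the idea concrete, here is a minimal sketch of what such a renaming
might look like. It is not a built-in CountVectorizer feature, only one
possible approach: each misspelled feature is mapped to a corrected name, and
columns that end up with the same name are summed by multiplying the matrix
with a sparse 0/1 mapping matrix. The `corrections` dictionary and its
spellings are guesses made up for illustration, and the names `docs`,
`old_names`, `new_names`, `mapping` and `merged_dtm` are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import CountVectorizer

# Sample descriptions taken from the table above.
docs = [
    "switvch for air conditioner transformer",
    "control tfrmr",
    "coling pad",
    "DRLG machine",
    "hair smothing kit",
]

countvec = CountVectorizer()
dtm = countvec.fit_transform(docs)  # sparse matrix, shape (n_docs, n_features)

# vocabulary_ maps each term to its column index, so sorting the terms by
# that index gives the feature names in column order.
old_names = sorted(countvec.vocabulary_, key=countvec.vocabulary_.get)

# Hypothetical corrections, guessed for illustration only.
corrections = {
    "switvch": "switch",
    "tfrmr": "transformer",
    "coling": "cooling",
    "drlg": "drilling",
    "smothing": "smoothing",
}

# Corrected name for every old column; unknown terms keep their name.
corrected = [corrections.get(name, name) for name in old_names]
new_names = sorted(set(corrected))
new_index = {name: j for j, name in enumerate(new_names)}

# Mapping matrix: row i (old feature) has a single 1 in the column of its
# corrected name, so old columns sharing a corrected name are added together.
rows = np.arange(len(old_names))
cols = np.array([new_index[name] for name in corrected])
mapping = csr_matrix(
    (np.ones(len(old_names), dtype=dtm.dtype), (rows, cols)),
    shape=(len(old_names), len(new_names)),
)

merged_dtm = dtm @ mapping  # shape (n_docs, n_corrected_features)
```

In this sketch each distinct term is corrected once, instead of once per
document, and `merged_dtm` together with `new_names` could then be passed to
a classifier. New documents would need the same treatment: transform them
with the fitted `countvec` and multiply the result by the same `mapping`.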

Thanks
Ranjana