<div dir="ltr">The negative values are not really there to compensate for hash collisions. They are there because the hash-determined signs make inner products in the hashed vector space approximate inner products in the full vector space.</div><div class="gmail_extra"><br><div class="gmail_quote">On 2 October 2016 at 00:17, Roman Yurchak <span dir="ltr">&lt;<a href="mailto:rth.yurchak@gmail.com" target="_blank">rth.yurchak@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 01/10/16 15:34, Moyi Dang wrote:<br>

> However, I don't understand why the negatives are there in the first<br>

> place, or what they mean. I'm not sure if the absolute values are<br>

> corresponding to the token counts.<br>

><br>

> Can someone please help explain what the HashingVectorizer is doing? How<br>

> do I get the HashingVectorizer to return token counts?<br>

<br>

</span>Hi Moyi,<br>

<br>

it's a mechanism to compensate for hash collisions, see<br>

<a href="https://github.com/scikit-learn/scikit-learn/issues/7513" rel="noreferrer" target="_blank">https://github.com/scikit-<wbr>learn/scikit-learn/issues/7513</a> The absolute<br>

values are token counts for most practical applications (if you don't<br>

have too many collisions).  There will be a PR shortly to make this more<br>

consistent.<br>

<br>

<br>


</blockquote></div><br></div>
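<div dir="ltr">To make the inner-product point concrete, here is a minimal pure-Python sketch of a signed hashing trick. It is not scikit-learn's implementation: the helper names (<code>token_hash</code>, <code>hashed_vector</code>, <code>mean_estimate</code>) are made up for illustration, and MD5 stands in for the real hash function. It hashes two documents that share no tokens, so their true inner product is 0, and averages the hashed inner product over several hash seeds, with and without signs.</div>

```python
import hashlib


def token_hash(token, seed):
    """Deterministic 64-bit hash of a token (MD5 is only a stand-in here)."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hashed_vector(tokens, n_features, seed=0, signed=True):
    """Map tokens to a fixed-size vector via the hashing trick."""
    vec = [0] * n_features
    for token in tokens:
        h = token_hash(token, seed)
        index = h % n_features                    # low bits choose the column
        negative = signed and ((h >> 63) & 1) == 1  # top bit chooses the sign
        vec[index] += -1 if negative else 1
    return vec


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_estimate(doc_a, doc_b, n_features, signed, n_seeds=200):
    """Average the hashed inner product over several hash seeds."""
    total = 0
    for seed in range(n_seeds):
        va = hashed_vector(doc_a, n_features, seed, signed)
        vb = hashed_vector(doc_b, n_features, seed, signed)
        total += dot(va, vb)
    return total / n_seeds


# Two documents with no tokens in common: the true inner product is 0.
doc_a = [f"a{i}" for i in range(20)]
doc_b = [f"b{i}" for i in range(20)]

# A deliberately tiny feature space, so that collisions are frequent.
unsigned_mean = mean_estimate(doc_a, doc_b, n_features=16, signed=False)
signed_mean = mean_estimate(doc_a, doc_b, n_features=16, signed=True)

print("unsigned:", unsigned_mean)  # every collision adds a positive term
print("signed:  ", signed_mean)    # collision terms cancel on average
```

<div dir="ltr">Without signs, every collision between the two documents pushes the estimate upward, so it is biased away from 0. With signs, the collision terms are positive or negative with equal chance and average out. When there are no collisions inside a single document, the absolute values of the signed vector are just the token counts. In later scikit-learn releases, as far as I know, <code>HashingVectorizer(alternate_sign=False, norm=None)</code> returns plain non-negative counts; the release current at the time of this thread had a <code>non_negative</code> option instead.</div>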