[Python-ideas] collections.Counter should implement __mul__, __rmul__

Mon Apr 16 01:24:54 EDT 2018

On Monday, April 16, 2018, Raymond Hettinger <raymond.hettinger at gmail.com>
wrote:

>
>
> > On Apr 15, 2018, at 9:04 PM, Peter Norvig <peter at norvig.com> wrote:
> >
> > it would be a bit weird and disorienting for the arithmetic operators to
> > have two different signatures:
> >
> >     <counter> += <counter>
> >     <counter> -= <counter>
> >     <counter> *= <scalar>
> >     <counter> /= <scalar>
> >
> > Is it weird and disorienting to have:
> >
> > <str> += <str>
> > <str> *= <scalar>
>
> Yes, there is a precedent that does seem to have worked out well in
> practice :-)  It isn't exactly parallel because strings aren't containers
> of numbers, they don't have & and |, and there isn't a reason to want a /
> operation, but it does suggest that signature variation might not be
> problematic.
>
> BTW, do you just want __mul__ and __rmul__?  If those went in, presumably
> there will be a request to support __imul__ because otherwise c*=3 would
> still work but would be inefficient (that was the rationale for adding
> inplace variants for all the current arithmetic operators). Likewise,
> presumably someone would legitimately want __div__ to support the
> normalization use case.  Perhaps less likely, there would also be a
> request for __floordiv__ to allow exactly scaled results to stay in the
> domain of integers.  Which if any of these makes sense to you?
>
> Also, any thoughts on the cleanest way to express the computation of a
> chi-squared statistic (for example, to compare observed first digit
> frequencies to the frequencies predicted by Benford's Law)?  This isn't an
> arbitrary question (it came up when a professor first proposed a variant of
> this idea a few years ago).
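
For concreteness, here is a minimal sketch of what the scalar operators
discussed above might look like. ScalableCounter is a hypothetical subclass
name (nothing like it exists in collections), and returning NotImplemented
for non-numeric operands is my assumption, not a settled design.

```python
from collections import Counter
from numbers import Number


class ScalableCounter(Counter):
    """Hypothetical Counter subclass with scalar arithmetic (illustration only)."""

    def __mul__(self, scalar):
        if not isinstance(scalar, Number):
            return NotImplemented
        # Build a new counter with every count multiplied by the scalar.
        return type(self)({key: count * scalar for key, count in self.items()})

    __rmul__ = __mul__  # 3 * c behaves like c * 3

    def __imul__(self, scalar):
        if not isinstance(scalar, Number):
            return NotImplemented
        # Scale the counts in place rather than building a new mapping.
        for key in self:
            self[key] *= scalar
        return self

    def __truediv__(self, scalar):
        if not isinstance(scalar, Number):
            return NotImplemented
        # True division may turn integer counts into floats.
        return type(self)({key: count / scalar for key, count in self.items()})

    def __floordiv__(self, scalar):
        if not isinstance(scalar, Number):
            return NotImplemented
        # Floor division keeps integer counts in the integers.
        return type(self)({key: count // scalar for key, count in self.items()})


c = ScalableCounter("abracadabra")
doubled = 2 * c                        # uses __rmul__
proportions = c / sum(c.values())      # the normalization use case
```

With this sketch, c *= 3 goes through __imul__ and mutates c in place, while
c * 3 returns a new counter.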

https://en.wikipedia.org/wiki/Chi-squared_distribution
https://en.wikipedia.org/wiki/Chi-squared_test
https://en.wikipedia.org/wiki/Benford%27s_law
(How might one test this with e.g. *double* SHA256?)
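
As a rough sketch of one way to do it in pure Python: count the first decimal
digits of double-SHA256 digests with a Counter and compare them to the Benford
expectations using Pearson's statistic. The helper names below are made up for
illustration, and hashing the decimal string of a running index is just one
assumed choice of input.

```python
import hashlib
import math
from collections import Counter


def benford_expected(n):
    """Expected first-digit counts under Benford's Law for n observations."""
    return {d: n * math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_squared(observed, expected):
    """Pearson's statistic: sum of (O - E)**2 / E over the expected categories."""
    return sum((observed[k] - e) ** 2 / e for k, e in expected.items())


def double_sha256_first_digits(n):
    """Count first decimal digits of SHA256(SHA256(str(i))) for i in range(n)."""
    digits = Counter()
    for i in range(n):
        inner = hashlib.sha256(str(i).encode()).digest()
        outer = hashlib.sha256(inner).digest()
        # Interpret the 32-byte digest as a big-endian integer.
        value = int.from_bytes(outer, "big")
        digits[int(str(value)[0])] += 1
    return digits


observed = double_sha256_first_digits(1000)
statistic = chi_squared(observed, benford_expected(sum(observed.values())))
print(statistic)
```

The statistic would then be compared against a chi-squared distribution with
8 degrees of freedom (nine digit categories minus one).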

proportions_chisquare(count, nobs, value=None)
https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_chisquare.html

https://www.statsmodels.org/dev/genindex.html?highlight=chi

scipy.stats.chisquare(f_obs, f_exp=None, ddof=0, axis=0)
https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.stats.chisquare.html

sklearn.feature_selection.chi2(X, y)
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html#sklearn.feature_selection.chi2

sklearn.kernel_approximation has:
kernel_approximation.AdditiveChi2Sampler
kernel_approximation.SkewedChi2Sampler
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.kernel_approximation

sklearn.metrics.pairwise.chi2_kernel(X, Y=None, gamma=1.0)
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.chi2_kernel.html#sklearn.metrics.pairwise.chi2_kernel

sklearn.metrics.pairwise.additive_chi2_kernel(X, Y=None)
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.additive_chi2_kernel.html#sklearn.metrics.pairwise.additive_chi2_kernel

...

FreqDist(collections.Counter(odict)) ... sparse-coding ... One-Hot /
Binarization
http://contrib.scikit-learn.org/categorical-encoding/

StandardScaler (for standardization) refuses to center sparse matrices (it
only accepts them with with_mean=False):
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler

>
> Raymond
