[scikit-learn] Any recommended way to encode IP addresses?

lampahome pahome.chen at mirlab.org
Fri Aug 16 03:45:42 EDT 2019


I am collecting data that contains many access logs from different IP addresses.

But I don't know the best way to encode the IPs so that the training data
stays small and the different IPs are kept independent of one another.

1. One-hot encoding: if there are too many IPs, the training data will
occupy a huge amount of disk space.
2. Category encoding: each IP is encoded as an integer from 0 to N, but
this can't show the relation between different IPs.
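
To make the two options concrete, here is a minimal sketch in plain Python.
The IP addresses are made up for illustration, and the names `vocab` and
`one_hot` are hypothetical helpers, not part of scikit-learn:

```python
# Hypothetical sample of IPs taken from an access log (made up for illustration).
ips = ["10.0.0.1", "192.168.1.7", "10.0.0.1", "172.16.0.3"]

# Category encoding: map each distinct IP to an integer code 0..N-1.
# The codes follow the sorted string order, so their numeric order carries
# no real meaning about how the IPs relate to each other.
vocab = {ip: code for code, ip in enumerate(sorted(set(ips)))}
category_codes = [vocab[ip] for ip in ips]


def one_hot(code, n_columns):
    """Return a dense row with a single 1 in the column given by `code`."""
    row = [0] * n_columns
    row[code] = 1
    return row


# One-hot encoding: one column per distinct IP, so every row has N entries.
one_hot_rows = [one_hot(code, len(vocab)) for code in category_codes]

print(category_codes)
print(one_hot_rows)
```

With N distinct IPs, each dense one-hot row has N columns, which is where
the size problem in option 1 comes from. In scikit-learn these two options
correspond roughly to OrdinalEncoder and OneHotEncoder; OneHotEncoder
returns a sparse matrix by default, which stores only the non-zero entries.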

Does anyone have any advice?