Any recommended way to encode IP addresses?
I am collecting data that contains many access logs from different IPs, but I don't know the best way to encode the addresses so that the training data stays small and the different IPs remain independent of one another.

1. One-hot encoding: with too many IPs, the training data will take up a huge amount of disk space.
2. Category encoding: each IP is mapped to an integer in 0~N, but that cannot express the relationship between different IPs.

Does anyone have advice?
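For illustration, here is a minimal sketch of the two options listed above, using only the Python standard library. The sample addresses and the helper names `ordinal_encode`, `one_hot_sparse` and `to_dense` are made up for this example and do not come from the thread. It keeps the one-hot data in a sparse form (one column index per row), which is one possible way to limit its size.

```python
# Illustrative sketch only; the addresses below are made-up sample data.
def ordinal_encode(values):
    """Map each distinct value to an integer code in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


def one_hot_sparse(encoded):
    """Store one-hot rows sparsely as (row, column) positions of the single 1."""
    return [(row, col) for row, col in enumerate(encoded)]


def to_dense(sparse, n_columns):
    """Expand the sparse form into full 0/1 rows (only sensible for small data)."""
    rows = [[0] * n_columns for _ in sparse]
    for row, col in sparse:
        rows[row][col] = 1
    return rows


ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "192.168.1.7"]
encoded, codes = ordinal_encode(ips)      # category encoding: one integer per IP
sparse = one_hot_sparse(encoded)          # one entry per log row
dense = to_dense(sparse, len(codes))      # rows x distinct-IPs matrix
print(encoded)
print(sparse)
print(dense)
```

The sparse form needs one entry per log row no matter how many distinct IPs there are, while the dense form grows with rows times distinct IPs, which is the disk-space concern raised in option 1.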
Hey,

Apart from encoding, you could use feature engineering, for example with something like this: https://ipgeolocation.io/documentation/ip-geolocation-api.html
Two IPs might share the same country but differ in city, so you can mix and match whichever derived features you want.

Best,

On Fri, Aug 16, 2019 at 10:46 AM lampahome <pahome.chen@mirlab.org> wrote:
I am collecting data that contains many access logs from different IPs.

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
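As a rough sketch of the feature-engineering idea above, the snippet below derives a country and a city from an IP address. It does not call the linked service: `GEO_TABLE` is a hypothetical stand-in for a real geolocation database, the networks are reserved documentation ranges, the locations are invented, and `ip_features` is a name chosen for this example.

```python
import ipaddress

# Hypothetical lookup table standing in for a real geolocation source.
# The networks are documentation ranges and the locations are invented.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/25"): ("CountryA", "CityX"),
    ipaddress.ip_network("192.0.2.128/25"): ("CountryA", "CityY"),
    ipaddress.ip_network("198.51.100.0/24"): ("CountryB", "CityZ"),
}


def ip_features(ip_string):
    """Return a dict of simple derived features for one IP address."""
    ip = ipaddress.ip_address(ip_string)
    country, city = "unknown", "unknown"
    for network, location in GEO_TABLE.items():
        if ip in network:
            country, city = location
            break
    return {"version": ip.version, "country": country, "city": city}


a = ip_features("192.0.2.10")
b = ip_features("192.0.2.200")
print(a)
print(b)
```

Here `a` and `b` end up with the same country but different cities, so a model can treat them as related through the country feature while still telling them apart through the city feature. Low-cardinality features like these are also much cheaper to one-hot encode than the raw addresses.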
Hi guys,

How can I unsubscribe from the scikit-learn mailing list?

Thanks.

On Fri, 16 Aug 2019 at 4:56 PM Chris Aridas <chris@aridas.eu> wrote:
Apart from encoding you could use feature engineering.
https://mail.python.org/mailman/listinfo/scikit-learn

On Fri, Aug 16, 2019 at 11:14 AM Santosh Subedi <santoshmsubedi@gmail.com> wrote:
How can I unsubscribe myself from Scikit-learn mailing list?
On Fri, Aug 16, 2019 at 3:56 PM Chris Aridas <chris@aridas.eu> wrote:
Apart from encoding you could use feature engineering. Something like this https://ipgeolocation.io/documentation/ip-geolocation-api.html

It seems registration is required to use the API. Is it entirely free?
It was just an idea about how you can extract features from IP addresses, not a recommendation to use that particular service.

Best,
Chris

On Fri, Aug 16, 2019 at 11:55 AM lampahome <pahome.chen@mirlab.org> wrote:
It seems registration is required to use the API. Is it entirely free?
On Fri, Aug 16, 2019 at 5:26 PM Chris Aridas <chris@aridas.eu> wrote:
It was just an idea about how you can extract features from IP addresses, not a direction to use that service.

If I just encode the IP address itself, is there an efficient way to do it? The approaches I have found reliable are arithmetic encoding and converting the IP string directly to an integer.
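A minimal sketch of the direct string-to-integer conversion mentioned above, using Python's standard `ipaddress` module. The helper names `ip_to_int` and `int_to_ip` are chosen for this example, and the sketch does not cover arithmetic encoding.

```python
import ipaddress


def ip_to_int(ip_string):
    """Convert a dotted IPv4 (or IPv6) string to its integer value."""
    return int(ipaddress.ip_address(ip_string))


def int_to_ip(value):
    """Convert an integer back to an address string (IPv4 if it fits in 32 bits)."""
    return str(ipaddress.ip_address(value))


n = ip_to_int("192.168.0.1")
print(n)             # 3232235521
print(int_to_ip(n))  # 192.168.0.1
```

An IPv4 address fits in a single unsigned 32-bit integer, so this is compact. One caveat is that a model will treat the result as an ordered number, so numerically close addresses look similar whether or not that is meaningful for the task.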
participants (3)
- Chris Aridas
- lampahome
- Santosh Subedi