Tracking Users By IP Address

Michael Foord fuzzyman at gmail.com
Fri Oct 8 16:23:39 CEST 2004


[thanks but snip.. ;-) ]
> One alternative: (pseudocode)
> 
> Receive request
> If no-cookie-received:
>    Set Cookie: "NEWUSER"
> else:
>    if cookie-received == "NEWUSER":
>       # We know they can send us cookies back
>       id = gen-id()
>       Set Cookie: id
> 

Yep.. this I understand and will try.
Thanks
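
For reference, the handshake quoted above could be sketched in Python
roughly as follows. This is only a sketch: cookie_to_set and gen_id are
made-up names, the id format (a random UUID) is an assumption, and actually
reading and setting the cookie is left to whatever web framework is in use.

```python
import uuid

NEW_USER = "NEWUSER"  # marker value taken from the pseudocode


def gen_id():
    """Hypothetical stand-in for gen-id(): a random hex string."""
    return uuid.uuid4().hex


def cookie_to_set(received_cookie):
    """Return the cookie value to send back, or None to leave it alone.

    received_cookie is the value the browser sent, or None if it sent none.
    """
    if received_cookie is None:
        # First contact: hand out the marker and see if it comes back.
        return NEW_USER
    if received_cookie == NEW_USER:
        # The marker came back, so this client accepts cookies.
        return gen_id()
    # The client already has an id: nothing to change.
    return None
```

A client that never returns cookies just keeps getting "NEWUSER" and never
receives an id, which is how the untrackable users drop out.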

> Then just log requests with the received cookie. Trackable users will have
> a unique id, whether their IP changes, they share a system, or they sit
> behind NAT'ing firewalls etc. This allows you to track unique users that
> are trackable using cookies. If you have a particularly large number of
> users accessing your site you can tie in sampling (perhaps something like
> density biased sampling) as well, something like this:
> 
> new-cookie = None
> If no-cookie-received:
>    new-cookie = "NEWUSER"
> else:
>    if cookie-received == "NEWUSER":
>       # We know they can send us cookies back
>       id = gen-id()
>       new-cookie = id
> 
> if add-to-sample-set(request):
>    tag = "SAMPLE"
>    new-cookie = current-cookie or new-cookie
> else:
>    tag = "NOSAMPLE"
> 
> if new-cookie:
>    Set Cookie: tag new-cookie
> 


Sorry... :-( I don't get it.
What is add-to-sample-set(request) doing? Is it simply choosing a
proportion of our users to sample?

If this is only a 'do if you have too many users' kind of thing then
unfortunately it won't be a problem for me!!
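
If it really is just choosing a proportion of users, one way it might look
in Python is below. This is a guess at the intent, not what was actually
proposed: it is plain fixed-fraction sampling by hashing a key (say the
client IP), not density biased sampling, and SAMPLE_RATE, the key choice and
sample_tag are all assumptions made up for the example.

```python
import hashlib

SAMPLE_RATE = 0.1  # assumed fraction of users to sample


def add_to_sample_set(key, rate=SAMPLE_RATE):
    """Deterministically pick roughly `rate` of all keys.

    The decision comes from a hash of the key, so the server keeps no
    per-user state however many requests arrive.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in 0 .. 2**64 - 1
    return bucket < rate * 2**64


def sample_tag(key, rate=SAMPLE_RATE):
    """Tag to store in the cookie next to the id."""
    return "SAMPLE" if add_to_sample_set(key, rate) else "NOSAMPLE"
```

Once the tag is in the cookie, the client itself reports whether it is in
the sample on every later request.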

> (Or something like that IYSWIM - ie get the user population to indicate if
> they're being sampled - again, this allows your users to easily opt out,

As above... I don't get it, so I don't see how it achieves this?

> and also means the memory/etc required to determine whether to track the
> user or not isn't dependent on the number of requests your site gets -
> meaning that you can keep analysis costs for your site under control. If
> you've only got a small site this probably doesn't matter to you, but 
> worth bearing in mind).
> 
> The interesting thing about this from my perspective is that if you do 
> take a cookie approach like this, it actually allows you to figure out how
> much error there actually is between IP and cookie - rather than just guess.

One last question. You didn't explicitly say this, but I was thinking
of doing it anyway. Are you suggesting storing USERID *and* IP
address, and comparing the results of analysing by IP and analysing by
cookie? Sounds worthwhile...
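
Something like this is what I have in mind for the comparison. It is a
rough sketch: the (user_id, ip) record layout and the function names are
made up, and it only compares unique counts, which is the crudest possible
measure.

```python
def unique_counts(records):
    """records: iterable of (user_id, ip) pairs, one per logged request."""
    ids, ips = set(), set()
    for user_id, ip in records:
        ids.add(user_id)
        ips.add(ip)
    return len(ids), len(ips)


def ip_error(records):
    """Relative difference between the IP count and the cookie count."""
    by_id, by_ip = unique_counts(records)
    if by_id == 0:
        raise ValueError("no cookie-tracked requests in the log")
    return (by_ip - by_id) / by_id


# Made-up log: user "a" shows up from two addresses, user "b" from one.
log = [("a", "192.0.2.1"), ("a", "192.0.2.2"), ("b", "192.0.2.3")]
print(unique_counts(log))  # (2, 3)
print(ip_error(log))       # 0.5, i.e. counting by IP overestimates by 50%
```

Over-counts (one user, many IPs) and under-counts (many users behind one
NAT) can cancel out in a single number like this, so keeping the raw pairs
around for a closer look is probably wise.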

Thanks for your help - very interesting.

Regards,

Fuzzy

> The other nicety is it allows your users to opt-out very easily - since they
> can either switch off cookies, or you can send them a "NOSAMPLE" cookie.
> 
> Also, at present comments in this thread revolve around "this isn't
> reliable because of x,y and z". If you take this sort of approach you
> can find out the margin of error and then decide whether you're happy
> with it or not. Also as you can see from above this doesn't really have
> to be a very complex operation (unless you're in a high volume scenario
> with lots of distinct users and need to add in the sampling aspect).
> 
> Best Regards,
> 
> 
> Michael


