Fwd: Optional kwarg for sequence methods in random module

Hi! Recently I encountered a situation where I needed to pick a random population sample, but with a specific distribution function - paretovariate in this particular case. After poking around in the random module, I found out that none of the so-called "sequence methods" support a custom random function, except for random.shuffle <https://github.com/python/cpython/blob/master/Lib/random.py#L293>.

I would like to propose a new optional keyword argument for all of these sequence methods (choice, sample, choices) that - similarly to how shuffle does it - takes a zero-argument random distribution function of the user's choice. Because it has to be a zero-argument function, passing a "configured" distribution function would most likely require a lambda. I would see it being used like this:

```python
import random
random.choice([1, 2, 3], random=lambda: random.paretovariate(1.75))
```

This seems reasonable to me, and I think end users would also benefit from the flexibility it gives. In my opinion, it would also make the sequence methods' API more uniform, with all of them supporting this feature (I don't know if that's a valid argument, since shuffle has supported it from the beginning and was added about 20 years ago <https://github.com/python/cpython/commit/6c395ba31609eeffce2428280cc5d95e4fb...> :P). And since it would be an optional kwarg, backward compatibility would be preserved.

If this makes sense, I'm willing to make a patch. I looked into random.py, and it looks like the implementation should be easy enough. One thing I'm concerned about is how it would all relate to _randommodule.c (I have little knowledge of the C part).

Thanks!
Patryk
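
A minimal sketch of what such an API could look like, using a hypothetical choice_with_random helper rather than CPython's actual implementation:

```python
import random

def choice_with_random(seq, random_func=None):
    # Hypothetical sketch (not CPython code): mimic random.choice(),
    # but accept a zero-argument function returning a float in
    # [0.0, 1.0), the same contract random.shuffle()'s optional
    # `random` argument uses.
    if random_func is None:
        random_func = random.random
    if not seq:
        raise IndexError("cannot choose from an empty sequence")
    return seq[int(random_func() * len(seq))]

# A biased (non-uniform) selector that does satisfy the [0.0, 1.0)
# contract, squaring the uniform variate to favor low indices:
print(choice_with_random([1, 2, 3], random_func=lambda: random.random() ** 2))
```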

On Aug 17, 2019, at 00:02, Patryk Gałczyński <evemorgen1911@gmail.com> wrote:
One thing I'm concerned about is how it would all relate to _randommodule.c (I have little knowledge of the C part).
You probably don’t need to change the C code at all. I’m pretty sure none of these functions are implemented in C; they just use functions like randbelow, which in turn use functions that may be implemented in C. The only things I can imagine worrying about:

* Does it require copy-pasting a lot of code to do the “faking randbelow on top of a custom random” thing? If so, does refactoring that have an unacceptable performance cost?
* Is the performance cost of a custom random reasonably low, and roughly equivalent to shuffle?
* Is there any danger of throwing away too much entropy (or not having enough bits in the first place) to wreck the distribution?

I’m pretty sure you could dispel all of those worries. (For example, for the last one: if there are enough bits to generate N values from [0, N) for shuffle, there must be enough to generate 1 value from the same range for choice, right? Unless there’s something about shuffle that automatically smooths out distortions that you don’t get from choice?)
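
For reference, a sketch of what “faking randbelow on top of a custom random” could look like, modeled on the float-scaling trick shuffle applies to its optional random argument (fake_randbelow is a made-up name, not a stdlib function):

```python
import random

def fake_randbelow(n, random_func):
    # Scale a float from [0.0, 1.0) up to [0, n) and truncate.
    # Each call consumes one double's worth of entropy, and the
    # truncation is where bits can get thrown away.
    return int(random_func() * n)

# Roughly what choice(seq, random=...) would do under the hood:
seq = ["a", "b", "c"]
print(seq[fake_randbelow(len(seq), random.random)])
```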

On Sat, Aug 17, 2019 at 09:02:54AM +0200, Patryk Gałczyński wrote:
Let me see if I understand... you have a paretovariate distribution, and you want to generate a sample of values from that distribution? Or another way of putting it, you want to generate random numbers from that distribution? Use the random.paretovariate() function (which I see you know about). There's no need for random.choice or other methods to be involved. That might not be what you *want*, but that is what you seem to be describing.
The optional argument random is a 0-argument function returning a random float in [0.0, 1.0); by default, this is the function random(). But random.paretovariate returns a float in [1.0, ∞). By definition, every value it produces is out of range, and there's no way of compressing it into [0.0, 1.0) without changing the shape.

I think you need to consider more carefully what you are actually trying to solve, and go back to understanding the requirements before writing the program. Because I don't think that "choose an item from this finite list according to an infinite distribution" is meaningful.

-- Steven
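
A small snippet making the range mismatch concrete (illustrative only):

```python
import random

# Generating a Pareto-distributed sample directly, with no sequence
# method involved:
sample = [random.paretovariate(1.75) for _ in range(5)]
print(sample)

# Every Pareto variate is >= 1.0, so none of them satisfies the
# [0.0, 1.0) contract the `random` argument expects:
assert all(x >= 1.0 for x in sample)
```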

Hey Steven,

I think there is a way to map x ∈ [1, +∞) to y ∈ [0, 1] by putting y = 1/x. But I think there is another issue to consider: for example, if we want to sort: random.choice(['a', 'b', 'c'], random=lambda: random.paretovariate(1.75)), we would need a mapping between the elements of the list (here: a, b, c) and [0, 1], and the reverse mapping. Having this mapping implies that we have a notion of order and distance, a "measurement", between the elements, and it is not trivial to always have a notion of "order" or "distance" between the elements we manipulate. But it could be fun to have this possibility. What do you think?

Thank you!

Best regards,
-- Hamza
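
A quick check of the suggested y = 1/x transform (note that the values land in (0, 1], not [0, 1), a detail picked up below):

```python
import random

# Map x in [1, +inf) to y = 1/x; the result is always in (0, 1].
ys = [1 / random.paretovariate(1.75) for _ in range(5)]
print(ys)
assert all(0 < y <= 1 for y in ys)
```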

One big issue with using that mapping is that it doesn’t preserve many of the characteristics of the initial distribution. If the x distribution is low-weighted, then the y distribution is high-weighted. If you want to preserve the shape, the fact that you are mapping a distribution that goes to infinity onto a value with an upper limit says it can’t be done precisely while maintaining the distribution. You could shift it down by subtracting 1, possibly scale it, and then truncate it if you get a value too large (either redraw or just saturate to 1). This will give you something with close to the right shape, depending on how you want to define a mapping of an infinite range to a finite range.
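
One possible rendering of that shift/redraw/scale recipe as code (my own sketch; the cutoff of 10.0 and the helper name are arbitrary illustrative choices):

```python
import random

def truncated_pareto(alpha, cutoff=10.0):
    # Shift the Pareto variate down so its support starts at 0,
    # redraw anything past the cutoff, then scale into [0.0, 1.0).
    # The shape below the cutoff is preserved; the tail is dropped.
    while True:
        x = random.paretovariate(alpha) - 1.0
        if x < cutoff:
            return x / cutoff

print([truncated_pareto(1.75) for _ in range(5)])
```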

On Aug 18, 2019, at 06:20, Senhaji Rhazi hamza <hamza.senhajirhazi@gmail.com> wrote:
Hey Steven,
I think there is a way to map x ∈ [1, +∞) to y ∈ [0, 1] by putting y = 1/x.
Well, that gives you a distribution over (0, 1] instead of [0, 1), which is technically not legal as a substitute for random, even though it’ll actually only matter about 1 in 2^42 runs. You could pass this to shuffle today and probably get away with it, so asking for the same in choice, etc. isn’t too outrageous. But it’s still hard to judge whether it’s a good suggestion if we don’t actually know what you’re trying to accomplish.

First, there’s no way random could even know that you needed anything transformed. The distribution functions don’t come with metadata describing their support, and even if they did, a new lambda that you pass in wouldn’t. As far as it could possibly tell, you passed in something that claims to be a nullary function returning values in [0, 1), and it is a nullary function, and that’s all it knows.

More importantly, even if that weren’t a problem, 1/x is hardly the one and only obvious guess at how to turn a Pareto into something with the appropriate support. In fact, I suspect more people would want a shifted, scaled, and truncated Pareto if they asked for Pareto. (Much as people often talk about things like the mean of a Cauchy distribution, which doesn’t exist but can be approximated very well by the mean of a Cauchy distribution truncated to some very large maximum.)
I’m not sure what all of this means. What are you sorting? Why are you passing paretovariate rather than 1/x of it, right after you just talked about that transform? What is this reverse mapping? Why are you switching from probability concepts to metric theory halfway through the paragraph?

Maybe it would help if you could explain what you expected this to do. What should the odds of a, b, and c be in your function? If we understood what you wanted, people could probably (a) explain the best way to do it today, (b) come up with a specific proposal for a way to make it easier, and (c) evaluate whether that proposal is a good idea. Or, alternatively, if you have a different example, one where you’d obviously want to use some existing [0, 1) distribution with choice or sample and don’t need to go through all this to explain it, that would help.

The thing is, your proposal (to add a random argument to choice, choices, and sample) makes sense, but if your only example doesn’t make sense, it’s very hard to judge the proposal. It sounds like a good idea on the face of it. The only obvious problems are the limitations of only having a double rather than as many bits as needed, and of coming from a PRNG with a limited period compared to 2^N for large N, both of which are, if anything, even bigger problems for shuffle, which already allows a random argument. So why not choice and sample too? But if the only thing anyone wants from it is something that doesn’t make sense, then maybe we’re better off without it?
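
For what it’s worth, non-uniform selection from a finite population is already spellable today with explicit weights to random.choices; the power-law weights here are only an illustrative guess at what a “Pareto-flavored choice” might mean:

```python
import random

population = ["a", "b", "c"]
# Weights decaying like a power law, so earlier items are favored:
weights = [1 / (i + 1) ** 1.75 for i in range(len(population))]
print(random.choices(population, weights=weights, k=10))
```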

Hey Steven,

Thank you for having taken the time to comment on my response so carefully.

For the first part:

> I think there is a way to map x ∈ [1, +∞) to y ∈ [0, 1] by putting y = 1/x.

I hadn't paid attention to the fact stated by @Richard Damon <Richard@damon-family.org> that the shape of the distribution won't be preserved - not only because the distribution will go over (0, 1] instead of [0, 1), but because the Pareto shape itself will change. The shifted, scaled, and truncated Pareto solution makes more sense.

For the second part:

> But I think there is another issue to consider: for example, if we want to sort: random.choice(['a', 'b', 'c'], random=lambda: random.paretovariate(1.75)), we would need a mapping between the elements of the list (here: a, b, c) and [0, 1], and the reverse mapping. Having this mapping implies that we have a notion of order and distance, a "measurement", between the elements, and it is not trivial to always have a notion of "order" or "distance" between the elements we manipulate. But it could be fun to have this possibility.

To answer the following questions:

- What are you sorting?
- What is this reverse mapping?
- Why are you switching from probability concepts to metric theory halfway through the paragraph?
- Maybe it would help if you could explain what you expected this to do. What should the odds of a, b, and c be in your function?

To the best of my understanding, we define a probability space (Ω, T, P), where T is a σ-algebra, and the random variable X is the measurable function that goes from (Ω, T, P) to a measurable space (E, ℰ). The space Ω could be anything - for example, it could be ['a', 'b', 'c'] - but we need a function that knows how to go from a subset s of Ω (where s ∈ T) to a subset s2 of E (where s2 ∈ ℰ). In fact, by measuring P('a') we are measuring P_X(b) = P(X⁻¹(b)), where b = X('a') ∈ E.

As an example, for the Bernoulli law (flipping a coin, say):

Ω = {ω1, ω2}; E = {0, 1}; T = P(Ω); P(ω1) = p, P(ω2) = 1 − p, where p ∈ (0, 1).
X(ω1) = 1, X(ω2) = 0.
P_X(1) = P(X⁻¹(1)) = P(ω1) = p and P_X(0) = P(X⁻¹(0)) = P(ω2) = 1 − p.

So the mapping function is X and the reverse mapping function is X⁻¹. To summarize the idea: the mapping X allows us to go from abstract elements ('a', 'b', 'c') that we do not measure directly (P('a')) to elements we can measure (P_X(1)), and by having X we will be able to get the odds of a, b, and c. So my idea was to offer this function X as a helper that the user would define - one that knows how to map subsets of the starting space (for example, subsets of ASCII) to subsets of a measurable space (the easiest one would be [0, 1], where the measure is the identity). This is why I brought in some metric concepts.

I'm sorry, I couldn't follow the last part:

> It sounds like a good idea on the face of it. The only obvious problems are the limitations of only having a double rather than as many bits as needed, and coming from a PRNG with a limited period compared to 2^N for large N, both of which are if anything even bigger problems for shuffle, which already allows a random argument. So, why not choice and sample too?

I don't know if we are better off without it, but it seems too complex for practical daily use!

Best regards,
-- SENHAJI RHAZI Hamza
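
The Bernoulli example above, rendered as a rough Python sketch (omega, X, and draw are illustrative names only):

```python
import random

p = 0.3  # P(omega1), i.e. the probability of "heads"

# The abstract space Omega and the random variable X mapping it
# into the measurable space E = {0, 1}:
omega = ["heads", "tails"]
X = {"heads": 1, "tails": 0}

def draw():
    # P(heads) = p, P(tails) = 1 - p; return both the abstract
    # outcome and its image under X.
    outcome = "heads" if random.random() < p else "tails"
    return outcome, X[outcome]

print([draw() for _ in range(5)])
```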

Senhaji Rhazi hamza writes:
I think there is a way to map x ∈ [1, +∞) to y ∈ [0, 1] by putting y = 1/x.
I don't think that's the point. I'll put money on this being a poor choice of value for the random parameter in the example: the OP chose the Pareto variate because that's what he mentioned earlier. But there are plenty of distributions besides uniform that can be parametrized to have support included in [0, 1].

To me, it also seems a little odd that random.choices supports only discrete populations (possibly with weights) but doesn't support float populations. I think

    r.choices(random=lambda: random.paretovariate(1.75), k=k)

would be a nice way to spell "sample of size k, with replacement, from a Pareto distribution". But as Steven D'Aprano points out, you can also spell it

    [random.paretovariate(1.75) for _ in range(k)]

so there's no real need to generalize Random.choices. On the other hand,

    r.sample(random=lambda: random.paretovariate(1.75), k=k)

doesn't make immediate sense to me. What does "sampling without replacement" mean if you don't have an explicit population? (Note that random.sample doesn't support weights for a similar reason, although it could support counts.)

So this suggests to me that the proposed ``random`` parameter to sequence methods is actually a spurious generalization, with different interpretations for different methods. I'm not against this proposal at this point, but I think it needs to be fleshed out more, both in terms of the principle we're trying to implement and in more detailed examples.

Steve
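
Both spellings from the message above, side by side as a runnable sketch (the size-100 population is an arbitrary stand-in for the explicit finite population that sampling without replacement requires):

```python
import random

k = 5

# "Sample of size k, with replacement, from a Pareto distribution"
# already has a direct spelling:
with_replacement = [random.paretovariate(1.75) for _ in range(k)]

# Sampling *without* replacement only makes sense against an
# explicit, finite population:
population = [random.paretovariate(1.75) for _ in range(100)]
without_replacement = random.sample(population, k)

print(with_replacement)
print(without_replacement)
```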

participants (6)

- Andrew Barnert
- Patryk Gałczyński
- Richard Damon
- Senhaji Rhazi hamza
- Stephen J. Turnbull
- Steven D'Aprano