choose value from custom distribution

Chris Rebert clp2 at rebertia.com
Tue Oct 19 08:49:15 CEST 2010


On Mon, Oct 18, 2010 at 11:40 PM, Arnaud Delobelle <arnodel at gmail.com> wrote:
> elsa <kerensaelise at hotmail.com> writes:
>> Hello,
>>
>> I'm trying to find a way to collect a set of values from real data,
>> and then sample values randomly from this data - so, the data I'm
>> collecting becomes a kind of probability distribution. For instance, I
>> might have age data for some children. It's very easy to collect this
>> data using a list, where the index gives the value of the data, and
>> the number in the list gives the number of times that value occurs:
>>
>> [0,0,10,20,5]
>>
>> could mean that there are no people aged 0, no
>> people aged 1, 10 people aged 2, 20 people aged 3, and 5 people aged 4
>> in my data collection.
>>
>> I then want to make a random sample that would be representative of
>> these proportions - is there any easy and fast way to select an entry
>> weighted by its value? Or are there any python packages that allow you
>> to easily create your own distribution based on collected data?
<snip>
> If you want to keep it simple, you can do:
>
>>>> import random
>>>> t = [0,0,10,20,5]
>>>> expanded = sum([[x]*f for x, f in enumerate(t)], [])
>>>> random.sample(expanded, 10)
> [3, 2, 2, 3, 2, 3, 2, 2, 3, 3]
>>>> random.sample(expanded, 10)
> [3, 3, 4, 3, 2, 3, 3, 3, 2, 2]
>>>> random.sample(expanded, 10)
> [3, 3, 3, 3, 3, 2, 3, 2, 2, 3]
>
> Is that what you need?

The OP explicitly ruled that out:

>> Two
>> other things to bear in mind are that in reality I'm collating data
>> from up to around 5 million individuals, so just making one long list
>> with a new entry for each individual won't work.
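
One way to avoid the expanded list is to keep only the running totals of
the counts and binary-search them for each draw. What follows is a minimal
sketch, not anything posted in the thread: make_sampler and draw are
hypothetical names, and it assumes the counts are non-negative integers.
Memory stays proportional to the number of distinct values, not the
number of individuals.

```python
import bisect
import itertools
import random


def make_sampler(counts, rng=random):
    """Return a function that draws an index of `counts`, weighted by its count.

    Hypothetical helper for illustration; assumes non-negative integer counts.
    """
    # Running totals: for [0, 0, 10, 20, 5] this is [0, 0, 10, 30, 35].
    cumulative = list(itertools.accumulate(counts))
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("counts must contain at least one positive entry")
    total = cumulative[-1]

    def draw():
        # Pick a position in [0, total) and find the first bucket whose
        # running total exceeds it. Zero-count buckets are never selected.
        r = rng.randrange(total)
        return bisect.bisect_right(cumulative, r)

    return draw


counts = [0, 0, 10, 20, 5]
draw = make_sampler(counts)
sample = [draw() for _ in range(10)]
print(sample)
```

Note that this samples with replacement, whereas random.sample draws
without replacement; for a small sample out of millions of individuals the
difference should be negligible. On Python 3.6 and later,
random.choices(range(len(counts)), weights=counts, k=10) does much the same
thing in one call.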

Cheers,
Chris
--
The internet is wrecking people's attention spans and reading comprehension.


