[Tutor] Get a single random sample

Fri Sep 9 14:44:43 CEST 2011

kitty wrote:

> I'm new to python and I have read through the tutorial on:
> http://docs.python.org/tutorial/index.html
> which was really good, but I have been an R user for 7 years and am
> finding it difficult to do even basic things in python, for example I want
> to import my data (a tab-delimited .txt file) so that I can index and select
> a random sample of one column based on another column. my data has
> 2 columns named 'area' and 'change.dens'.
> 
> In R I would just
> 
> data<-read.table("FILE PATH\\Road.density.municipio.all.txt", header=T)
> #header =T gives columns their headings so that I can call each individually
> names(data)
> attach(data)
> 
> Then to Index I would simply:
> subset<-change.dens[area<2000&area>700] # so return change.dens values that
> # have corresponding 'area's of between 700 and 2000
> 
> then to randomly sample a value from that I just need to
> random<-sample(subset,1)
> 
> 
> My question is how do I get python to do this???

Good question! This does look like something where R is easier to use
than Python, especially with the read.table() function doing most of the
work for you.

Here's one way to do it in Python.

# Open the file and read two tab-delimited columns.
# Note that there is minimal error checking here.
data = []
with open('Road.density.municipio.all.txt') as f:
    next(f)  # Skip the header line ('area' and 'change.dens').
    for row in f:
        if not row.strip():
            # Skip blank lines.
            continue
        area, dens = row.split('\t')  # Split into two columns at tab
        pair = (float(area), float(dens))
        data.append(pair)

# The with block closes the file automatically when done.

# Select the (area, dens) pairs whose area lies strictly between 700 and 2000.
subset = [pair for pair in data if 700 < pair[0] < 2000]

# Get a single random sample.
import random
sample = random.choice(subset)

# Get ten random samples, sampling with replacement.
samples = [random.choice(subset) for i in range(10)]

# Get ten random samples, without replacement.
copy = subset[:]
random.shuffle(copy)
samples = copy[:10]
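
If you would rather not split the lines by hand, the standard csv module
can deal with the header row for you, and random.sample() draws without
replacement in one call. The following is only a minimal sketch, assuming
Python 3 and that the header names are exactly 'area' and 'change.dens' as
in your message. The function names read_columns and sample_dens and the
demo data are made up for illustration.

```python
import csv
import io
import random


def read_columns(lines):
    """Return a list of (area, dens) float pairs from tab-delimited lines.

    Assumes the first line is a header naming the columns 'area' and
    'change.dens', as described in the question.
    """
    reader = csv.DictReader(lines, delimiter='\t')
    return [(float(row['area']), float(row['change.dens'])) for row in reader]


def sample_dens(pairs, low, high, k=1):
    """Pick k change.dens values, without replacement, where low < area < high."""
    values = [dens for area, dens in pairs if low < area < high]
    # random.sample raises ValueError if k is larger than len(values).
    return random.sample(values, k)


# Made-up data standing in for the real file.
demo = io.StringIO("area\tchange.dens\n500\t0.1\n800\t0.2\n1500\t0.3\n2500\t0.4\n")
pairs = read_columns(demo)
picked = sample_dens(pairs, 700, 2000, k=1)
print(picked)
```

With the real file you would pass an open file object instead of the
StringIO, for example open('Road.density.municipio.all.txt', newline='').
Unlike the R version, this returns a list even when k is 1, so use
picked[0] to get the single value.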

-- 
Steven