[Tutor] Get a single random sample
Steven D'Aprano
steve at pearwood.info
Fri Sep 9 14:44:43 CEST 2011
kitty wrote:
> I'm new to python and I have read through the tutorial on:
> http://docs.python.org/tutorial/index.html
> which was really good, but I have been an R user for 7 years and am
> finding it difficult to do even basic things in python, for example I want
> to import my data (a tab-delimited .txt file) so that I can index and select
> a random sample of one column based on another column. my data has
> 2 columns named 'area' and 'change.dens'.
>
> In R I would just
>
> data<-read.table("FILE PATH\\Road.density.municipio.all.txt", header=T)
> #header =T gives columns their headings so that I can call each individually
> names(data)
> attach(data)
>
> Then to Index I would simply:
> subset<-change.dens[area<2000&area>700] # so return change.dens values that
> # have corresponding 'area's of between 700 and 2000
>
> then to randomly sample a value from that I just need to
> random<-sample(subset,1)
>
>
> My question is how do I get python to do this???
Good question! This does look like something where R is easier to use
than Python, especially with the read.table() function doing most of the
work for you.
Here's one way to do it in Python.
# Open the file and read two tab-delimited columns.
# Note that there is minimal error checking here.
f = open('Road.density.municipio.all.txt')
next(f)  # Skip the header line (the R code reads it with header=T).
data = []
for row in f:
    if not row.strip():
        # Skip blank lines.
        continue
    area, dens = row.split('\t')  # Split into two columns at tab
    pair = (float(area), float(dens))
    data.append(pair)
f.close()  # Close the file when done.
# Select items with specified areas.
subset = [pair for pair in data if 700 < pair[0] < 2000]
# Get a single random sample.
import random
sample = random.choice(subset)
# Get ten random samples, sampling with replacement.
samples = [random.choice(subset) for i in range(10)]
# Get ten random samples, without replacement.
copy = subset[:]
random.shuffle(copy)
samples = copy[:10]
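
If you would rather let the standard library do more of the work, here is
a rough sketch of the same steps using csv.DictReader and random.sample.
It is only one possible approach, and it assumes the file starts with a
header line naming the columns 'area' and 'change.dens', as your R code
implies. The helper names read_table and select_change_dens, and the tiny
in-memory table standing in for your file, are made up for illustration.

```python
import csv
import io
import random


def read_table(lines):
    """Read tab-delimited lines with a header row into a list of dicts."""
    reader = csv.DictReader(lines, delimiter='\t')
    return [
        {'area': float(row['area']), 'change.dens': float(row['change.dens'])}
        for row in reader
    ]


def select_change_dens(rows, low=700, high=2000):
    """Return the change.dens values whose area lies strictly between low and high."""
    return [row['change.dens'] for row in rows if low < row['area'] < high]


# A small made-up table; with a real file you would pass an open file object.
demo = io.StringIO(
    "area\tchange.dens\n"
    "500\t0.1\n"
    "800\t0.2\n"
    "1500\t0.3\n"
    "2500\t0.4\n"
)

rows = read_table(demo)
subset = select_change_dens(rows)

one = random.choice(subset)          # a single random value
several = random.sample(subset, 2)   # two values, without replacement
```

Unlike the list of pairs above, subset here holds only the change.dens
values, which is closer to what the R expression returns. random.sample
also saves you the copy-and-shuffle step when sampling without replacement.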
--
Steven