data management with python from perl

ben moretti bmoretti at chariot.net.au
Tue Oct 7 21:25:14 EDT 2003


hi

i'm learning python, and one area i'd use it for is data management in
scientific computing. in the case i've tried, i want to reformat a data
file from a normalised list to a matrix with some sorted columns. at the
moment i do this in perl, where it is very easy, and i want to see if
python is as easy.

so, the data i am using is some epiphyte population abundance data for
particular sites, and it looks like this:

1.00	1.00	1.00	"MO"	906.00	"genus species 1"	1.00
1.00	1.00	1.00	"MO"	906.00	"genus species 2"	1.00
1.00	1.00	1.00	"MO"	906.00	"genus species 3"	1.00
1.00	1.00	1.00	"MO"	906.00	"genus species 4"	1.00

(i have changed the data to protect the innocent) the first four
columns relate to the location, the fifth to the substrate, the sixth
is the epiphyte species and the seventh the abundance. i need to turn
this into a substrate x species matrix with columns 1 to 4 retained as
sorting columns and the intersection of species and substrate being the
abundance. the species names need to be the column headers. this is
going to go into a multivariate analysis of variance programme that
only takes its data in that format. here is an example of the output:

region	location	site	stand	substrate	genus species 1	genus species 2	genus species 3	genus species 4	genus species 5	genus species 6	genus species 7

<..etc..>

1	1	1	MO	906	0	0	0	0	0	0	0	0	0	0	0	0	0	0

<..etc...>
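
for reference, a minimal python sketch of reading the input format shown above. it assumes the file is tab-separated with double-quoted strings, as the sample rows suggest; the name parse_rows, the inline SAMPLE text and the choice to turn the numeric location columns into integers (to match the example output) are illustrative assumptions, not part of the original script.

```python
import csv
import io

# two sample rows copied from the post (tab-separated, quoted strings)
SAMPLE = (
    '1.00\t1.00\t1.00\t"MO"\t906.00\t"genus species 1"\t1.00\n'
    '1.00\t1.00\t1.00\t"MO"\t906.00\t"genus species 2"\t1.00\n'
)

def parse_rows(handle):
    """Yield (region, location, site, stand, substrate, species, abundance)."""
    # csv.reader strips the double quotes around "MO" and the species name
    for tokens in csv.reader(handle, delimiter="\t"):
        if not tokens:
            continue  # skip blank lines
        # assumption: the numeric location columns are whole numbers
        region, location, site = (int(float(t)) for t in tokens[:3])
        stand = tokens[3]
        substrate = int(float(tokens[4]))
        species = tokens[5]
        abundance = float(tokens[6])
        yield region, location, site, stand, substrate, species, abundance

# io.StringIO stands in for an open file here
rows = list(parse_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

with a real file, the same function could be fed open(filename, newline="") instead of the StringIO object.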

so, to do this in perl - and i won't bore you with the whole script -
i read the file, split it into tokens and then populate a hash of
hashes, the syntax of which is

$HoH{$tokens[0]}{$tokens[1]}{$tokens[2]}{$tokens[3]}{$tokens[4]}{$tokens[5]} = $tokens[6];

where the various location and species values are the keys of the hash,
and the abundance is the $tokens[6] value. this now gives me a
multidimensional data structure that i can use to loop over the keys
and sort them by each as i go, then to write out the data into a
matrix as above. the syntax for this is generally like:

# level 1 - region
foreach $region (sort {$a <=> $b} keys %HoH) {

	# level 2 - location 
	foreach $location (sort {$a <=> $b} keys %{ $HoH{$region} }) {
		
		# level 3 - site
		foreach $site (sort {$a <=> $b} keys %{ $HoH{$region}{$location} }) {

<... etc ...>

there is a bit more perl obviously, but that is the general gist of
it. multidimensional hash and then looping and sorting to get the data
out.
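
the same idea can be sketched in python with nested dictionaries. this is only one possible translation, under some assumptions: the records list is made-up sample data in the column order described above, and tree, hoh, species_cols and lines are illustrative names. collections.defaultdict creates the inner dictionaries on demand, much like perl's autovivification, and sorted() on integer keys sorts numerically, like the <=> comparisons.

```python
from collections import defaultdict

# hypothetical records:
# (region, location, site, stand, substrate, species, abundance)
records = [
    (1, 1, 1, "MO", 906, "genus species 1", 1),
    (1, 1, 1, "MO", 906, "genus species 2", 1),
    (1, 1, 2, "MO", 907, "genus species 1", 3),
    (1, 2, 1, "MO", 10, "genus species 3", 2),
]

def tree():
    """A dict whose missing keys spring into existence as nested dicts."""
    return defaultdict(tree)

# populate the nested structure, one level per column
hoh = tree()
species_seen = set()
for region, location, site, stand, substrate, species, abundance in records:
    hoh[region][location][site][stand][substrate][species] = abundance
    species_seen.add(species)

# species names become the column headers
species_cols = sorted(species_seen)
header = ["region", "location", "site", "stand", "substrate"] + species_cols
lines = ["\t".join(header)]

# walk the levels in sorted order, one output row per substrate
for region in sorted(hoh):
    for location in sorted(hoh[region]):
        for site in sorted(hoh[region][location]):
            for stand in sorted(hoh[region][location][site]):
                substrates = hoh[region][location][site][stand]
                for substrate in sorted(substrates):
                    counts = substrates[substrate]
                    row = [region, location, site, stand, substrate]
                    # .get() fills in 0 without creating new keys
                    row += [counts.get(sp, 0) for sp in species_cols]
                    lines.append("\t".join(str(v) for v in row))

print("\n".join(lines))
```

a flat dictionary keyed by (region, location, site, stand, substrate) tuples would also work, since python compares tuples element by element when sorting, but the nested version stays closest to the perl.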

ok. so how do i do this in python? i've tried the "perlish" way but
didn't get very far, though i know it must be possible!

if you want to respond to this, try benmoretti at yahoo dot com dot au
as i get too much spam otherwise

cheers

ben



