[Tutor] import data (txt/csv) into list/array and manipulation

Tue Nov 11 13:40:27 CET 2008

Triantafyllos Gkikopoulos wrote:
 > Hi,
 >
 >  Thanks for the advice,
 >
 > I will do some more reading this week and look into your solution as
 > well as others.
 >
 >   So basically my data is enrichment signal on yeast genomic locations,
 > and I want to map this signal with respect to genes. The averaging
 > question is so that I can average the signal if I align all the signal
 > based on the start of every gene.
 > I guess another solution is to create an array of zeros that covers
 > the entire genome and then replace the zeros with actual signal (int)
 > values. Then I would be able to call for individual locations within this
 > array, and it may be easier to do the averaging as well, based on the
 > reference file.

There are numerous solutions for any problem. This one would be especially 
non-pythonic, I guess ;-)
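
For reference, the array-of-zeros idea quoted above could be sketched roughly
as below (Python 3). Everything concrete here is an assumption made up for
illustration: the toy genome length, the (position, value) layout of the signal
rows, the gene names and coordinates, and the helper name average_profile. The
thread does not say how the real files are laid out.

```python
# Toy data only: none of these numbers or names come from the real files.
GENOME_LENGTH = 20

# hypothetical (position, signal) pairs, positions are 0-based
signal_rows = [(2, 4), (3, 6), (4, 8), (10, 1), (11, 3), (12, 5)]

# hypothetical reference rows: gene name -> (start, stop)
genes = {"geneA": (2, 5), "geneB": (10, 13)}

# 1. a list of zeros covering the whole (toy) genome
genome = [0] * GENOME_LENGTH

# 2. replace the zeros with the actual signal values
for position, value in signal_rows:
    genome[position] = value

def average_profile(genome, genes, window):
    """Average the signal at each offset from the gene start, over all genes."""
    totals = [0.0] * window
    counts = [0] * window
    for start, _stop in genes.values():
        for offset in range(window):
            position = start + offset
            if position < len(genome):   # skip offsets past the genome end
                totals[offset] += genome[position]
                counts[offset] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# 3. align every gene on its start and average position by position
profile = average_profile(genome, genes, window=3)
print(profile)
```

With this toy data the profile is [2.5, 4.5, 6.5]. Positions without signal
simply stay at zero and pull the average down, which may or may not be what
you want.
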
If I understand your problem:
* Locations are gene ids.
* They are key fields for your data -- strings and ints are not keys.
* There can be several data items for a unique id.
* Among the possible data, integers have to be processed (averaged).
* What about the strings?

If you want to be both pythonic and simple, as I see it, use a dict with
locations as keys. Now, the data seems to be mainly a list of ints. Right? So
subclass list for this, and add to it the fields needed to store the additional
data (strings?) and a method to average your integers. Example:

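class GeneData(list):
	''' holds int values in a basic list
		calculates their average value
		stores additional string data
		'''
	def __init__(self, values=()):
		list.__init__(self, values)
		# always defined, so store_string() works before store_strings()
		self.strings = []
	def store_strings(self, strings):
		# copy, so that the caller's list is not modified later on
		self.strings = list(strings)
	def store_string(self, string):
		self.strings.append(string)
	def average(self):
		# record and return the average (fails on an empty list)
		total = 0.0		# not named "sum", to avoid shadowing the builtin
		for i in self:
			total += i
		self.avrg = total / len(self)
		return self.avrg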

gd = GeneData([1,2,3])
gd.append(4)
gd.append(5)
x = gd.pop()
gd.store_strings(["string","data"])
gd.store_string("i'm relevant info")

print("%s %s" % (gd, gd.strings))
print("average: %2.2f ; removed: %i" % (gd.average(), x))
==>
[1, 2, 3, 4] ['string', 'data', "i'm relevant info"]
average: 2.50 ; removed: 5
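
The "dict with locations as keys" part can be sketched in the same spirit
(Python 3). This assumes, and it is only a guess since the real column layout
is not given, that each signal row reads start,stop,value. The sample rows and
the names signal_by_location and average are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an open file; the content is invented for illustration.
signal_csv = io.StringIO(
    "100,200,7\n"
    "100,200,9\n"
    "300,450,4\n"
)

# {(start, stop): [value, value, ...]}
signal_by_location = defaultdict(list)
for start, stop, value in csv.reader(signal_csv):
    # the csv module yields strings, so convert explicitly
    signal_by_location[(int(start), int(stop))].append(int(value))

def average(values):
    """Plain arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))

averages = {loc: average(vals) for loc, vals in signal_by_location.items()}
print(averages)
```

A (start, stop) tuple works as a key because tuples of ints are hashable.
With a real file, the io.StringIO object would be replaced by the result of
open() on the file.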

denis

 > cheers
 >
 > Dr Triantafyllos Gkikopoulos
 >>>> spir <denis.spir at free.fr> 11/10/08 7:55 PM >>>
 > trias wrote:
 >  > Hi,
 >  >
 >  >  I have started learning python (any online help content suggestions are
 >  > welcome) and want to write a couple of scripts to do simple numeric
 >  > calculations on array data.
 >  >
 >  > filetype(1) I have reference files (ie file.csv) that contain three columns
 >  > with variable rows. The first column is type str and contains a unique
 >  > identifier name, and the other two columns are type int and contain two
 >  > reference values (start, stop: genomic location reference values).
 >  >   **maybe I should import this as dictionary list**
 >  >
 >  > filetype(2) The other file contains signal data in three columns. Column one
 >  > is a unique identifier of type int, and the other two columns contain two
 >  > type int values (genomic location reference values)
 >  >   ** import this as array/list
 >
 > For both files, field 1 contains an id, so using a dictionary seems
 > appropriate. You may use a format like:
 > {id:(start,stop)}
 > Location could also be stored in a custom type, especially if you need to
 > compare locations (which is probably the case). Example (not tested):
 > class Location(object):
 > 	def __init__(self, start, stop):
 > 		self.start = start
 > 		self.stop = stop
 > 	def __eq__(self, other):
 > 		return (self.start==other.start) and (self.stop==other.stop)
 > The second method will be called when you test loc1==loc2 and will return
 > True if and only if both positions are equal.
 > This custom type allows you to define other methods that may be relevant
 > for your problem.
 >  > I want to map the location of filetype(2) with respect to filetype(1)...
 >
 > Here is the problem reversed: if the location is to be used as the link
 > between tables, then it should be the key of both tables:
 > {location:id}
 > Fortunately, your location is a simple enough set of data to be stored as a
 > (start,stop) tuple, so that you can actually use it as a dict key (a dict's
 > key must be hashable, which in practice means an immutable type).
 > Now, the question is: do you have multiple occurrences of the same location?
 > If yes, you will have to gather the data in, e.g., a list:
 > {location:[d1,d2,...]}
 > But maybe I don't properly understand what you have to do (see Q below).
 >  > ...and be able to do averaging of signal if I align all filetype(1) objects.
 >
 > Where/what are the data fields in your pattern?
 >
 > Denis
 >
 >  > Thanks