[Tutor] import data (txt/csv) into list/array and manipulation
denis.spir at free.fr
Tue Nov 11 13:40:27 CET 2008
Triantafyllos Gkikopoulos wrote:
> Thanks for the advice,
> I will do some more reading this week and look into your solution as
> well as others.
> So basically my data is enrichment signal on yeast genomic locations,
> and I want to map this signal with respect to genes, and the averaging
> question is so that I can average the signal if I align all the signal based
> on the start of every gene.
> I guess another solution is to create an array of zeros that covers
> the entire genome and then replace the zeros with actual signal (int)
> values; then I would be able to call for individual locations within this
> array, and it may be easier to do the averaging as well based on the
> reference file.
There are numerous solutions for any problem. This one would be especially
non-pythonic, I guess ;-)
If I understand your problem:
* Locations are gene ids.
* They are key fields for your data -- strings and ints are not keys.
* There can be several data items for a unique id.
* Among the possible data, integers have to be processed (averaged).
* What about the strings?
If you want to be both pythonic and simple, as I see it, use a dict with
locations as keys. Now, the data seems to be mainly a list of ints, right? So
subclass list for it, and add a field to store the additional data (strings?)
and a method to average your integers. Example:
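    class GeneData(list):
        ''' holds int values in basic list
            calculates average value
            stores additional string data
        '''
        def __init__(self, values, strings=None):
            list.__init__(self, values)
            # avoid sharing one mutable default between instances
            self.strings = strings if strings is not None else []
        def store_string(self, string):
            self.strings.append(string)
        def average(self):
            # record and/or return average, eg:
            total = 0.0
            for i in self:
                total += i
            self.avrg = total/len(self)
            return self.avrg

    gd = GeneData([1,2,3,4,5], ["string", "data"])
    x = gd.pop()
    gd.store_string("i'm relevant info")
    print("%s %s" % (gd, gd.strings))
    print("average: %2.2f ; removed: %i" % (gd.average(), x))

which prints:

    [1, 2, 3, 4] ['string', 'data', "i'm relevant info"]
    average: 2.50 ; removed: 5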
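
The dict with locations as keys, suggested above, could be sketched roughly as
follows. This is only an illustration in Python 3 syntax: the location labels
and signal values are invented, and `average` is a hypothetical helper, not
something from your data or from any library.

```python
# Invented signal values, grouped under made-up location keys.
signal_by_location = {
    "loc_a": [1, 2, 3, 4],
    "loc_b": [10, 20],
}

def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))

# One average per location, stored under the same key.
averages = {loc: average(values) for loc, values in signal_by_location.items()}

print(averages["loc_a"])  # 2.5
print(averages["loc_b"])  # 15.0
```

Plain lists are enough here; a list subclass only pays off once extra data
(such as the strings) has to travel with the values.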
> Dr Triantafyllos Gkikopoulos
>>>> spir <denis.spir at free.fr> 11/10/08 7:55 PM >>>
> trias wrote:
> > Hi,
> > I have started learning python (any online help content suggestions
> > welcome) and want to write a couple of scripts to do simple numeric
> > calculations on array data.
> > filetype(1) I have reference files (ie file.csv) that contain three
> > columns with variable rows; the first column is type str and contains a
> > unique name, and the other two columns are type int and contain two
> > reference values (start, stop: genomic location reference values).
> > **maybe I should import this as dictionary list**
> > filetype(2) The other file contains signal data in three columns,
> > column one is a unique identifier type int, and the other two columns
> > contain two type int values (genomic location reference values)
> > ** import this as array/list
> For both files, field 1 contains an id, so using a dictionary seems
> appropriate. You may use a format like:
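
One possible shape for such a dictionary, as a minimal sketch in Python 3
syntax. The gene names and coordinates are invented, and an in-memory string
stands in for the real reference file, so treat the layout as an assumption
rather than a description of your files.

```python
import csv
import io

# Stand-in for a reference file such as file.csv; the rows are invented.
reference_csv = io.StringIO("geneA,100,250\ngeneB,400,900\n")

genes = {}
for name, start, stop in csv.reader(reference_csv):
    # Field 1 is the id, so it becomes the key; the two ints form the value.
    genes[name] = (int(start), int(stop))

print(genes["geneA"])  # (100, 250)
```

With a real file, `open("file.csv", newline="")` would replace the
`io.StringIO` object.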
> Location could also be stored in a custom type, especially if you need
> to compare locations (which is probably the case). Example (not tested):
> class Location(object):
>     def __init__(self, start, stop):
>         self.start = start
>         self.stop = stop
>     def __eq__(self, other):
>         # equal only when both positions match
>         return (self.start == other.start) and (self.stop == other.stop)
> The second method will be called when you test loc1==loc2 and will
> return True if and only if both positions are equal.
> This custom type allows you to define other methods that may be relevant for
> > I want to map the location of filetype(2) with respect to
> Here is the problem reversed: if the location is to be used to link the
> tables, then it should be the key of both tables:
> Fortunately, your location is a simple enough set of data to be stored
> as a (start,stop) tuple, so that you can actually use it as a dict key
> (a dict's key must be hashable, which in practice means an immutable type).
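
To illustrate the tuple-as-key idea, here is a small sketch in Python 3
syntax. All names and numbers are invented, and the single signal value per
location is an assumption about what the second file holds.

```python
# Invented sample data: both tables use the same (start, stop) tuple as key.
names_by_location = {(100, 250): "geneA", (400, 900): "geneB"}
signal_by_location = {(100, 250): 7, (400, 900): 3}

# With a shared key, linking the two tables is a plain lookup.
linked = {}
for location, name in names_by_location.items():
    if location in signal_by_location:
        linked[name] = signal_by_location[location]

print(linked["geneA"])  # 7

# A list cannot be used the same way, because lists are not hashable.
try:
    {[100, 250]: "geneA"}
    list_key_works = True
except TypeError:
    list_key_works = False
```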
> Now, the question is: do you have multiple occurrences of the same
> location? If yes, you will have to agglomerate the data in e.g. a list:
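
A minimal sketch of that agglomeration, in Python 3 syntax, using
`collections.defaultdict` from the standard library. The rows are invented,
and the (start, stop, value) layout is an assumption about the data.

```python
from collections import defaultdict

# Invented rows of (start, stop, value); the first location occurs twice.
rows = [(100, 250, 7), (400, 900, 3), (100, 250, 9)]

values_by_location = defaultdict(list)
for start, stop, value in rows:
    # Every value seen for a location is appended to that location's list.
    values_by_location[(start, stop)].append(value)

print(values_by_location[(100, 250)])  # [7, 9]
```

A plain dict with `setdefault((start, stop), []).append(value)` would do the
same job without the import.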
> But maybe I don't properly understand what you have to do (see Q below).
> > ...and be able to do averaging of signal if I align all filetype one
> > objects.
> Where/what are the data fields in your pattern?
> > Thanks