From harrisbw at notes.udayton.edu  Fri Nov 21 15:10:53 2008
From: harrisbw at notes.udayton.edu (Bryan)
Date: Fri, 21 Nov 2008 09:10:53 -0500
Subject: [CentralOH] Tab delimited data in Python
Message-ID: <1227276653.8393.20.camel@bryan-udri.udri.udayton.edu>

Hi all,

I have written some code to do data reduction on high-strain-rate
tensile test data. The program goes through and does simple things
like plotting the raw data and measuring the stroke rate. (Believe me,
it really is simple.)

I need a simple way to work with tab-delimited data (TDD). I've written
my own code, but it works HORRIBLY slowly. It takes something like 20
seconds to append a column to a 6000-line data file. But it does
_work_. Is there a good library for handling TDD? It has to work on
files with different column lengths. Several Python commands for
splitting strings merge delimiters, and this breaks TDD files with
unequal column lengths.

Now I have several, possibly competing goals:
- It has to work fairly quickly.
- It has to work on very large files, too large for Excel. (I know
  that's not really that large.)
- It must work on both Linux and Windows. (I thought this was a given
  with Python, but I learned there are libraries available only for one
  or the other. For instance, there are Windows-only Excel libraries.)

Here's the slow code (it's ugly, I know; I'm a Mechanical
Engineer...):

    def append_column(self, column_data, heading):
        f = open(self.textfile, 'rU')
        temp = tempfile.mktemp()
        g = open(temp, 'w')
        index = 0
        header = ""
        for label in self.column_labels:
            header += '\t'
            header += label
        # strip the first tab
        header = header[1:] + '\t' + heading.strip() + '\n'
        #f.readline()
        g.write(header)
        try:
            for line in f:
                if index != 0:
                    try:
                        line = line[:-1] + '\t' + str(column_data[index-1]) + '\n'
                    except IndexError:
                        line = line[:-1] + '\t' + '\n'
                    #if index < 25: print line,
                    g.write(line)
                index += 1
        finally:
            f.close()
            g.close()
        shutil.move(temp, self.textfile)
        self.number_of_columns += 1
        self.traces = self.get_traces()
        self.column_lengths = self.get_column_lengths()

As I said, this takes something like 20 seconds to append a normal
column of data. I'd rather use a library for handling this sort of
thing than write my own, but I wouldn't mind knowing how you guys would
tighten up this code. I think the try-finallys are slowing this down,
but I'm not sure.

Thanks,
Bryan

-- 
Bryan Harris
Research Engineer
Structures and Materials Evaluation Group
harrisbw at notes.udayton.edu
http://www.udri.udayton.edu/
(937) 229-5561

From gacsinger at gmail.com  Fri Nov 21 15:35:07 2008
From: gacsinger at gmail.com (Greg Singer)
Date: Fri, 21 Nov 2008 09:35:07 -0500
Subject: [CentralOH] Tab delimited data in Python
In-Reply-To: <1227276653.8393.20.camel@bryan-udri.udri.udayton.edu>
References: <1227276653.8393.20.camel@bryan-udri.udri.udayton.edu>
Message-ID: 

Bryan,

If you haven't already done so, have a look at the csv module in the
standard lib (http://docs.python.org/library/csv.html). Despite the
name, it can handle tab-delimited files.

- Greg

On Fri, Nov 21, 2008 at 9:10 AM, Bryan wrote:
> Hi all,
>
> I have written some code to do data reduction on high-strain-rate
> tensile test data. The program goes through and does simple things
> like plotting the raw data and measuring the stroke rate. (Believe me
> it really is simple.)
>
> [...]
>
> Thanks,
> Bryan
>
> _______________________________________________
> CentralOH mailing list
> CentralOH at python.org
> http://mail.python.org/mailman/listinfo/centraloh

From harrisbw at notes.udayton.edu  Fri Nov 21 16:25:56 2008
From: harrisbw at notes.udayton.edu (Bryan)
Date: Fri, 21 Nov 2008 10:25:56 -0500
Subject: [CentralOH] Tab delimited data in Python
In-Reply-To: <34f468870811210646m19ae8f97k20258d64f6f2b555@mail.gmail.com>
References: <1227276653.8393.20.camel@bryan-udri.udri.udayton.edu>
	<34f468870811210646m19ae8f97k20258d64f6f2b555@mail.gmail.com>
Message-ID: <1227281156.8393.22.camel@bryan-udri.udri.udayton.edu>

I found the real killer. It wasn't in this code after all. The command
self.get_traces() was re-reading the whole file each time I appended a
column. Stupid.

The csv module was exactly what I was looking for! Thanks! When I
searched, the name threw me off.

On Fri, 2008-11-21 at 09:46 -0500, Eric Floehr wrote:
> self.traces=self.get_traces()

-- 
Bryan Harris
Research Engineer
Structures and Materials Evaluation Group
harrisbw at notes.udayton.edu
http://www.udri.udayton.edu/
(937) 229-5561

From eric at intellovations.com  Fri Nov 21 15:46:57 2008
From: eric at intellovations.com (Eric Floehr)
Date: Fri, 21 Nov 2008 09:46:57 -0500
Subject: [CentralOH] Tab delimited data in Python
In-Reply-To: <1227276653.8393.20.camel@bryan-udri.udri.udayton.edu>
References: <1227276653.8393.20.camel@bryan-udri.udri.udayton.edu>
Message-ID: <34f468870811210646m19ae8f97k20258d64f6f2b555@mail.gmail.com>

Bryan,

Have you thought about using the built-in CSV module? It can take
alternative delimiters, like tabs:

http://www.python.org/doc/2.5.2/lib/module-csv.html

It reads into a list and writes a list out in the right format.
But maybe it's somewhere else in your code... if you copied the
indentation right, it seems like your code wouldn't work if
self.column_labels contained more than one item.

Could you further describe what you are attempting to do, and maybe I
could help out a little more.

-Eric
http://www.forecastadvisor.com

On Fri, Nov 21, 2008 at 9:10 AM, Bryan wrote:
> Hi all,
>
> I have written some code to do data reduction on high-strain-rate
> tensile test data. The program goes through and does simple things
> like plotting the raw data and measuring the stroke rate. (Believe me
> it really is simple.)
>
> [...]
>
> Thanks,
> Bryan
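The csv-module approach the replies recommend can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes a tab-delimited file whose first row is a header, and the function name `append_column` merely mirrors Bryan's method.

```python
import csv
import os
import shutil
import tempfile

def append_column(path, column_data, heading):
    """Append one column to a tab-delimited file in a single pass."""
    # mkstemp is the safe replacement for the deprecated tempfile.mktemp
    fd, temp = tempfile.mkstemp()
    os.close(fd)
    values = iter(column_data)
    with open(path, newline='') as src, open(temp, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter='\t')
        writer = csv.writer(dst, delimiter='\t')
        header = next(reader)
        writer.writerow(header + [heading.strip()])
        for row in reader:
            # Rows past the end of column_data get an empty cell,
            # matching the IndexError branch in the original code.
            writer.writerow(row + [str(next(values, ''))])
    shutil.move(temp, path)
```

Note that csv.reader keeps empty fields between adjacent tabs instead of merging them the way a bare str.split() does, so files with unequal column lengths survive the round trip.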