[CentralOH] Tab delimited data in Python
Eric Floehr
eric at intellovations.com
Fri Nov 21 15:46:57 CET 2008
Bryan,
Have you thought about using the built-in CSV module? It can take
alternative delimiters, like tabs:
http://www.python.org/doc/2.5.2/lib/module-csv.html
It reads each row into a list and writes a list back out as a properly delimited row.
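Here is a minimal sketch of that suggestion. It is written for Python 3 rather than the 2.5 release documented at the link above. The sample data and the helper names read_tsv/write_tsv are hypothetical. Passing delimiter="\t" is the only change needed for tab-delimited data, and an empty cell between two tabs comes back as an empty string instead of being merged away:

```python
import csv
import io

def read_tsv(f):
    """Read tab-delimited rows from a file-like object into lists of strings."""
    return list(csv.reader(f, delimiter="\t"))

def write_tsv(f, rows):
    """Write lists of values to a file-like object as tab-delimited rows."""
    # lineterminator="\n" replaces csv's default "\r\n" row ending.
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)

# In-memory sample; the second data row has an empty "load" cell.
sample = "time\tload\tstroke\n0.0\t1.5\t0.01\n0.1\t\t0.02\n"
rows = read_tsv(io.StringIO(sample))
print(rows)
# [['time', 'load', 'stroke'], ['0.0', '1.5', '0.01'], ['0.1', '', '0.02']]

out = io.StringIO()
write_tsv(out, rows)
print(out.getvalue() == sample)  # True
```

With real files, the csv documentation recommends opening them with newline="" (for example open("data.txt", newline=""), where data.txt is a placeholder). This lets the module handle both Unix and Windows line endings itself.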
But maybe the slowdown is somewhere else in your code... If you copied the
indentation correctly, it seems like your code wouldn't work if
self.column_labels contained more than one item. Could you describe in more
detail what you are attempting to do? Then maybe I could help out a little more.
-Eric
http://www.forecastadvisor.com
On Fri, Nov 21, 2008 at 9:10 AM, Bryan <harrisbw at notes.udayton.edu> wrote:
> Hi all,
>
> I have written some code to do data reduction on high-strain-rate
> tensile test data. The program goes through and does simple things
> like plotting the raw data and measuring the stroke rate. (Believe me
> it really is simple.)
>
> I need a simple way to work with tab-delimited data (TDD). I've written
> my own code, but it runs HORRIBLY slowly. It takes something like 20
> seconds to append a column to a 6000-line data file. But it does
> _work_. Is there a good library for handling TDD? It has to work on
> files with different column lengths. Several Python string-splitting
> methods merge consecutive delimiters, and this breaks TDD files with
> unequal column lengths.
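The delimiter-merging problem comes from how the two forms of split behave. In Python, str.split() with no argument treats any run of whitespace as a single separator. By contrast, split("\t") splits on every individual tab. The sketch below keeps empty cells and pads short rows when turning rows into columns. The helper names split_tsv_line and to_columns are hypothetical:

```python
from itertools import zip_longest

def split_tsv_line(line):
    """Split one line on every tab, keeping empty cells.

    Only the line ending is stripped, so trailing tabs (empty cells at
    the end of a row) survive.
    """
    return line.rstrip("\r\n").split("\t")

def to_columns(lines):
    """Transpose tab-delimited rows into columns, padding short rows with ''."""
    rows = [split_tsv_line(line) for line in lines]
    return [list(col) for col in zip_longest(*rows, fillvalue="")]

line = "0.1\t\t0.02\n"
print(line.split())          # ['0.1', '0.02']  (the empty cell is lost)
print(split_tsv_line(line))  # ['0.1', '', '0.02']
```

Calling rstrip() with no argument would be a mistake here. It also removes trailing tabs, and with them any empty cells at the end of a row.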
>
> Now I have several, possibly competing, goals:
> - It has to work fairly quickly
> - It has to work on very large files, too large for Excel. (I know
> that's not really that large.)
> - It must work on both Linux and Windows. (I thought this was a given
> with Python, but I learned some libraries are available only for one or
> the other. For instance, there are Windows-only Excel libraries.)
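One way to address the speed, size, and cross-platform goals together is sketched below; treat it as an assumption rather than a tested recommendation. The idea is to stream the data row by row with the standard-library csv module. Memory use then stays flat regardless of line count, and nothing platform-specific is involved. The name iter_column and the conversion to float are hypothetical choices for this example. Empty cells, which are how a shorter column appears in the file, are skipped:

```python
import csv

def iter_column(lines, index, skip_header=True):
    """Yield the numeric values in column `index` of tab-delimited text.

    `lines` can be an open file or any iterable of strings. csv.reader
    pulls one line at a time, so only the current row is held in memory.
    Rows too short to reach the column, or whose cell is empty, are skipped.
    """
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # discard the label row
    for row in reader:
        if index < len(row) and row[index].strip():
            yield float(row[index])
```

For example, `with open("data.txt", newline="") as f: peak = max(iter_column(f, 1))` would find the largest value in the second column without loading the whole file into memory. Here data.txt is a placeholder name.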
>
> Here's the slow code (it's ugly, I know. I'm a Mechanical
> Engineer...):
>
> import shutil
> import tempfile
>
> def append_column(self,column_data,heading):
>     f = open(self.textfile, 'rU')
>     temp=tempfile.mktemp()
>     g = open(temp, 'w')
>     index=0
>     header=""
>     for label in self.column_labels:
>         header += '\t'
>         header += label
>     #strip the first tab
>     header = header[1:]+'\t'+heading.strip()+'\n'
>     #f.readline()
>     g.write(header)
>     try:
>         for line in f:
>             # line 0 is the old header, already replaced above
>             if index!=0 :
>                 try:
>                     line=line[:-1]+'\t'+str(column_data[index-1])+'\n'
>                 except(IndexError):
>                     line=line[:-1]+'\t''\n'
>                 #if index < 25 : print line,
>                 g.write(line)
>             index += 1
>     finally:
>         f.close()
>         g.close()
>     shutil.move(temp,self.textfile)
>     self.number_of_columns+=1
>     self.traces=self.get_traces()
>     self.column_lengths=self.get_column_lengths()
>
> As I said, this takes something like 20 seconds to append a normal
> column of data. I'd rather use a library for handling this sort of
> thing than write my own, but I wouldn't mind knowing how you guys would
> tighten up this code. I think the try/finally blocks are slowing this
> down, but I'm not sure.
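Below is a tightened, standalone version, sketched for Python 3 under a few assumptions:

- append_column_tsv is a hypothetical function, not a method of Bryan's class.
- It reads the existing labels from the file's first line instead of self.column_labels.
- It leaves the get_traces()/get_column_lengths() bookkeeping to the caller.

It replaces tempfile.mktemp with tempfile.mkstemp. mktemp is deprecated because another process can claim the name before the file is opened. It also lets `with` handle the closing that the try/finally did, and pads short rows so the new cell lines up with its heading.

On the try/finally question: entering a try block costs next to nothing in CPython. A single pass over 6,000 lines like this one should finish well under a second. The 20 seconds is therefore more likely spent in code not shown here, such as get_traces() or get_column_lengths().

```python
import csv
import os
import tempfile

def append_column_tsv(path, column_data, heading):
    """Append one column to a tab-delimited file in a single streaming pass.

    Rows past the end of column_data get an empty cell, like the
    IndexError fallback in the original method. Rows shorter than the
    header are padded first so the new cell lands in the right column.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates and opens the temp file securely (unlike mktemp);
    # same directory as the target so os.replace is a plain rename.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with open(fd, "w", newline="") as dst, open(path, newline="") as src:
            reader = csv.reader(src, delimiter="\t")
            writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
            header = next(reader, [])
            width = len(header)
            writer.writerow(header + [heading.strip()])
            for i, row in enumerate(reader):
                row += [""] * (width - len(row))  # no-op when row is full width
                writer.writerow(row + [column_data[i] if i < len(column_data) else ""])
        # Swap the finished file into place (overwrites on Windows too).
        os.replace(temp_path, path)
    except BaseException:
        os.remove(temp_path)  # don't leave a stray temp file behind
        raise
```

Calling append_column_tsv("data.txt", values, "stroke rate") would rewrite data.txt in place; data.txt and values are placeholders. Keeping the temporary file in the same directory means os.replace does a simple rename rather than a copy across filesystems.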
>
> Thanks,
> Bryan
>
>
> --
> Bryan Harris
> Research Engineer
> Structures and Materials Evaluation Group
> harrisbw at notes.udayton.edu
> http://www.udri.udayton.edu/
> (937) 229-5561