[CentralOH] Tab delimited data in Python

Eric Floehr eric at intellovations.com
Fri Nov 21 15:46:57 CET 2008


Bryan,

Have you thought about using the built-in csv module?  It can take
alternative delimiters, like tabs:
http://www.python.org/doc/2.5.2/lib/module-csv.html

It reads each row into a list and writes a list back out in the right
format.
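
Here is a minimal sketch of what I mean. The sample data and column
names are made up for illustration, and it is written in Python 3
syntax (on 2.5 you would open real files in binary mode instead of
passing newline=""). It shows that the reader keeps empty fields rather
than merging adjacent tabs, so rows of different lengths survive a
round trip:

```python
import csv
import io

# Hypothetical sample: tab-delimited text whose rows have different lengths.
sample = "time\tload\tstroke\n0.0\t1.5\t0.01\n0.1\t\t0.02\n0.2\t1.7\n"

# csv.reader keeps empty fields, so consecutive tabs are not merged.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t")
rows = list(reader)  # each row is a list of strings

# Write the same rows back out, again using tabs as the delimiter.
out = io.StringIO(newline="")
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
```

With a real file you would pass the open file object to csv.reader and
csv.writer in place of the StringIO objects.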

But maybe the slowdown is somewhere else in your code... if the
indentation was copied correctly, it seems like your code wouldn't work if
self.column_labels contained more than one item, since everything after the
header is built sits inside the loop over the labels.  Could you describe
further what you are attempting to do?  Maybe I could help out a little more.
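
In case it helps, here is a rough sketch of how I would guess the
append could look with the csv module. This is only my reading of what
you want, not a drop-in replacement: append_column here is a
hypothetical standalone function, it takes the header from the first
line of the file instead of self.column_labels, and it assumes Python 3.
Padding short rows with empty fields is also my assumption, made so the
new value always lands in the new column.

```python
import csv
import os
import tempfile


def append_column(path, column_data, heading):
    """Append one column to a tab-delimited file with a single header row."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")

        # Write the header once, before looping over the data rows.
        header = next(reader, [])
        width = len(header)
        writer.writerow(header + [heading.strip()])

        for index, row in enumerate(reader):
            # Pad short rows so the new value lands in the new column.
            row = row + [""] * (width - len(row))
            # Rows beyond the end of column_data get an empty field.
            value = column_data[index] if index < len(column_data) else ""
            writer.writerow(row + [str(value)])
        temp_name = dst.name

    # Both files are closed here, so the swap also works on Windows.
    os.replace(temp_name, path)
```

It streams one row at a time, so the whole file never has to sit in
memory, and the temporary file is created in the same directory so the
final replace stays on one filesystem.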

-Eric
http://www.forecastadvisor.com



On Fri, Nov 21, 2008 at 9:10 AM, Bryan <harrisbw at notes.udayton.edu> wrote:

> Hi all,
>
> I have written some code to do data reduction on high-strain-rate
> tensile test data.   The program goes through and does simple things
> like plotting the raw data and measuring the stroke rate.  (Believe me
> it really is simple.)
>
> I need a simple way to work with tab-delimited data (TDD).  I've written
> my own code, but it works HORRIBLY slow.  It takes something like 20
> seconds to append a column to a 6000 line data file.  But it does
> _work_.  Is there a good library for handling TDD?  It has to work on
> files with different column lengths.  Several python commands for
> splitting strings merge delimiters and this breaks TDD files with
> unequal column lengths.
>
> Now I have several, possibly competing goals:
> - It has to work fairly quickly.
> - It has to work on very large files, too large for Excel.  (I know
> that's not really that large.)
> - It must work on both Linux and Windows.  (I thought this was a given
> with Python, but I learned there are libraries available only for one or
> the other.  For instance, there are Windows-only Excel libraries.)
>
> Here's the slow code (It's ugly, I know.  I'm a Mechanical
> Engineer...):
>
>  def append_column(self,column_data,heading):
>    f = open(self.textfile, 'rU')
>    temp=tempfile.mktemp()
>    g = open(temp, 'w')
>    index=0
>    header=""
>    for label in self.column_labels:
>      header += '\t'
>      header += label
>    #strip the first tab
>      header = header[1:]+'\t'+heading.strip()+'\n'
>      #f.readline()
>      g.write(header)
>      try:
>        for line in f:
>          if index!=0 :
>            try:
>              line=line[:-1]+'\t'+str(column_data[index-1])+'\n'
>            except(IndexError):
>              line=line[:-1]+'\t''\n'
>            #if index < 25 : print line,
>              g.write(line)
>          index += 1
>      finally:
>        f.close()
>        g.close()
>      shutil.move(temp,self.textfile)
>      self.number_of_columns+=1
>      self.traces=self.get_traces()
>      self.column_lengths=self.get_column_lengths()
>
> As I said, this takes something like 20 seconds to append a normal
> column of data.  I'd rather use a library for handling this sort of
> thing than write my own, but I wouldn't mind knowing how you guys would
> tighten up this code.  I think the try-finally's are slowing this down,
> but I'm not sure.
>
> Thanks,
> Bryan
>
>
> --
> Bryan Harris
> Research Engineer
> Structures and Materials Evaluation Group
> harrisbw at notes.udayton.edu
> http://www.udri.udayton.edu/
> (937) 229-5561
>