Opposite of split
Stefan Schwarzer
sschwarzer at sschwarzer.net
Tue Aug 17 10:14:05 EDT 2010
Hi Alex,
On 2010-08-16 18:44, Alex van der Spek wrote:
> Anybody catches any other ways to improve my program (attached), you are
> most welcome. Help me learn, that is one of the objectives of this
> newsgroup, right? Or is it all about exchanging the next to impossible
> solution to the never to happen unreal world problems?
I don't know what a concordance table is, and I haven't
looked a lot into your program, but anyway here are some
things I noticed at a glance:
| #! usr/bin/env python
| # Merge log files to autolog file
| import os
| import fileinput
| #top='C:\\Documents and Settings\\avanderspek\\My Documents\\CiDRAdata\\Syncrude\\CSL\\August2010'
| top='C:\\Users\\ZDoor\\Documents\\CiDRA\\Syncrude\CSL\\August2010'
If you have backslashes in strings, you might want to use
"raw strings". Instead of "c:\\Users\\ZDoor" you'd write
r"c:\Users\ZDoor" (notice the r in front of the string).
| i,j,k=0,0,0
| date={}
I suggest to use more spacing to make the code more
readable. Have a look at
http://www.python.org/dev/peps/pep-0008/
for more formatting (and other) tips.
| fps=0.3048
| tab='\t'
|
| bt='-999.25'+'\t''-999.25'+'\t''-999.25'+'\t''-999.25'+'\t'+'-999.25'
If these numbers are always the same, you should use
something like
NUMBER = "-999.25"
COLUMNS = 5
bt = "\t".join(COLUMNS * [NUMBER])
(with better naming, of course).
Why don't you use `tab` here?
I _highly_ recommend to use longer (unabbreviated) names.
| al='Status'+'\t'+'State'+'\t'+'-999.25'
|
| for root,dirs,files in os.walk(top):
| #Build a concordance table of days on which data was collected
| for name in files:
| ext=name.split('.',1)[1]
There's a function `splitext` in `os.path`.
| if ext=='txt':
| dat=name.split('_')[1].split('y')[1]
| if dat in date.keys():
You can just write `if dat in date` (in Python versions >=
2.2, I think).
| date[dat]+=1
| else:
| date[dat]=1
| print 'Concordance table of days:'
| print date
| print 'List of files processed:'
| #Build a list of same day filenames, 5 max for a profile meter,skip first and last days
| for f in sorted(date.keys())[2:-1]:
| logs=[]
| for name in files:
| ext=name.split('.')[1]
| if ext=='txt':
| dat=name.split('_')[1].split('y')[1]
I guess I'd move the parsing stuff (`x.split(s)[i]` etc.)
into small functions with meaningful names. After that I'd
probably notice there's much redundancy and refactor them. ;)
| if dat==f:
| logs.append(os.path.join(root,name))
| #Open the files and read line by line
| datsec=False
| lines=[[] for i in range(5)]
One thing to watch out for: The above is different from
`[[]] * 5` which uses the _same_ empty list for all entries.
Probably the semantics you chose is correct.
| fn=0
| for line in fileinput.input(logs):
| if line.split()[0]=='DataID':
| datsec=True
| ln=0
| if datsec:
| lines[fn].append(line.split())
| ln+=1
| if ln==10255:
This looks like a "magic number" and should be turned into a
constant.
| datsec=False
| fileinput.nextfile()
| fn+=1
| print fileinput.filename().rsplit('\\',1)[1]
| fileinput.close()
| aut='000_AutoLog'+f+'.log'
| out=os.path.join(root,aut)
| alf=open(out,'w')
| alf.write('Timestamp (mm/dd/yyyy hh:mm:ss) VF 1 VF 2 VF 3 VF 4 VF 5 Q 1 Q 2 Q 3 Q 4 Q 5 Status State Metric Band Temperature 1 Band Temperature 2 Band Temperature 3 Band Temperature 4 Band Temperature 5 SPL 1 SPL 2 SPL 3 SPL 4 SPL 5'+'\n')
| for wn in range(1,10255,1):
You don't need to write the step argument if it's 1.
| for i in range(5):
| lines[i][wn][2]=str(float(lines[i][wn][2])/fps)
| tp=lines[0][wn][0]+' '+lines[0][wn][1]
| vf=tab.join([lines[i][wn][2] for i in range(5)])
| vq=tab.join([lines[i][wn][3] for i in range(5)])
| vs=tab.join([lines[i][wn][4] for i in range(5)])
| #sf=tab.join([lines[i][wn][5] for i in range(5)])
| #sq=tab.join([lines[i][wn][6] for i in range(5)])
| #ss=tab.join([lines[i][wn][7] for i in range(5)])
Maybe use an extra function?
def choose_a_better_name():
return tab.join([lines[index][wn][2] for index in range(5)])
Moreover, the repetition of this line looks as if you wanted
to put the right hand sides of the assignments in a list,
instead of assigning to distinct names (`vf` etc.).
By the way, you use the number 5 a lot. I guess this should
be a constant, too.
| alf.write(tp+'\t'+vf+'\t'+vq+'\t'+al+'\t'+bt+'\t'+vs+'\n')
Suggestion: Use
tab.join([tp, vf, vq, al, bt, vs]) + "\n"
Again, not using distinct variables would have an advantage
here.
| alf.close()
| print "Done"
Stefan
More information about the Python-list
mailing list