[Tutor] first program
Bob Gailer
bgailer@alum.rpi.edu
Tue Jun 24 13:08:02 2003
At 02:29 AM 6/24/2003 +0100, mark boydell wrote:
>[snip]
>As a developmental psychologist, I often have to calculate a child's age
>in months
>from their DOB. To do this I look at the date of the testing and the
>child's DOB
>and w/ some maths I round it up to the closest month age.
>so my program aims to take a file made up of the date of the experiment
>(date:DDMMYYYY)
>and a list of DOBs (in the DDMMYYYY format) and write the results to an
>output file.
>
>Having written and tested the program it seems to work fine but I've got some
>niggling doubt that I may be breaking some basic rules:
>1. am I overusing the global scope?
>2. should I make it more OOP? by making it all into a class?
>3. is the general programming style dreadful ;) ?
>[snip]
>the input file should look like this:
>
>date:20062003
>23111996
>20021992
>03121996
>....
>[snip]
Several more ideas:
1) start learning how to use regular expressions (re) to parse input. In
this case it might seem like overkill, but re is a very powerful tool and
this can be a good time and place to get familiar with it:
import re
pat = r'(\D*)(\d{2,2})(\d{2,2})(\d{4,4})'
parsedDate = re.findall(pat, '31122003') # returns [('', '31', '12', '2003')]
parsedDate = re.findall(pat, 'date:31122003') # returns [('date:', '31', '12', '2003')]
The pattern elements:
(\D*)      match an optional sequence of nondigits and treat it as a group
(\d{2,2})  match exactly 2 digits and treat them as a group (\d{2} is equivalent)
(\d{4,4})  match exactly 4 digits and treat them as a group
Each group becomes an element in the result tuple.
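As a minimal sketch of the same idea, the pattern can also be applied with re.match, which returns a single match object (or None) instead of a list. The helper name parse_line is hypothetical and not part of the original program; the sample strings follow the DDMMYYYY input format described in the question.

```python
import re

# Same pattern as above: optional non-digit prefix, then DD, MM, YYYY.
pat = r'(\D*)(\d{2,2})(\d{2,2})(\d{4,4})'

def parse_line(line):
    """Return (prefix, day, month, year) as strings, or None if no match.

    parse_line is a hypothetical helper used only for illustration.
    """
    m = re.match(pat, line)
    if m is None:
        return None
    return m.groups()

print(parse_line('date:20062003'))  # ('date:', '20', '06', '2003')
print(parse_line('23111996'))       # ('', '23', '11', '1996')
print(parse_line('....'))           # None
```

Returning None for a line that does not match makes it easy to skip blank or malformed lines instead of failing on them.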
2) take advantage of built-in functions:
line = line.replace('\n','') # strip any \n
3) process data in lists rather than in separate variables especially when
anticipating functions that expect sequences
date = parsedDate[0] # extract tuple from list; gives ('date:', '31', '12', '2003')
dateType = date[0] # 'date:' or ''
yyyymmddDate = [date[3], date[2], date[1], 0, 0, 0, 0, 0, 0] # rearrange and extend for mktime
# you now have a list in the form ['2003', '12', '31', 0, 0, 0, 0, 0, 0]
4) use functions that apply other functions to sequences:
expDate = tuple(map(int, yyyymmddDate)) # gives (2003, 12, 31, 0, 0, 0, 0, 0, 0)
# (or dobDate, depending on dateType)
# tuple() is there because mktime in current Python versions insists on a tuple
5) for date and time processing consider the time module. If you can
tolerate a slight rounding error (since # of days in month varies) consider:
import time
expDay = time.mktime(expDate) / 86400 # mktime gives you seconds since 1/1/1970
dobDay = time.mktime(dobDate) / 86400 # divide by 60*60*24 for days
monthDif = round((expDay - dobDay)/30.4375) # divide by average # days in month (365.25/12)
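If mktime's dependence on the local time zone and the 1970 epoch is a concern, the same approximation can be sketched with the datetime module instead. This swaps in datetime.date for time.mktime and is not what the steps above use; the function name age_in_months and its tuple arguments are assumptions made for illustration.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # 30.4375, the same average used above

def age_in_months(dob, exp):
    """Approximate age in months, rounded to the nearest month.

    dob and exp are (year, month, day) tuples of ints.
    age_in_months is a hypothetical helper, not part of the original program.
    """
    days = (date(*exp) - date(*dob)).days  # exact number of calendar days
    return int(round(days / DAYS_PER_MONTH))

# DOB 23/11/1996 tested on 20/06/2003: 2400 days, about 78.9 months
print(age_in_months((1996, 11, 23), (2003, 6, 20)))  # 79
```

Subtracting two date objects gives an exact day count, so the only approximation left is the average month length.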
6) take advantage of % formatting:
outp.write('%s=%s\n'%(line, monthDif))
7) write your program design in pseudocode:
set things up
for each line in input
    parse line
    if experiment date
        save expdate
    else
        convert dob, expdate to months
        write dob, months to output
The idea here is to envision the flow of the program and data in its
simplest form, without worrying about syntax or details.
8) assemble the above ideas into a program:
import re, time
pat = r'(\D*)(\d{2,2})(\d{2,2})(\d{4,4})'
fileName = 'j:\\samis\\python\\dates.txt' # raw_input("file to use? :")
inp = open(fileName, "r")
outp = open("DOBdata.out", "w")
for line in inp.readlines():
    line = line.replace('\n','') # strip any \n
    parsedDate = re.findall(pat, line)
    if not parsedDate:
        continue # skip blank or malformed lines
    listDate = list(parsedDate[0]) # extract tuple from list and convert to list; gives ['', '31', '12', '2003']
    dateType = listDate[0]
    yyyymmddDate = [listDate[3],listDate[2],listDate[1],0,0,0,0,0,0] # rearrange and extend
    intDate = tuple(map(int, yyyymmddDate)) # mktime needs a tuple
    day = time.mktime(intDate) / 86400
    if dateType == 'date:':
        expDay = day
    else:
        monthDif = round((expDay - day)/365.25*12)
        outp.write('%s=%s\n'%(line, monthDif))
inp.close()
outp.close()
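On the question about global scope: one possible arrangement, sketched here under assumptions, is to move the work into small functions that take and return values, so the logic can be tried on a list of strings without touching any files. The names to_days and convert are hypothetical, and the sample lines are the ones from the question.

```python
import re
import time

PAT = re.compile(r'(\D*)(\d{2})(\d{2})(\d{4})')
SECONDS_PER_DAY = 86400
DAYS_PER_MONTH = 365.25 / 12

def to_days(day, month, year):
    """Convert string date parts to days since the epoch (local time)."""
    fields = (int(year), int(month), int(day), 0, 0, 0, 0, 0, 0)
    return time.mktime(fields) / SECONDS_PER_DAY

def convert(lines):
    """Return one 'DOB=months' string per DOB line.

    convert is a hypothetical helper; it expects the 'date:' line to
    come before the DOB lines, as in the sample input.
    """
    results = []
    exp_day = None
    for line in lines:
        line = line.strip()
        m = PAT.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        prefix, dd, mm, yyyy = m.groups()
        if prefix == 'date:':
            exp_day = to_days(dd, mm, yyyy)
        elif exp_day is not None:
            diff = exp_day - to_days(dd, mm, yyyy)
            results.append('%s=%s' % (line, int(round(diff / DAYS_PER_MONTH))))
    return results

sample = ['date:20062003\n', '23111996\n', '20021992\n', '03121996\n', '....\n']
for out_line in convert(sample):
    print(out_line)
```

With this shape the file handling shrinks to reading the lines in and writing the returned strings out, and nothing but the constants lives at module level.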
9) also note that Python lets you chain comparisons to test whether a value is
in a range. Instead of:
(dob[0]<16) and (dob[0]>-16)
you can use:
-16 < dob[0] < 16
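A quick way to convince yourself the two forms agree is to compare them over a few boundary values. This is only a sketch; the function name in_range and its default bounds are assumptions chosen to mirror the example above.

```python
def in_range(value, low=-16, high=16):
    """True when low < value < high, written as a chained comparison."""
    return low < value < high

for v in (-17, -16, -15, 0, 15, 16, 17):
    explicit = (v < 16) and (v > -16)
    # The chained form must give the same answer as the explicit one.
    assert in_range(v) == explicit
    print('%s %s' % (v, in_range(v)))
```

Both bounds are exclusive here, so -16 and 16 themselves are reported as out of range.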
Bob Gailer
bgailer@alum.rpi.edu
303 442 2625