[Tutor] Hi, First question

Chris “Kwpolska” Warrick kwpolska at gmail.com
Sun Jun 16 17:55:59 CEST 2013


On Sat, Jun 15, 2013 at 7:22 AM, Patrick Williams <pdw0005 at gmail.com> wrote:
> Hi so I am making a bit of code to extract a bit of numbers data from a file
> and then find the average of that data, however while I can get the code to
> extract each specific piece of data I need, I can't seem to get the numbers
> to add separately  so I can get a proper average. My sum1 variable seems to
> only take the last bit of data entered. I was just wondering if anyone knows
> what I'm doing wrong, the course I'm following hadn't started using regex
> (or even proper lists) at this point, so there must be a way to do it
> without. here's the code. the average of the data should be 0.6789 or
> something, but I get 0.0334343 or something.
>
> count=0
> lst=list()

`lst = []` is the preferred syntax.

> fname='mbox-short.txt'
> fhand=open(fname)
> for line in fhand:
>     if line.startswith('X-DSPAM-Confidence:'):
>         count=count+1
>         colpos=line.find(':')
>         zpos=line.find('0',colpos)
>         num=float(line[zpos:50])
>         sum1=0+num
>         avg=float(sum1)/int(count)
> print 'Count-', count,'--', 'Average-', avg
>
> Any help at all is appreciated, and thanks in advance.
>
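
One thing worth noting in the quoted loop: `sum1=0+num` rebinds sum1 to
the current number on every pass, so only the last value survives,
which matches the symptom you describe.  Below is a minimal sketch of a
running total, not a drop-in replacement.  It assumes Python 3 syntax;
the function name average_confidence and the sample lines are made up
for illustration, and it slices off the prefix instead of using your
find() approach.

```python
def average_confidence(lines, prefix='X-DSPAM-Confidence:'):
    """Return (count, average) of the numbers that follow `prefix`."""
    count = 0
    total = 0.0  # initialised once, before the loop, never reset inside it
    for line in lines:
        if line.startswith(prefix):
            count += 1
            # Everything after the prefix is the number; float() ignores
            # the surrounding whitespace and the trailing newline.
            total += float(line[len(prefix):])
    # Compute the average once, after the loop; guard against no matches.
    avg = total / count if count else 0.0
    return count, avg


# Hypothetical sample data standing in for the real file.
sample = [
    'X-DSPAM-Confidence: 0.5\n',
    'Subject: hello\n',
    'X-DSPAM-Confidence: 1.0\n',
]
count, avg = average_confidence(sample)
print('Count-', count, '--', 'Average-', avg)
```

With a real file you would pass the open file object in place of
`sample`, since iterating over it yields lines in the same way.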

I don’t know what file you used, but the message you sent got this
header from Gmail, and the format doesn’t seem to be much different:

> X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00; 'separately': 0.09;
>        'wrong,': 0.09; 'subject:question': 0.10; 'code.': 0.18;
>        'variable': 0.18; 'bit': 0.19; 'advance.': 0.19; 'seems': 0.21;
>        '8bit%:5': 0.22; 'print': 0.22; 'skip:l 30': 0.24; '\xa0so': 0.24;
> [snip 11 more lines]
(replaced tabstops with spaces)

Can you guess what’s wrong in your code?

You are reading only the first line of the header, but it continues
over several more.  How do you handle that?  Your algorithm needs to
keep reading the following lines for as long as they begin with the
indentation used for continuation (it might be the tab character '\t'
or some spaces), and stop when it encounters another header.  You can
do this either by checking whether the line begins with that
indentation OR by matching each line against a regexp such as
'[A-Za-z-]+: .+' (note the hyphen in the character class, since header
names like X-Spam-Evidence contain one).
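
Here is a rough sketch of both checks, under a few assumptions: the
names read_header, is_continuation and is_new_header are mine, the
sample is a shortened version of the header quoted above, and the
pattern allows digits and hyphens in header names (needed for names
such as X-Spam-Evidence).

```python
import re

# Assumed pattern for the start of a header line, e.g. "Subject: ...".
HEADER_RE = re.compile(r'[A-Za-z][A-Za-z0-9-]*:')


def is_continuation(line):
    """Check 1: continuation lines begin with a space or a tab."""
    return line[:1] in (' ', '\t')


def is_new_header(line):
    """Check 2: a new header begins with 'Name:' at the start of the line."""
    return HEADER_RE.match(line) is not None


def read_header(lines, name):
    """Return the value of header `name`, joining its continuation lines."""
    parts = []
    collecting = False
    for line in lines:
        if collecting:
            if is_continuation(line):
                parts.append(line.strip())
                continue
            break  # next header (or blank line) reached: stop reading
        if line.startswith(name + ':'):
            collecting = True
            parts.append(line[len(name) + 1:].strip())
    return ' '.join(parts)


sample = [
    "X-Spam-Evidence: '*H*': 0.79; '*S*': 0.00;\n",
    "\t'wrong,': 0.09; 'code.': 0.18;\n",
    "Subject: [Tutor] Hi, First question\n",
]
print(read_header(sample, 'X-Spam-Evidence'))
```

read_header() uses the indentation check; swapping the test for
`not is_new_header(line)` would be the regexp variant, although that
one also needs to stop at the blank line that ends the headers.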

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail                | always bottom-post
http://asciiribbon.org        | http://caliburn.nl/topposting.html