[Tutor] Newbie Here -- Averaging & Adding Madness Over a Given (x) Range?!?!
mmcconac at redhat.com
Thu Feb 14 21:55:23 CET 2013
This is my first post here. I have tried to get answers from StackOverflow, but I realized quickly that I am too "green" for that environment. As such, I have purchased Beginning Python (2nd edition, Hetland) and also the $29.00 course available from learnpythonthehardway(dot)com. I have been reading fervently, and have enjoyed Python very much. I can do all the basic printing, math, substitutions, etc. However, I get stuck when trying to combine all the new skills I have been learning over the past few weeks. Anyway, I was hoping to get some help with something NON-HOMEWORK related. (I swear.)
I have a task that I have generalized due to the nature of what I am trying to do -- and its need to remain confidential.
My end goal as described on SO was: "Calculating and Plotting the Average of every (X) items in a list of (Y) total", but for now I am only stuck on the actual addition, and/or averaging items -- in a serial sense, based on the relation to the previous number, average of numbers, etc being acted on. Not the actual plotting. (Plotting is pretty EZ.)
1. I have a list of numbers that already exist in a file. I generate this file by parsing info from logs.
2. Each line contains an integer on it (corresponding to the number of milliseconds that it takes to complete a certain repeated task).
3. There are over a million entries in this file, one per line; at any given time it can be just a few thousand, or more than a million.
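A minimal sketch of reading such a file, assuming one integer per line and possibly some blank lines. The function name `read_durations` and the idea of passing the file path as an argument are illustrative assumptions, not part of the original setup. Because it is a generator, it hands back one value at a time instead of loading a million lines into memory at once.

```python
def read_durations(path):
    """Yield one integer per non-blank line of the file at `path`.

    `path` is a hypothetical file name; the real log-derived file
    is not named in this post.
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:  # ignore blank lines
                yield int(line)
```

Wrapping the call in `list(...)` gives an ordinary list when positions (indexes) are needed; iterating over it directly keeps memory use low.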
Eventually what I'll need to do is:
1. Index the file and/or count the lines, so as to identify each line's position, so that the program can average any range of numbers that are sequential to one another.
2. Calculate the difference between any given (x) range, so that I can ask the program to average every 5, 10, 100, 1,000, or 10,000 entries, etc., until completion. This includes the need to deal with stray remainders at the end of the file when the total isn't divisible by the requested range.
(i.e., average some file with 3,245 entries by 100 --> not excluding the remaining 45 entries, in order to represent the remainder.)
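One possible way to sketch the "average every (x) entries, remainder included" requirement, assuming the numbers are already in a list. The name `chunk_averages` is made up for illustration. Slicing past the end of a list simply returns whatever is left, so the last block is averaged over its actual length.

```python
def chunk_averages(values, size):
    """Return the average of each consecutive block of `size` values.

    The final block may hold fewer than `size` values; it is averaged
    over however many values it actually contains.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    averages = []
    for start in range(0, len(values), size):
        block = values[start:start + size]  # last slice may be short
        averages.append(sum(block) / float(len(block)))
    return averages


# Five sample values in blocks of 2; the lone last value is its own block.
sample = [173, 1685, 1152, 253, 1623]
result = chunk_averages(sample, 2)  # [929.0, 702.5, 1623.0]
```

With 3,245 entries and a size of 100, this yields 33 averages: 32 full blocks plus one for the remaining 45 entries.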
So, looking above, transaction #1 took "173" milliseconds, while transaction #2 took 1685 milliseconds.
Based on this, I need to figure out how to do two things:
1. Calculate the difference of each transaction, related to the one before it AND record/capture the difference. (An array, list, dictionary -- I don't care.)
2. Starting with the very first line/entry, count off the first (x) entries and average them, so that I can obtain a "happy medium" for what the gradient/delta is between sets of 100 over the course of the aggregate.
Entries 1-100 = (eventualPlottedAvgTotalA)
Entries 101-200 = (eventualPlottedAvgTotalB)
Entries 201-300 = (eventualPlottedAvgTotalC)
Entries 301-400 = (eventualPlottedAvgTotalD)
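The first of the two tasks above (the difference between each transaction and the one before it) might be sketched like this, assuming the values are in a list. The name `successive_differences` is hypothetical. Pairing the list with a copy of itself shifted by one gives every (previous, current) pair.

```python
def successive_differences(values):
    """Return values[i] - values[i - 1] for every i from 1 onward."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Five sample values give four differences.
sample = [173, 1685, 1152, 253, 1623]
deltas = successive_differences(sample)  # [1512, -533, -899, 1370]
```

The result is always one item shorter than the input, since the first entry has nothing before it to compare against.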
From what I can tell, I don't need to indefinitely store the values, only pass them as they are processed (in order) to the plotter. I have tried the following example to sum a range of 5 entries from the above list of 5 (which works), but I don't know how to dynamically pass the 5 at a time until completion, all the while retaining the calculated averages which will ultimately be passed to pyplot at a later time/date.
What I have been able to figure out thus far is below.
Python 2.7.3 (default, Jul 24 2012, 10:05:38)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> plottedTotalA = ['173', '1685', '1152', '253', '1623']
>>> sum(float(t) for t in plottedTotalA)
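A rough sketch of passing (x) items at a time until completion while keeping only the averages, assuming the input can be any iterable (a list, or a file read line by line). The name `streaming_averages` is invented for illustration; `itertools.islice` is from the standard library and takes up to `size` items from the iterator on each pass.

```python
from itertools import islice


def streaming_averages(values, size):
    """Yield the average of each run of up to `size` items from `values`."""
    iterator = iter(values)
    while True:
        block = list(islice(iterator, size))  # next `size` items, or fewer
        if not block:  # nothing left to read
            break
        yield sum(block) / float(len(block))


plottedTotalA = ['173', '1685', '1152', '253', '1623']
numbers = (float(t) for t in plottedTotalA)   # convert lazily
averages = list(streaming_averages(numbers, 5))  # [977.2]
```

Only one block is held in memory at a time; the `averages` list is what would be handed to the plotter later.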
I received 2 answers from SO, but was unable to fully capture what they were trying to tell me. Unfortunately, I might need a "baby-step" / "Barney-style" mentor who is willing to guide me on this. I hope this makes sense to someone out there, and thank you in advance for any help that you can provide. I apologize in advance for being so thick if it's uber-EZ.